A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval

Rehab Hasan Abood, Sabrina Tiun

Research output: Contribution to journal › Article

Abstract

Information retrieval is the process of analysing a typed query and retrieving the documents relevant to it. Several issues can significantly affect the effectiveness of information retrieval. One common issue is word ambiguity, where a single word can carry several meanings. The process of identifying the exact sense of a word is called Word Sense Disambiguation (WSD). The Quran is the holy book for nearly 1.5 billion Muslims around the world. In particular, the Quran contains numerous words that can take multiple meanings. There is therefore a strong need to apply WSD to the Quran in order to improve information retrieval. Several WSD approaches have been proposed for Quranic retrieval. These approaches fall into two main categories: open-domain WSD and domain-specific WSD. Open-domain WSD uses an open-domain dictionary such as WordNet to provide the exact sense, whereas domain-specific WSD uses restricted training data containing senses specific to the domain of the Quran. This study therefore compares the two WSD categories, domain-specific and open-domain. For the domain-specific approach, a predefined set of examples was collected to train the Yarowsky algorithm, a semi-supervised machine learning technique; once trained, the algorithm can classify the exact sense of a word. In contrast, for the open-domain approach, WordNet was used with semantic distances to measure the similarity between the query word and the concepts returned by WordNet. The dataset used in this study is a translation of the Quran. The experimental results were mixed, with neither the Yarowsky algorithm nor the WordNet-based WSD approach proving consistently superior.
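
The semi-supervised bootstrapping idea behind the domain-specific approach can be illustrated with a minimal Python sketch. This is not the authors' implementation: it is a simplified Yarowsky-style loop that assumes a decision list scored by a smoothed log-likelihood ratio, omits the one-sense-per-discourse step, and never relabels examples once they are accepted. The function names, the `alpha` and `threshold` parameters, and the toy contexts for the ambiguous word "plant" are all hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(labeled, senses, alpha=0.1):
    """Build one rule per context word: (most frequent sense, smoothed log-ratio)."""
    counts = defaultdict(Counter)  # context word -> sense -> number of examples
    for tokens, sense in labeled:
        for tok in set(tokens):
            counts[tok][sense] += 1
    rules = {}
    for tok, by_sense in counts.items():
        best = max(senses, key=lambda s: by_sense[s])
        rest = sum(by_sense[s] for s in senses if s != best)
        # alpha is an assumed smoothing constant that avoids division by zero
        rules[tok] = (best, math.log((by_sense[best] + alpha) / (rest + alpha)))
    return rules


def classify(tokens, rules):
    """Apply the single strongest matching rule; (None, 0.0) if nothing matches."""
    matches = [rules[t] for t in sorted(set(tokens)) if t in rules]
    if not matches:
        return None, 0.0
    return max(matches, key=lambda rule: rule[1])


def bootstrap(seeds, unlabeled, senses, threshold=1.0, max_iter=10):
    """Grow the labeled set from a few seeds by repeatedly adding confident guesses."""
    labeled, pool = list(seeds), list(unlabeled)
    for _ in range(max_iter):
        rules = train_decision_list(labeled, senses)
        confident, remaining = [], []
        for tokens in pool:
            sense, score = classify(tokens, rules)
            if sense is not None and score >= threshold:
                confident.append((tokens, sense))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new was learned, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train_decision_list(labeled, senses), labeled


# Hypothetical toy data: context words seen around the ambiguous word "plant".
senses = ["living", "factory"]
seeds = [(["life", "grows"], "living"), (["manufacturing", "workers"], "factory")]
unlabeled = [
    ["life", "green", "leaves"],
    ["workers", "assembly", "line"],
    ["green", "garden"],
    ["assembly", "machines"],
]

rules, labeled = bootstrap(seeds, unlabeled, senses)
print(classify(["garden", "soil"], rules))
```

In this sketch the two seed examples label the first two unlabeled contexts in the first pass, and the words learned from those ("green", "assembly") label the remaining two in the second pass, which is the core of the bootstrapping behaviour.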

Original language: English
Article number: 00071
Journal: MATEC Web of Conferences
Volume: 135
DOI: 10.1051/matecconf/201713500071
Publication status: Published - 20 Nov 2017

Fingerprint

Information retrieval
Glossaries
Learning systems
Semantics

ASJC Scopus subject areas

  • Chemistry(all)
  • Materials Science(all)
  • Engineering(all)

Cite this

A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval. / Hasan Abood, Rehab; Tiun, Sabrina.

In: MATEC Web of Conferences, Vol. 135, 00071, 20.11.2017.

@article{a5c531e6133741ecb4b47d7035dd4c46,
title = "A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval",
abstract = "Information retrieval is the process of analysing a typed query and retrieving the documents relevant to it. Several issues can significantly affect the effectiveness of information retrieval. One common issue is word ambiguity, where a single word can carry several meanings. The process of identifying the exact sense of a word is called Word Sense Disambiguation (WSD). The Quran is the holy book for nearly 1.5 billion Muslims around the world. In particular, the Quran contains numerous words that can take multiple meanings. There is therefore a strong need to apply WSD to the Quran in order to improve information retrieval. Several WSD approaches have been proposed for Quranic retrieval. These approaches fall into two main categories: open-domain WSD and domain-specific WSD. Open-domain WSD uses an open-domain dictionary such as WordNet to provide the exact sense, whereas domain-specific WSD uses restricted training data containing senses specific to the domain of the Quran. This study therefore compares the two WSD categories, domain-specific and open-domain. For the domain-specific approach, a predefined set of examples was collected to train the Yarowsky algorithm, a semi-supervised machine learning technique; once trained, the algorithm can classify the exact sense of a word. In contrast, for the open-domain approach, WordNet was used with semantic distances to measure the similarity between the query word and the concepts returned by WordNet. The dataset used in this study is a translation of the Quran. The experimental results were mixed, with neither the Yarowsky algorithm nor the WordNet-based WSD approach proving consistently superior.",
author = "{Hasan Abood}, Rehab and Tiun, Sabrina",
year = "2017",
month = "11",
day = "20",
doi = "10.1051/matecconf/201713500071",
language = "English",
volume = "135",
journal = "MATEC Web of Conferences",
issn = "2261-236X",
publisher = "EDP Sciences",

}

TY - JOUR

T1 - A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval

AU - Hasan Abood, Rehab

AU - Tiun, Sabrina

PY - 2017/11/20

Y1 - 2017/11/20

N2 - Information retrieval is the process of analysing a typed query and retrieving the documents relevant to it. Several issues can significantly affect the effectiveness of information retrieval. One common issue is word ambiguity, where a single word can carry several meanings. The process of identifying the exact sense of a word is called Word Sense Disambiguation (WSD). The Quran is the holy book for nearly 1.5 billion Muslims around the world. In particular, the Quran contains numerous words that can take multiple meanings. There is therefore a strong need to apply WSD to the Quran in order to improve information retrieval. Several WSD approaches have been proposed for Quranic retrieval. These approaches fall into two main categories: open-domain WSD and domain-specific WSD. Open-domain WSD uses an open-domain dictionary such as WordNet to provide the exact sense, whereas domain-specific WSD uses restricted training data containing senses specific to the domain of the Quran. This study therefore compares the two WSD categories, domain-specific and open-domain. For the domain-specific approach, a predefined set of examples was collected to train the Yarowsky algorithm, a semi-supervised machine learning technique; once trained, the algorithm can classify the exact sense of a word. In contrast, for the open-domain approach, WordNet was used with semantic distances to measure the similarity between the query word and the concepts returned by WordNet. The dataset used in this study is a translation of the Quran. The experimental results were mixed, with neither the Yarowsky algorithm nor the WordNet-based WSD approach proving consistently superior.

UR - http://www.scopus.com/inward/record.url?scp=85036468297&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85036468297&partnerID=8YFLogxK

U2 - 10.1051/matecconf/201713500071

DO - 10.1051/matecconf/201713500071

M3 - Article

AN - SCOPUS:85036468297

VL - 135

JO - MATEC Web of Conferences

JF - MATEC Web of Conferences

SN - 2261-236X

M1 - 00071

ER -