Lexical scoring system of lexical chain for Quranic document retrieval

Hamed Zakeri Rad, Sabrina Tiun, Saidah Saad

Research output: Contribution to journal › Article

Abstract

An Information Retrieval (IR) system aims to extract information on a particular subject from an extensive collection of text, based on a query made by a user. In IR, the user retrieves information by submitting a query in the form of keywords to be matched against the text. In the Al-Quran, verses on the same or comparable topics are scattered throughout the text in different chapters, and it is therefore difficult for users to remember the many keywords of the verses. In such situations, retrieving information using semantically related words is useful. In well-composed documents, semantic integrity (coherence) exists between the words of the text. Lexical cohesion results from chains of related words that contribute to the continuity of lexical meaning within the text; these chains are a direct result of the text being about the same thing (i.e., a topic). This indicates that lexical chains are a useful and appropriate means for an IR system to represent documents by concepts rather than by terms, so that retrieval can be based on semantic relations. This study therefore proposes a new Lexical Scoring System and determines the semantic relations that exist between words, using WordNet as the semantic knowledge base. Using the lexical chain approach, the proposed scoring system retrieved 86.58% of the total relevant documents in the Al-Quran, based on the relevance judgment. Based on these findings, the study concludes that the proposed approach of representing verses using lexical chains is suitable for a Quranic IR system.
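
The idea of representing verses by lexical chains and scoring them against a query can be illustrated with a minimal sketch. It is not the authors' scoring system, whose details the abstract does not give. The `RELATED` table is a hypothetical stand-in for a WordNet lookup, the `WEIGHTS` values are assumed, and the verse identifiers and word lists are invented toy data. The `recall` function shows the kind of measure behind a figure such as 86.58%: the share of relevant documents that were retrieved.

```python
# Hypothetical stand-in for a WordNet lookup: word pair -> relation type.
RELATED = {
    frozenset({"charity", "alms"}): "synonym",
    frozenset({"charity", "giving"}): "hypernym",
    frozenset({"prayer", "worship"}): "hypernym",
}

# Assumed relation weights; the paper's actual scores are not given here.
WEIGHTS = {"identical": 1.0, "synonym": 0.8, "hypernym": 0.5}


def relation(a, b):
    """Return the relation type between two words, or None if unrelated."""
    if a == b:
        return "identical"
    return RELATED.get(frozenset({a, b}))


def build_chains(words):
    """Greedily place each word in the first chain holding a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(relation(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])  # no related chain found: start a new one
    return chains


def score_document(query_terms, doc_words):
    """Sum, over chains linked to the query, best link weight * chain length."""
    score = 0.0
    for chain in build_chains(doc_words):
        links = [
            WEIGHTS[rel]
            for q in query_terms
            for member in chain
            if (rel := relation(q, member))
        ]
        if links:
            score += max(links) * len(chain)
    return score


def retrieve(query_terms, docs):
    """Return ids of documents with a positive score, best first."""
    scored = {doc_id: score_document(query_terms, words)
              for doc_id, words in docs.items()}
    hits = [doc_id for doc_id, s in scored.items() if s > 0]
    return sorted(hits, key=lambda doc_id: scored[doc_id], reverse=True)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy "verses" (invented data, already tokenised).
docs = {
    "v1": ["alms", "giving", "poor"],
    "v2": ["prayer", "worship"],
    "v3": ["charity", "alms"],
}
results = retrieve(["charity"], docs)
print(results, recall(results, {"v1", "v3"}))
```

In this sketch the verse "v1" is retrieved for the query "charity" even though it never contains that word, which is the benefit of matching on semantic relations rather than on exact keywords.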

Original language: English
Pages (from-to): 59-79
Number of pages: 21
Journal: GEMA Online Journal of Language Studies
Volume: 18
Issue number: 2
DOI: 10.17576/gema-2018-1802-05
Publication status: Published - 1 May 2018

Keywords

  • Information retrieval (IR)
  • Lexical chain
  • Lexical scoring system
  • Quranic semantic retrieval system
  • Semantic retrieval

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Literature and Literary Theory

Cite this

Lexical scoring system of lexical chain for Quranic document retrieval. / Rad, Hamed Zakeri; Tiun, Sabrina; Saad, Saidah.

In: GEMA Online Journal of Language Studies, Vol. 18, No. 2, 01.05.2018, p. 59-79.

@article{38bda39731ff49dabbf369b472ad185b,
title = "Lexical scoring system of lexical chain for Quranic document retrieval",
keywords = "Information retrieval (IR), Lexical chain, Lexical scoring system, Quranic semantic retrieval system, Semantic retrieval",
author = "Rad, {Hamed Zakeri} and Sabrina Tiun and Saidah Saad",
year = "2018",
month = "5",
day = "1",
doi = "10.17576/gema-2018-1802-05",
language = "English",
volume = "18",
pages = "59--79",
journal = "GEMA Online Journal of Language Studies",
issn = "1675-8021",
publisher = "Universiti Kebangsaan Malaysia",
number = "2",

}

TY - JOUR

T1 - Lexical scoring system of lexical chain for Quranic document retrieval

AU - Rad, Hamed Zakeri

AU - Tiun, Sabrina

AU - Saad, Saidah

PY - 2018/5/1

Y1 - 2018/5/1

KW - Information retrieval (IR)

KW - Lexical chain

KW - Lexical scoring system

KW - Quranic semantic retrieval system

KW - Semantic retrieval

UR - http://www.scopus.com/inward/record.url?scp=85047948853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047948853&partnerID=8YFLogxK

U2 - 10.17576/gema-2018-1802-05

DO - 10.17576/gema-2018-1802-05

M3 - Article

AN - SCOPUS:85047948853

VL - 18

SP - 59

EP - 79

JO - GEMA Online Journal of Language Studies

JF - GEMA Online Journal of Language Studies

SN - 1675-8021

IS - 2

ER -