Enhanced Arabic Information Retrieval: Light Stemming and Stop Words

Jaffar Atwan, Masnizah Mohd, Ghassan Kanaan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

Stemming is the process of reducing inflected words to their stem, base, or root form. For highly inflected languages such as Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of three stopword lists (the General list, the Khoja list, and a Combined list) when used with light stemming for Arabic information retrieval. The vector space model, a popular weighting scheme, was used for retrieval. The idea is to combine the General and Khoja stopword lists with light stemming to enhance performance, and to compare the lists' effects on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the Combined stopword list together with light stemming.
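
The pipeline the abstract describes (stopword removal, light stemming, vector space ranking) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword sets are tiny hypothetical stand-ins for the General and Khoja lists, the affix lists are an assumed simplification of a light stemmer, and the TF-IDF weighting with cosine similarity is one common form of the vector space model that the abstract does not specify in detail.

```python
import math
from collections import Counter

# Hypothetical stand-ins; the paper's actual General and Khoja lists are not reproduced here.
GENERAL_STOPWORDS = {"في", "من", "على"}
KHOJA_STOPWORDS = {"من", "إلى", "عن"}
COMBINED_STOPWORDS = GENERAL_STOPWORDS | KHOJA_STOPWORDS

# Assumed, simplified affix lists (longer prefixes first); not the paper's stemmer.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text, stopwords):
    """Whitespace-tokenize, drop stopwords, then light-stem the remaining tokens."""
    return [light_stem(tok) for tok in text.split() if tok not in stopwords]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs, stopwords):
    """Return (doc_index, score) pairs sorted by descending TF-IDF cosine score."""
    token_lists = [preprocess(d, stopwords) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log(1 + n / df[t]) for t in df}
    doc_vecs = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in token_lists]
    # Query terms that never occur in the collection carry no weight.
    q_counts = Counter(preprocess(query, stopwords))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(i, cosine(q_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = ["الكتاب في المكتبة", "السيارة على الطريق"]
    for index, score in rank("كتاب", docs, COMBINED_STOPWORDS):
        print(index, round(score, 3))
```

In this toy collection the query "كتاب" matches the first document only because light stemming strips the definite article from "الكتاب"; without stemming the two surface forms would be treated as different terms. Swapping `COMBINED_STOPWORDS` for one of the smaller sets is how the lists' effects could be compared under these assumptions.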

Original language: English
Title of host publication: Communications in Computer and Information Science
Publisher: Springer Verlag
Pages: 219-228
Number of pages: 10
Volume: 378 CCIS
ISBN (Print): 9783642405662
DOIs: https://doi.org/10.1007/978-3-642-40567-9_19
Publication status: Published - 2013
Event: 2nd International Multi-Conference on Artificial Intelligence Technology, M-CAIT 2013 - Shah Alam
Duration: 28 Aug 2013 - 29 Aug 2013

Publication series

Name: Communications in Computer and Information Science
Volume: 378 CCIS
ISSN (Print): 1865-0929

Other

Other: 2nd International Multi-Conference on Artificial Intelligence Technology, M-CAIT 2013
City: Shah Alam
Period: 28/8/13 - 29/8/13

Fingerprint

Information retrieval
Vector spaces
Linguistics

Keywords

  • Arabic
  • Information Retrieval
  • Stemming
  • Stopword

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Atwan, J., Mohd, M., & Kanaan, G. (2013). Enhanced Arabic Information Retrieval: Light Stemming and Stop Words. In Communications in Computer and Information Science (Vol. 378 CCIS, pp. 219-228). Springer Verlag. https://doi.org/10.1007/978-3-642-40567-9_19

@inproceedings{25d44553341b408e947473ee39c20774,
title = "Enhanced Arabic Information Retrieval: Light Stemming and Stop Words",
abstract = "Stemming is the process of reducing inflected words to their stem, base, or root form. For highly inflected languages such as Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of three stopword lists (the General list, the Khoja list, and a Combined list) when used with light stemming for Arabic information retrieval. The vector space model, a popular weighting scheme, was used for retrieval. The idea is to combine the General and Khoja stopword lists with light stemming to enhance performance, and to compare the lists' effects on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the Combined stopword list together with light stemming.",
keywords = "Arabic, Information Retrieval, Stemming, Stopword",
author = "Jaffar Atwan and Masnizah Mohd and Ghassan Kanaan",
year = "2013",
doi = "10.1007/978-3-642-40567-9_19",
language = "English",
isbn = "9783642405662",
volume = "378 CCIS",
series = "Communications in Computer and Information Science",
publisher = "Springer Verlag",
pages = "219--228",
booktitle = "Communications in Computer and Information Science",

}

TY  - GEN
T1  - Enhanced Arabic Information Retrieval
T2  - Light Stemming and Stop Words
AU  - Atwan, Jaffar
AU  - Mohd, Masnizah
AU  - Kanaan, Ghassan
PY  - 2013
Y1  - 2013
AB  - Stemming is the process of reducing inflected words to their stem, base, or root form. For highly inflected languages such as Arabic, stemming improves retrieval performance by reducing word variants. This paper investigates the effectiveness of three stopword lists (the General list, the Khoja list, and a Combined list) when used with light stemming for Arabic information retrieval. The vector space model, a popular weighting scheme, was used for retrieval. The idea is to combine the General and Khoja stopword lists with light stemming to enhance performance, and to compare the lists' effects on retrieval. The Linguistic Data Consortium (LDC) Arabic Newswire data set was used. The best performance was achieved with the Combined stopword list together with light stemming.
KW  - Arabic
KW  - Information Retrieval
KW  - Stemming
KW  - Stopword
UR  - http://www.scopus.com/inward/record.url?scp=84904708576&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84904708576&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-40567-9_19
DO  - 10.1007/978-3-642-40567-9_19
M3  - Conference contribution
AN  - SCOPUS:84904708576
SN  - 9783642405662
VL  - 378 CCIS
T3  - Communications in Computer and Information Science
SP  - 219
EP  - 228
BT  - Communications in Computer and Information Science
PB  - Springer Verlag
ER  -