Impact of stemmer on Arabic text retrieval

Jaffar Atwan, Masnizah Mohd, Ghassan Kanaan, Qusay Bsoul

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Stemming is the process of reducing inflected words to their stem or root from a generally written word form. Arabic is one of the most highly inflected languages in the world. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words. An Arabic Information Retrieval (AIR) system can therefore use stemming algorithms to retrieve a greater number of documents related to the user’s query. The aim of this paper is to evaluate the impact of three different Arabic stemmers on AIR performance: the “Information Science Research Institute” (ISRI) stemmer, the morphological and syntax-based lemmatization “Educated Text Stemmer” (ETS), and the Light10 stemmer. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark dataset. Of the three stemmers evaluated, the Light10 stemmer achieved the best performance in terms of mean average precision.
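The abstract ranks the stemmers by mean average precision (MAP). As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the Python below averages per-query average precision over a set of judged queries. The function names, the query and document identifiers, and the toy data are all hypothetical, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant documents.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: relevance judgments and one system's ranked output.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}

map_score = mean_average_precision(runs, qrels)
print(f"MAP = {map_score:.4f}")
```

Under this sketch, comparing stemmers would mean indexing and querying the same collection once per stemmer and comparing the resulting MAP values over the same set of queries.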

Original language: English
Pages (from-to): 314-326
Number of pages: 13
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8870
Publication status: Published - 2014
Externally published: Yes

Fingerprint

Text Retrieval
Information retrieval
Information Retrieval
Information science
Linguistics
Retrieval
Roots
Query
Benchmark
Evaluate
Evaluation
Term
Language

Keywords

  • Arabic language
  • Educated Text Stemmer
  • Information retrieval
  • Stemming

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Impact of stemmer on Arabic text retrieval. / Atwan, Jaffar; Mohd, Masnizah; Kanaan, Ghassan; Bsoul, Qusay.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8870, 2014, p. 314-326.

@article{0cfb384aaadc484c8348517897db8eb7,
title = "Impact of stemmer on Arabic text retrieval",
abstract = "Stemming is the process of reducing inflected words to their stem or root from a generally written word form. Arabic is one of the most highly inflected languages in the world. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words. An Arabic Information Retrieval (AIR) system can therefore use stemming algorithms to retrieve a greater number of documents related to the user’s query. The aim of this paper is to evaluate the impact of three different Arabic stemmers on AIR performance: the “Information Science Research Institute” (ISRI) stemmer, the morphological and syntax-based lemmatization “Educated Text Stemmer” (ETS), and the Light10 stemmer. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark dataset. Of the three stemmers evaluated, the Light10 stemmer achieved the best performance in terms of mean average precision.",
keywords = "Arabic language, Educated Text Stemmer, Information retrieval, Stemming",
author = "Jaffar Atwan and Masnizah Mohd and Ghassan Kanaan and Qusay Bsoul",
year = "2014",
language = "English",
volume = "8870",
pages = "314--326",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Impact of stemmer on Arabic text retrieval

AU - Atwan, Jaffar

AU - Mohd, Masnizah

AU - Kanaan, Ghassan

AU - Bsoul, Qusay

PY - 2014

Y1 - 2014

N2 - Stemming is the process of reducing inflected words to their stem or root from a generally written word form. Arabic is one of the most highly inflected languages in the world. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words. An Arabic Information Retrieval (AIR) system can therefore use stemming algorithms to retrieve a greater number of documents related to the user’s query. The aim of this paper is to evaluate the impact of three different Arabic stemmers on AIR performance: the “Information Science Research Institute” (ISRI) stemmer, the morphological and syntax-based lemmatization “Educated Text Stemmer” (ETS), and the Light10 stemmer. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark dataset. Of the three stemmers evaluated, the Light10 stemmer achieved the best performance in terms of mean average precision.

KW - Arabic language

KW - Educated Text Stemmer

KW - Information retrieval

KW - Stemming

UR - http://www.scopus.com/inward/record.url?scp=84921367925&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921367925&partnerID=8YFLogxK

M3 - Article

VL - 8870

SP - 314

EP - 326

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -