A machine learning approach to anaphora resolution in Arabic

Abdullatif Abolohom, Nazlia Omar

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Anaphora resolution is a widely studied area of Natural Language Processing (NLP). It is crucial for many NLP applications, including information extraction, question answering and text summarization. Most earlier work on anaphora resolution addresses English and other European languages. Arabic has not been sufficiently studied with respect to anaphora resolution and has rarely been subjected to machine learning experiments. In this paper we present a machine learning approach to resolving pronominal anaphora in Arabic. We determine the appropriate features for this task. A number of classifiers, namely naive Bayes, K-nearest neighbors and linear logistic regression, are employed as base classifiers for each of the feature sets. We conduct an in-depth study of different feature sets to identify effective features and investigate their effect on the performance of anaphora resolution. Finally, a wide range of comparative experiments is conducted on Quranic datasets. The experimental results on the Arabic Quran training corpus demonstrate that the proposed method is feasible for pronominal anaphora resolution in Arabic.
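The abstract frames pronominal anaphora resolution as supervised classification over feature sets. The sketch below illustrates that general idea with a small categorical naive Bayes classifier that labels (pronoun, candidate antecedent) pairs. It is only an illustration under stated assumptions: the feature names (gender_match, number_match, distance), the toy training pairs and the function names are hypothetical and are not taken from the paper, which does not list its features or data format here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each item describes one (pronoun, candidate) pair.
# Label 1 means the candidate is the pronoun's antecedent, 0 means it is not.
TRAIN = [
    ({"gender_match": True,  "number_match": True,  "distance": "near"}, 1),
    ({"gender_match": True,  "number_match": True,  "distance": "far"},  1),
    ({"gender_match": True,  "number_match": True,  "distance": "near"}, 1),
    ({"gender_match": True,  "number_match": False, "distance": "near"}, 0),
    ({"gender_match": False, "number_match": True,  "distance": "near"}, 0),
    ({"gender_match": False, "number_match": False, "distance": "far"},  0),
]


def train_naive_bayes(data):
    """Count labels and per-label feature values for a categorical naive Bayes."""
    label_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)   # (feature name, label) -> value counts
    feature_values = defaultdict(set)     # feature name -> set of observed values
    for features, label in data:
        for name, value in features.items():
            value_counts[(name, label)][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values


def predict(model, features):
    """Return the label with the highest log-posterior under the model."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            seen = value_counts[(name, label)][value]
            # Laplace (add-one) smoothing so unseen values do not zero the score.
            score += math.log((seen + 1) / (count + len(feature_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    pair = {"gender_match": True, "number_match": True, "distance": "near"}
    print(predict(model, pair))
```

The same pair representation could be passed to a K-nearest neighbors or logistic regression classifier to compare base classifiers across feature sets, which is the kind of comparison the abstract describes.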

Original language: English
Pages (from-to): 1956-1963
Number of pages: 8
Journal: International Review on Computers and Software
Volume: 9
Issue number: 12
Publication status: Published - 2014

Fingerprint

Learning systems
Classifiers
Logistics
Experiments
Processing

Keywords

  • Anaphora resolution
  • Natural language processing
  • Supervised machine learning

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

A machine learning approach to anaphora resolution in Arabic. / Abolohom, Abdullatif; Omar, Nazlia.

In: International Review on Computers and Software, Vol. 9, No. 12, 2014, p. 1956-1963.

@article{8d07b8af0bef40c48b2f1c8611bedc06,
title = "A machine learning approach to anaphora resolution in Arabic",
abstract = "Anaphora resolution is a widely studied area of Natural Language Processing (NLP). It is crucial for many NLP applications, including information extraction, question answering and text summarization. Most earlier work on anaphora resolution addresses English and other European languages. Arabic has not been sufficiently studied with respect to anaphora resolution and has rarely been subjected to machine learning experiments. In this paper we present a machine learning approach to resolving pronominal anaphora in Arabic. We determine the appropriate features for this task. A number of classifiers, namely naive Bayes, K-nearest neighbors and linear logistic regression, are employed as base classifiers for each of the feature sets. We conduct an in-depth study of different feature sets to identify effective features and investigate their effect on the performance of anaphora resolution. Finally, a wide range of comparative experiments is conducted on Quranic datasets. The experimental results on the Arabic Quran training corpus demonstrate that the proposed method is feasible for pronominal anaphora resolution in Arabic.",
keywords = "Anaphora resolution, Natural language processing, Supervised machine learning",
author = "Abdullatif Abolohom and Nazlia Omar",
year = "2014",
language = "English",
volume = "9",
pages = "1956--1963",
journal = "International Review on Computers and Software",
issn = "1828-6003",
publisher = "Praise Worthy Prize",
number = "12",

}

TY - JOUR

T1 - A machine learning approach to anaphora resolution in Arabic

AU - Abolohom, Abdullatif

AU - Omar, Nazlia

PY - 2014

Y1 - 2014

AB - Anaphora resolution is a widely studied area of Natural Language Processing (NLP). It is crucial for many NLP applications, including information extraction, question answering and text summarization. Most earlier work on anaphora resolution addresses English and other European languages. Arabic has not been sufficiently studied with respect to anaphora resolution and has rarely been subjected to machine learning experiments. In this paper we present a machine learning approach to resolving pronominal anaphora in Arabic. We determine the appropriate features for this task. A number of classifiers, namely naive Bayes, K-nearest neighbors and linear logistic regression, are employed as base classifiers for each of the feature sets. We conduct an in-depth study of different feature sets to identify effective features and investigate their effect on the performance of anaphora resolution. Finally, a wide range of comparative experiments is conducted on Quranic datasets. The experimental results on the Arabic Quran training corpus demonstrate that the proposed method is feasible for pronominal anaphora resolution in Arabic.

KW - Anaphora resolution

KW - Natural language processing

KW - Supervised machine learning

UR - http://www.scopus.com/inward/record.url?scp=84924755073&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924755073&partnerID=8YFLogxK

M3 - Article

VL - 9

SP - 1956

EP - 1963

JO - International Review on Computers and Software

JF - International Review on Computers and Software

SN - 1828-6003

IS - 12

ER -