Named entity recognition for political domain in Arabic language

Halema H. Mhamed Alshref, Mohd Juzaiddin Ab Aziz

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Named Entity Recognition (NER) began in late 1991 with a small number of general categories such as names of persons, organizations and locations. This work describes the development and implementation of an Arabic Named Entity Recognition system (ANER system), which identifies and classifies Named Entities (NEs) in text, such as persons, locations, organizations and temporal values. NER plays a significant role in many Natural Language Processing (NLP) applications, especially information extraction, information retrieval, machine translation, syntactic parsing/chunking and question answering. The task is considerably more challenging for Arabic, whose rich and complex morphology and other peculiarities make entities harder to recognize. The main aim of this research was to use a rule-based approach to design and implement an Arabic NER system for the political domain. The rule-based approach consisted of a lexicon, in the form of verb contextual clue lists and a noun contextual clue list, together with a set of grammar rules responsible for recognizing and classifying NEs. The system was evaluated on the human-annotated ANER corpus by comparing its output against the annotations. The ANER system achieved an overall accuracy of 94.86%, with 82.76% for Person NEs, 98.3% for Location, 100% for Organization and 98.37% for MISC.
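
The clue-list idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clue words, the labels and the function name `tag_entities` are hypothetical, chosen only to show how a noun or verb contextual clue might assign a type to the token that follows it. The abstract does not list the actual lexicon or grammar rules.

```python
from typing import List, Tuple

# Hypothetical clue lists for illustration; the paper's lexicon is not
# given in the abstract.
NOUN_CLUES = {
    "الرئيس": "PERSON",        # "the president"
    "الوزير": "PERSON",        # "the minister"
    "مدينة": "LOCATION",       # "city"
    "وزارة": "ORGANIZATION",   # "ministry"
}
VERB_CLUES = {
    "قال": "PERSON",   # "said"
    "صرح": "PERSON",   # "stated"
}


def is_clue(token: str) -> bool:
    """Return True if the token is itself a contextual clue word."""
    return token in NOUN_CLUES or token in VERB_CLUES


def tag_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label the token that follows a clue word with the clue's entity type.

    Simplified rule: a clue followed by another clue (e.g. "said the
    president") defers to the later clue, and only single-token entities
    are handled.
    """
    entities = []
    for i in range(len(tokens) - 1):
        label = NOUN_CLUES.get(tokens[i]) or VERB_CLUES.get(tokens[i])
        if label is not None and not is_clue(tokens[i + 1]):
            entities.append((tokens[i + 1], label))
    return entities


# "The president Mahmoud said in the city of Tripoli"
sentence = "قال الرئيس محمود في مدينة طرابلس"
print(tag_entities(sentence.split()))
```

A real rule-based system would also need to handle Arabic clitics and multi-token names, which whitespace tokenization alone does not cover.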

Original language: English
Pages (from-to): 13-21
Number of pages: 9
Journal: Asian Journal of Applied Sciences
Volume: 7
Issue number: 1
DOIs: 10.3923/ajaps.2014.13.21
Publication status: Published - 2014

Fingerprint

Syntactics
Information retrieval
Processing

Keywords

  • Arabic language
  • Arabic named entity recognition
  • Information extraction
  • Named entities
  • Natural language processing
  • Political domain

ASJC Scopus subject areas

  • General

Cite this

Named entity recognition for political domain in Arabic language. / Mhamed Alshref, Halema H.; Ab Aziz, Mohd Juzaiddin.

In: Asian Journal of Applied Sciences, Vol. 7, No. 1, 2014, p. 13-21.

@article{e27f3c5e18e64cbba97e30c920a2b713,
title = "Named entity recognition for political domain in Arabic language",
abstract = "Named Entity Recognition (NER) began in late 1991 with a small number of general categories such as names of persons, organizations and locations. This work describes the development and implementation of an Arabic Named Entity Recognition system (ANER system), which identifies and classifies Named Entities (NEs) in text, such as persons, locations, organizations and temporal values. NER plays a significant role in many Natural Language Processing (NLP) applications, especially information extraction, information retrieval, machine translation, syntactic parsing/chunking and question answering. The task is considerably more challenging for Arabic, whose rich and complex morphology and other peculiarities make entities harder to recognize. The main aim of this research was to use a rule-based approach to design and implement an Arabic NER system for the political domain. The rule-based approach consisted of a lexicon, in the form of verb contextual clue lists and a noun contextual clue list, together with a set of grammar rules responsible for recognizing and classifying NEs. The system was evaluated on the human-annotated ANER corpus by comparing its output against the annotations. The ANER system achieved an overall accuracy of 94.86{\%}, with 82.76{\%} for Person NEs, 98.3{\%} for Location, 100{\%} for Organization and 98.37{\%} for MISC.",
keywords = "Arabic language, Arabic named entity recognition, Information extraction, Named entities, Natural language processing, Political domain",
author = "{Mhamed Alshref}, {Halema H.} and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2014",
doi = "10.3923/ajaps.2014.13.21",
language = "English",
volume = "7",
pages = "13--21",
journal = "Asian Journal of Applied Sciences",
issn = "1996-3343",
publisher = "Science Alert",
number = "1",

}

TY - JOUR

T1 - Named entity recognition for political domain in Arabic language

AU - Mhamed Alshref, Halema H.

AU - Ab Aziz, Mohd Juzaiddin

PY - 2014

Y1 - 2014

AB - Named Entity Recognition (NER) began in late 1991 with a small number of general categories such as names of persons, organizations and locations. This work describes the development and implementation of an Arabic Named Entity Recognition system (ANER system), which identifies and classifies Named Entities (NEs) in text, such as persons, locations, organizations and temporal values. NER plays a significant role in many Natural Language Processing (NLP) applications, especially information extraction, information retrieval, machine translation, syntactic parsing/chunking and question answering. The task is considerably more challenging for Arabic, whose rich and complex morphology and other peculiarities make entities harder to recognize. The main aim of this research was to use a rule-based approach to design and implement an Arabic NER system for the political domain. The rule-based approach consisted of a lexicon, in the form of verb contextual clue lists and a noun contextual clue list, together with a set of grammar rules responsible for recognizing and classifying NEs. The system was evaluated on the human-annotated ANER corpus by comparing its output against the annotations. The ANER system achieved an overall accuracy of 94.86%, with 82.76% for Person NEs, 98.3% for Location, 100% for Organization and 98.37% for MISC.

KW - Arabic language

KW - Arabic named entity recognition

KW - Information extraction

KW - Named entities

KW - Natural language processing

KW - Political domain

UR - http://www.scopus.com/inward/record.url?scp=84900589050&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900589050&partnerID=8YFLogxK

U2 - 10.3923/ajaps.2014.13.21

DO - 10.3923/ajaps.2014.13.21

M3 - Article

AN - SCOPUS:84900589050

VL - 7

SP - 13

EP - 21

JO - Asian Journal of Applied Sciences

JF - Asian Journal of Applied Sciences

SN - 1996-3343

IS - 1

ER -