Arabic named entity recognition for crime documents using classifiers combination

Suhad Abdulazahra Hachim Al-Shoukry, Nazlia Omar

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Named Entity Recognition (NER) is the process of identifying named entities such as person, organization and location names, dates and currencies. It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). There have been various research efforts on Arabic NER. However, identifying Arabic named entities is challenging because of the complexity of the Arabic language. For example, Arabic has no capitalization, a cue that facilitates NER in languages such as English. Furthermore, there is a lack of lexical corpora covering all Arabic NEs. In addition, most of the approaches proposed for Arabic NER have been based on handcrafted rules, which can be laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. Moreover, feature extraction covering POS tagging, keyword triggers, definite articles and affixes was performed in order to improve recognition. Three classifiers, Naïve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as the training set to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19%. This indicates that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.
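The normalization step and several of the features named in the abstract (definite articles, affixes, keyword triggers) can be illustrated with a minimal token-level sketch. This is not the authors' implementation: the normalization rules, the trigger lexicon `TRIGGERS`, and the helper names `normalize`, `tokenize` and `extract_features` are hypothetical illustrations. POS tagging and stemming are omitted because they would require an external Arabic morphological analyzer.

```python
import re

# Hypothetical sketch: lexicons and rules are illustrative assumptions,
# not the resources used in the paper.

# Arabic diacritics (fathatan .. sukun) plus tatweel, removed during normalization.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Hypothetical keyword triggers: a word that often precedes an entity of this type.
TRIGGERS = {
    "السيد": "PERSON",        # "Mr."
    "مدينة": "LOCATION",      # "city"
    "شركة": "ORGANIZATION",   # "company"
}

# Definite article "al-", optionally preceded by a one-letter proclitic (wa, fa, bi, ka).
DEFINITE = re.compile("^(?:[وفبك])?ال")


def normalize(text: str) -> str:
    """Strip diacritics/tatweel and unify alef and alef-maqsura variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)  # hamzated/madda alef -> bare alef
    return text.replace("ى", "ي")      # alef maqsura -> ya


def tokenize(text: str) -> list[str]:
    """Split on anything that is not a word character (a simplification)."""
    return re.findall(r"\w+", text)


def extract_features(sentence: str) -> list[dict]:
    """Return one feature dict per token, usable as classifier input."""
    tokens = tokenize(normalize(sentence))
    features = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        features.append({
            "token": tok,
            # "al-" with an optional proclitic, or the "li-" + "al-" contraction "lil-".
            "has_definite_article": bool(DEFINITE.match(tok)) or tok.startswith("لل"),
            "prefix2": tok[:2],   # crude affix features: first/last two letters
            "suffix2": tok[-2:],
            "prev_trigger": TRIGGERS.get(prev, "NONE"),
        })
    return features
```

For example, in "قال السيد أحمد في مدينة بغداد" the token "احمد" receives `prev_trigger == "PERSON"` and "بغداد" receives `prev_trigger == "LOCATION"`. Under this sketch's assumptions, such dictionaries would then be encoded (for instance one-hot) and paired with per-token entity labels to train classifiers such as NB or DT.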

Original language: English
Pages (from-to): 628-634
Number of pages: 7
Journal: International Review on Computers and Software
Volume: 10
Issue number: 6
Publication status: Published - 1 Jun 2015

Keywords

  • Arabic
  • Decision tree
  • Named entity recognition
  • Naïve Bayes

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Arabic named entity recognition for crime documents using classifiers combination. / Al-Shoukry, Suhad Abdulazahra Hachim; Omar, Nazlia.

In: International Review on Computers and Software, Vol. 10, No. 6, 01.06.2015, p. 628-634.

@article{ae57c626950245ea95f1ce31ce11c377,
title = "Arabic named entity recognition for crime documents using classifiers combination",
abstract = "Named Entity Recognition (NER) is the process of identifying named entities such as person, organization and location names, dates and currencies. It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). There have been various research efforts on Arabic NER. However, identifying Arabic named entities is challenging because of the complexity of the Arabic language. For example, Arabic has no capitalization, a cue that facilitates NER in languages such as English. Furthermore, there is a lack of lexical corpora covering all Arabic NEs. In addition, most of the approaches proposed for Arabic NER have been based on handcrafted rules, which can be laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. Moreover, feature extraction covering POS tagging, keyword triggers, definite articles and affixes was performed in order to improve recognition. Three classifiers, Na{\"i}ve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as the training set to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19{\%}. This indicates that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.",
keywords = "Arabic, Decision tree, Named entity recognition, Na{\"i}ve Bayes",
author = "Al-Shoukry, {Suhad Abdulazahra Hachim} and Nazlia Omar",
year = "2015",
month = "6",
day = "1",
language = "English",
volume = "10",
pages = "628--634",
journal = "International Review on Computers and Software",
issn = "1828-6003",
publisher = "Praise Worthy Prize",
number = "6",

}

TY - JOUR

T1 - Arabic named entity recognition for crime documents using classifiers combination

AU - Al-Shoukry, Suhad Abdulazahra Hachim

AU - Omar, Nazlia

PY - 2015/6/1

Y1 - 2015/6/1

N2 - Named Entity Recognition (NER) is the process of identifying named entities such as person, organization and location names, dates and currencies. It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). There have been various research efforts on Arabic NER. However, identifying Arabic named entities is challenging because of the complexity of the Arabic language. For example, Arabic has no capitalization, a cue that facilitates NER in languages such as English. Furthermore, there is a lack of lexical corpora covering all Arabic NEs. In addition, most of the approaches proposed for Arabic NER have been based on handcrafted rules, which can be laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. Moreover, feature extraction covering POS tagging, keyword triggers, definite articles and affixes was performed in order to improve recognition. Three classifiers, Naïve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as the training set to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19%. This indicates that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.

AB - Named Entity Recognition (NER) is the process of identifying named entities such as person, organization and location names, dates and currencies. It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). There have been various research efforts on Arabic NER. However, identifying Arabic named entities is challenging because of the complexity of the Arabic language. For example, Arabic has no capitalization, a cue that facilitates NER in languages such as English. Furthermore, there is a lack of lexical corpora covering all Arabic NEs. In addition, most of the approaches proposed for Arabic NER have been based on handcrafted rules, which can be laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. Moreover, feature extraction covering POS tagging, keyword triggers, definite articles and affixes was performed in order to improve recognition. Three classifiers, Naïve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as the training set to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19%. This indicates that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.

KW - Arabic

KW - Decision tree

KW - Named entity recognition

KW - Naïve Bayes

UR - http://www.scopus.com/inward/record.url?scp=84938532207&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938532207&partnerID=8YFLogxK

M3 - Article

VL - 10

SP - 628

EP - 634

JO - International Review on Computers and Software

JF - International Review on Computers and Software

SN - 1828-6003

IS - 6

ER -