Arabic named entity recognition in crime documents

M. Asharef, Nazlia Omar, M. Albared

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Named entity recognition (NER) systems aim to automatically identify and classify proper nouns in text. They play a significant role in many areas of Natural Language Processing (NLP), such as question answering, text summarization and information retrieval. Unlike previous Arabic NER systems, which were built to extract named entities from general Arabic text, our task involves extracting named entities from crime documents, which provides basic information for crime analysis. This paper presents a rule-based approach to Arabic NER for the crime domain. Based on morphological information, predefined crime and general indicator lists, and an annotated Arabic named entity corpus from the crime domain, several syntactic rules and patterns for Arabic NER are induced and then formalized. These rules and patterns are then applied to identify and classify named entities in Arabic crime text. The results show that the system achieves an accuracy of 90%, which indicates that the method is effective and that the system's performance is satisfactory.
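
The indicator-list idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the indicator words, the stop-token set, the two-token span limit and the function name tag_entities are all assumptions chosen for illustration, and the sketch omits the morphological analysis and the corpus-induced rules the paper relies on. It only shows the general pattern of "indicator word followed by a name" being mapped to an entity type.

```python
# Hypothetical indicator lists; the paper's actual lists are not reproduced here.
INDICATORS = {
    "المتهم": "PERSON",        # "the accused" (assumed crime indicator)
    "السيد": "PERSON",         # "Mr." (assumed general indicator)
    "مدينة": "LOCATION",       # "city"
    "شركة": "ORGANIZATION",    # "company"
}

# Assumed set of function words that end an entity span.
STOP_TOKENS = {"في", "من", "إلى", "على"}  # in, from, to, on

# Assumed upper bound on the number of tokens in one entity.
MAX_ENTITY_TOKENS = 2


def tag_entities(text):
    """Return (entity, label) pairs found by the rule: indicator + following tokens."""
    tokens = text.split()  # whitespace tokenization only; no morphological analysis
    entities = []
    i = 0
    while i < len(tokens):
        label = INDICATORS.get(tokens[i])
        if label is None:
            i += 1
            continue
        # Collect the tokens after the indicator until a stop token,
        # another indicator, or the span limit is reached.
        span = []
        j = i + 1
        while j < len(tokens) and len(span) < MAX_ENTITY_TOKENS:
            if tokens[j] in STOP_TOKENS or tokens[j] in INDICATORS:
                break
            span.append(tokens[j])
            j += 1
        if span:
            entities.append((" ".join(span), label))
        i = j
    return entities
```

For the sentence "اعتقلت الشرطة المتهم أحمد علي في مدينة بغداد" ("the police arrested the accused Ahmed Ali in the city of Baghdad"), this sketch returns [("أحمد علي", "PERSON"), ("بغداد", "LOCATION")]. A fixed span limit is a crude stand-in for real boundary rules: a one-word name followed by a verb would be over-extended, which is the kind of case that morphological information is meant to handle.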

Original language: English
Pages (from-to): 1-6
Number of pages: 6
Journal: Journal of Theoretical and Applied Information Technology
Volume: 44
Issue number: 1
Publication status: Published - 2012

Fingerprint

Named Entity Recognition
Crime
Classify
Question Answering System
Summarization
Information Retrieval
Natural Language
Annotation
Text
Processing

Keywords

  • Arabic Crime Documents
  • Named Entity Recognition
  • Natural Language Processing

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Arabic named entity recognition in crime documents. / Asharef, M.; Omar, Nazlia; Albared, M.

In: Journal of Theoretical and Applied Information Technology, Vol. 44, No. 1, 2012, p. 1-6.

@article{58a3e970670e46009687d1924ea88a2d,
title = "Arabic named entity recognition in crime documents",
abstract = "Named entity recognition (NER) systems aim to automatically identify and classify proper nouns in text. They play a significant role in many areas of Natural Language Processing (NLP), such as question answering, text summarization and information retrieval. Unlike previous Arabic NER systems, which were built to extract named entities from general Arabic text, our task involves extracting named entities from crime documents, which provides basic information for crime analysis. This paper presents a rule-based approach to Arabic NER for the crime domain. Based on morphological information, predefined crime and general indicator lists, and an annotated Arabic named entity corpus from the crime domain, several syntactic rules and patterns for Arabic NER are induced and then formalized. These rules and patterns are then applied to identify and classify named entities in Arabic crime text. The results show that the system achieves an accuracy of 90{\%}, which indicates that the method is effective and that the system's performance is satisfactory.",
keywords = "Arabic Crime Documents, Named Entity Recognition, Natural Language Processing",
author = "M. Asharef and Nazlia Omar and M. Albared",
year = "2012",
language = "English",
volume = "44",
pages = "1--6",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "1",

}

TY - JOUR

T1 - Arabic named entity recognition in crime documents

AU - Asharef, M.

AU - Omar, Nazlia

AU - Albared, M.

PY - 2012

Y1 - 2012

AB - Named entity recognition (NER) systems aim to automatically identify and classify proper nouns in text. They play a significant role in many areas of Natural Language Processing (NLP), such as question answering, text summarization and information retrieval. Unlike previous Arabic NER systems, which were built to extract named entities from general Arabic text, our task involves extracting named entities from crime documents, which provides basic information for crime analysis. This paper presents a rule-based approach to Arabic NER for the crime domain. Based on morphological information, predefined crime and general indicator lists, and an annotated Arabic named entity corpus from the crime domain, several syntactic rules and patterns for Arabic NER are induced and then formalized. These rules and patterns are then applied to identify and classify named entities in Arabic crime text. The results show that the system achieves an accuracy of 90%, which indicates that the method is effective and that the system's performance is satisfactory.

KW - Arabic Crime Documents

KW - Named Entity Recognition

KW - Natural Language Processing

UR - http://www.scopus.com/inward/record.url?scp=84867798364&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867798364&partnerID=8YFLogxK

M3 - Article

VL - 44

SP - 1

EP - 6

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 1

ER -