Rule-based named entity recognition for drug-related crime news documents

Khmael Rakm Rahem, Nazlia Omar

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Drug abuse pertains to the consumption of a substance that may induce adverse effects on a person. In international security studies, drug trafficking has become an important topic. In this regard, drug-related crimes are identified as an extremely significant challenge faced by any community. Several techniques for investigations in the crime domain have been implemented by many researchers. However, most of these researchers focus on extracting general crime entities. The number of studies that focus on the drug crime domain is relatively limited. This paper proposes a rule-based named entity recognition model for drug-related crime news documents. In this work, a set of heuristic and grammatical rules is used to extract named entities, such as types of drugs, amount of drugs, price of drugs, drug hiding methods, and the nationality of the suspect. The rules are established based on part-of-speech information, developed gazetteers, and indicator word lists. The combined approach of heuristic and grammatical rules achieves good performance, with an overall precision of 86%, a recall of 87%, and an F1-measure of 87%. Results indicate that the ensemble of both heuristic and grammatical rules improves the extraction effectiveness in terms of macro-F1 for all entities.
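The kind of rule the abstract describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the gazetteer entries, the indicator words ("worth", "hidden in"), the entity labels, and the function name `extract_entities` are hypothetical and are not taken from the paper. Part-of-speech information, which the paper's rules also use, is left out so that the example stays self-contained.

```python
import re

# Hypothetical gazetteers; the paper's actual lists are not reproduced here.
DRUG_GAZETTEER = {"heroin", "cocaine", "cannabis", "methamphetamine"}
NATIONALITY_GAZETTEER = {"nigerian", "malaysian", "iranian", "thai"}

# Heuristic patterns built around assumed indicator words and units.
AMOUNT_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:kg|kilograms?|grams?|g)\b", re.IGNORECASE)
PRICE_RE = re.compile(r"\b(?:worth|valued at)\s+((?:RM|USD)\s?\d[\d,]*)", re.IGNORECASE)
HIDING_RE = re.compile(
    r"\b(?:hidden|concealed)\s+(?:in|inside|under)\s+(?:(?:an?|the)\s+)?(\w+)",
    re.IGNORECASE,
)


def extract_entities(text):
    """Return a dict mapping each entity label to the strings found in text."""
    entities = {
        "DRUG_TYPE": [],
        "NATIONALITY": [],
        "DRUG_AMOUNT": [],
        "DRUG_PRICE": [],
        "HIDING_METHOD": [],
    }

    # Gazetteer lookup: compare each alphabetic token against the word lists.
    for token in re.findall(r"[A-Za-z]+", text):
        lowered = token.lower()
        if lowered in DRUG_GAZETTEER:
            entities["DRUG_TYPE"].append(token)
        elif lowered in NATIONALITY_GAZETTEER:
            entities["NATIONALITY"].append(token)

    # Indicator-word rules: a number followed by a unit is an amount, a
    # currency figure after "worth" is a price, and the noun after
    # "hidden in" is a hiding method.
    entities["DRUG_AMOUNT"] = [m.group(0) for m in AMOUNT_RE.finditer(text)]
    entities["DRUG_PRICE"] = [m.group(1) for m in PRICE_RE.finditer(text)]
    entities["HIDING_METHOD"] = [m.group(1) for m in HIDING_RE.finditer(text)]
    return entities


if __name__ == "__main__":
    sample = (
        "Police arrested a Nigerian man with 2.5 kg of heroin "
        "worth RM 500,000 hidden in a suitcase."
    )
    for label, values in extract_entities(sample).items():
        print(label, values)
```

A fuller system along the lines the abstract describes would combine rules like these with part-of-speech patterns and would be scored per entity type before averaging into a macro-F1.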

Original language: English
Pages (from-to): 229-235
Number of pages: 7
Journal: Journal of Theoretical and Applied Information Technology
Volume: 77
Issue number: 2
Publication status: Published - 20 Jul 2015

Keywords

  • Grammatical rules
  • Heuristic rules
  • Named entity recognition
  • Rule-based approach

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Rule-based named entity recognition for drug-related crime news documents. / Rahem, Khmael Rakm; Omar, Nazlia.

In: Journal of Theoretical and Applied Information Technology, Vol. 77, No. 2, 20.07.2015, p. 229-235.

@article{e2245c2df82b4f87a8c4e3cf976c9472,
title = "Rule-based named entity recognition for drug-related crime news documents",
abstract = "Drug abuse pertains to the consumption of a substance that may induce adverse effects on a person. In international security studies, drug trafficking has become an important topic. In this regard, drug-related crimes are identified as an extremely significant challenge faced by any community. Several techniques for investigations in the crime domain have been implemented by many researchers. However, most of these researchers focus on extracting general crime entities. The number of studies that focus on the drug crime domain is relatively limited. This paper proposes a rule-based named entity recognition model for drug-related crime news documents. In this work, a set of heuristic and grammatical rules is used to extract named entities, such as types of drugs, amount of drugs, price of drugs, drug hiding methods, and the nationality of the suspect. The rules are established based on part-of-speech information, developed gazetteers, and indicator word lists. The combined approach of heuristic and grammatical rules achieves good performance, with an overall precision of 86{\%}, a recall of 87{\%}, and an F1-measure of 87{\%}. Results indicate that the ensemble of both heuristic and grammatical rules improves the extraction effectiveness in terms of macro-F1 for all entities.",
keywords = "Grammatical rules, Heuristic rules, Named entity recognition, Rule-based approach",
author = "Rahem, {Khmael Rakm} and Nazlia Omar",
year = "2015",
month = "7",
day = "20",
language = "English",
volume = "77",
pages = "229--235",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "2",

}

TY - JOUR

T1 - Rule-based named entity recognition for drug-related crime news documents

AU - Rahem, Khmael Rakm

AU - Omar, Nazlia

PY - 2015/7/20

Y1 - 2015/7/20

AB - Drug abuse pertains to the consumption of a substance that may induce adverse effects on a person. In international security studies, drug trafficking has become an important topic. In this regard, drug-related crimes are identified as an extremely significant challenge faced by any community. Several techniques for investigations in the crime domain have been implemented by many researchers. However, most of these researchers focus on extracting general crime entities. The number of studies that focus on the drug crime domain is relatively limited. This paper proposes a rule-based named entity recognition model for drug-related crime news documents. In this work, a set of heuristic and grammatical rules is used to extract named entities, such as types of drugs, amount of drugs, price of drugs, drug hiding methods, and the nationality of the suspect. The rules are established based on part-of-speech information, developed gazetteers, and indicator word lists. The combined approach of heuristic and grammatical rules achieves good performance, with an overall precision of 86%, a recall of 87%, and an F1-measure of 87%. Results indicate that the ensemble of both heuristic and grammatical rules improves the extraction effectiveness in terms of macro-F1 for all entities.

KW - Grammatical rules

KW - Heuristic rules

KW - Named entity recognition

KW - Rule-based approach

UR - http://www.scopus.com/inward/record.url?scp=84937579414&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937579414&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84937579414

VL - 77

SP - 229

EP - 235

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 2

ER -