Named Entity Enrichment Based on Subject-Object Anaphora Resolution

Mary Ting, Abdul Kadir Rabiah, Azreen Azman, Tengku Mohd Tengku Sembok, Fatimah Ahmad

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Named Entity Recognition (NER) is an early processing stage of Information Extraction that identifies and classifies entities in text. Its output forms the foundation for building complex Information Extraction applications. With the enormous amount of information now available, the area has attracted considerable attention from the research community. Currently, there are two main approaches to Named Entity Recognition: the rule-based approach and the machine learning approach. To improve classification accuracy and recognizer performance, some researchers have implemented a hybrid approach that combines both. Although much work has been done on Named Entity Recognition, there is still room for improvement. This paper proposes to increase the accuracy of detected entities by applying anaphora resolution during the preprocessing phase and a hybrid approach to classify the detected tokens during the classification phase. The hybrid approach combines a Conditional Random Field (CRF) classifier with a gazetteer and pattern rules to perform classification. The results show that applying anaphora resolution and the gazetteer increased the accuracy of detected entities for the person class by 46%.
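The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the CRF classifier is omitted and replaced here by a plain gazetteer lookup plus a single honorific pattern rule, and the anaphora step simply links each pronoun to the most recent person found in an earlier sentence. The names `PERSON_GAZETTEER`, `TITLE_RULE`, `PRONOUN`, `find_persons` and `enrich`, as well as the sample sentences, are assumptions made for illustration.

```python
import re

# Hypothetical toy gazetteer of known person names (illustration only).
PERSON_GAZETTEER = {"Mary Ting", "Azreen Azman"}

# Hypothetical pattern rule: an honorific followed by one or two capitalized words.
TITLE_RULE = re.compile(r"\b(?:Dr|Prof|Mr|Mrs|Ms)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)?)")

# Subject and object pronouns handled by this sketch.
PRONOUN = re.compile(r"\b(?:he|she|him)\b", re.IGNORECASE)


def find_persons(sentence):
    """Return person names found by the gazetteer or the pattern rule, in text order."""
    hits = []
    for name in PERSON_GAZETTEER:
        for match in re.finditer(re.escape(name), sentence):
            hits.append((match.start(), name))
    for match in TITLE_RULE.finditer(sentence):
        hits.append((match.start(1), match.group(1)))
    # A set removes a name matched by both sources at the same position.
    return [name for _, name in sorted(set(hits))]


def enrich(sentences):
    """Naive anaphora step: credit each pronoun to the last person seen in an earlier sentence."""
    last_person = None
    mentions = []  # (sentence index, person name)
    for index, sentence in enumerate(sentences):
        if last_person is not None:
            for _ in PRONOUN.finditer(sentence):
                mentions.append((index, last_person))
        direct = find_persons(sentence)
        mentions.extend((index, name) for name in direct)
        if direct:
            last_person = direct[-1]
    return mentions


sentences = [
    "Mary Ting wrote the paper.",
    "She presented it in London.",
    "Dr. Sembok chaired the session.",
    "The audience thanked him.",
]

# Without anaphora resolution only the explicit names are detected.
baseline = [name for s in sentences for name in find_persons(s)]
# With it, the pronoun mentions are added to the person entities.
enriched = enrich(sentences)
print(len(baseline), len(enriched))
```

In this toy input the pronoun mentions double the number of person mentions that are found, which is the kind of enrichment the abstract describes. The sketch ignores gender agreement and pronouns that refer to a name in the same sentence.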

Original language: English
Title of host publication: Intelligent Computing - Proceedings of the 2019 Computing Conference
Editors: Rahul Bhatia, Supriya Kapoor, Kohei Arai
Publisher: Springer Verlag
Pages: 873-884
Number of pages: 12
ISBN (Print): 9783030228675
DOIs: https://doi.org/10.1007/978-3-030-22868-2_60
Publication status: Published - 1 Jan 2019
Event: Computing Conference, 2019 - London, United Kingdom
Duration: 16 Jul 2019 to 17 Jul 2019

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 998
ISSN (Print): 2194-5357

Conference

Conference: Computing Conference, 2019
Country: United Kingdom
City: London
Period: 16/7/19 to 17/7/19

Fingerprint

Learning systems
Classifiers
Processing

Keywords

  • Anaphora resolution
  • CRF classifier
  • Gazetteer
  • Named Entity Recognition
  • Rule-based extraction

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Ting, M., Rabiah, A. K., Azman, A., Sembok, T. M. T., & Ahmad, F. (2019). Named Entity Enrichment Based on Subject-Object Anaphora Resolution. In R. Bhatia, S. Kapoor, & K. Arai (Eds.), Intelligent Computing - Proceedings of the 2019 Computing Conference (pp. 873-884). (Advances in Intelligent Systems and Computing; Vol. 998). Springer Verlag. https://doi.org/10.1007/978-3-030-22868-2_60

@inproceedings{ecc49b5015054a2195de6526ae8dbc06,
title = "Named Entity Enrichment Based on Subject-Object Anaphora Resolution",
abstract = "Named Entity Recognition (NER) is an early processing stage of Information Extraction that identifies and classifies entities in text. Its output forms the foundation for building complex Information Extraction applications. With the enormous amount of information now available, the area has attracted considerable attention from the research community. Currently, there are two main approaches to Named Entity Recognition: the rule-based approach and the machine learning approach. To improve classification accuracy and recognizer performance, some researchers have implemented a hybrid approach that combines both. Although much work has been done on Named Entity Recognition, there is still room for improvement. This paper proposes to increase the accuracy of detected entities by applying anaphora resolution during the preprocessing phase and a hybrid approach to classify the detected tokens during the classification phase. The hybrid approach combines a Conditional Random Field (CRF) classifier with a gazetteer and pattern rules to perform classification. The results show that applying anaphora resolution and the gazetteer increased the accuracy of detected entities for the person class by 46{\%}.",
keywords = "Anaphora resolution, CRF classifier, Gazetteer, Named Entity Recognition, Rule-based extraction",
author = "Mary Ting and Rabiah, {Abdul Kadir} and Azreen Azman and Sembok, {Tengku Mohd Tengku} and Fatimah Ahmad",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-22868-2_60",
language = "English",
isbn = "9783030228675",
series = "Advances in Intelligent Systems and Computing",
volume = "998",
publisher = "Springer Verlag",
pages = "873--884",
editor = "Rahul Bhatia and Supriya Kapoor and Kohei Arai",
booktitle = "Intelligent Computing - Proceedings of the 2019 Computing Conference",
}

TY  - GEN
T1  - Named Entity Enrichment Based on Subject-Object Anaphora Resolution
AU  - Ting, Mary
AU  - Rabiah, Abdul Kadir
AU  - Azman, Azreen
AU  - Sembok, Tengku Mohd Tengku
AU  - Ahmad, Fatimah
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - Named Entity Recognition (NER) is an early processing stage of Information Extraction that identifies and classifies entities in text. Its output forms the foundation for building complex Information Extraction applications. With the enormous amount of information now available, the area has attracted considerable attention from the research community. Currently, there are two main approaches to Named Entity Recognition: the rule-based approach and the machine learning approach. To improve classification accuracy and recognizer performance, some researchers have implemented a hybrid approach that combines both. Although much work has been done on Named Entity Recognition, there is still room for improvement. This paper proposes to increase the accuracy of detected entities by applying anaphora resolution during the preprocessing phase and a hybrid approach to classify the detected tokens during the classification phase. The hybrid approach combines a Conditional Random Field (CRF) classifier with a gazetteer and pattern rules to perform classification. The results show that applying anaphora resolution and the gazetteer increased the accuracy of detected entities for the person class by 46%.
KW  - Anaphora resolution
KW  - CRF classifier
KW  - Gazetteer
KW  - Named Entity Recognition
KW  - Rule-based extraction
UR  - http://www.scopus.com/inward/record.url?scp=85069489854&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85069489854&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-22868-2_60
DO  - 10.1007/978-3-030-22868-2_60
M3  - Conference contribution
SN  - 9783030228675
T3  - Advances in Intelligent Systems and Computing
SP  - 873
EP  - 884
BT  - Intelligent Computing - Proceedings of the 2019 Computing Conference
A2  - Bhatia, Rahul
A2  - Kapoor, Supriya
A2  - Arai, Kohei
PB  - Springer Verlag
ER  -