Malay name entity recognition using limited resources

Noorhuzaimi Mohd Noor, Junaida Sulaiman, Shahrul Azman Mohd Noah

Research output: Contribution to journal, Article

1 Citation (Scopus)

Abstract

Named entity recognition (NER) is a research area within the field of information retrieval (IR). NER is the task of recognizing information in unstructured text and classifying it into categories that can be understood. This paper presents an NER approach for Malay text that uses limited resources, known as Malay Name Entity Recognition and Classification (MNERC). Because annotated Malay corpora are scarce and existing resources are not shared, the approach relies on a limited gazetteer containing parts of names for people, organizations, locations, and positions. Complex names are handled with a rule-based method. A complex name contains a symbol or the word dan (English: and), so extra rules are needed to determine whether it is part of an entity name or an ordinary word. Complex names are processed during text preprocessing using both the rules and a simple gazetteer (a simple dictionary consisting of parts of names). The results are compared with previous research regardless of the data used and its quantity. The proposed solution achieves an F-measure of 97%, an improvement in NER performance over previous research despite the limited resources.
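The rule for complex names described in the abstract can be illustrated with a minimal sketch. The paper's actual rules and gazetteer are not given here, so everything below is an assumption made for illustration: the gazetteer entries, the labels, the function names, and the specific rule (a connector such as dan or & is kept inside a name only when the name starts with an organization keyword and the next token is capitalized).

```python
# Hypothetical mini-gazetteer of name parts. The entries and labels are
# illustrative assumptions, not the resource used in the paper.
GAZETTEER = {
    "kementerian": "ORGANIZATION",
    "jabatan": "ORGANIZATION",
    "universiti": "ORGANIZATION",
    "encik": "PERSON",
    "puan": "PERSON",
}

# Connectors that may either join two names or belong to one complex name.
CONNECTORS = {"dan", "&"}


def is_name_token(token):
    """Treat a capitalized token as a candidate part of a name."""
    return token[:1].isupper()


def chunk_entities(tokens):
    """Group capitalized runs into entities labeled by their first token.

    Assumed rule: a connector is absorbed into the current run only if the
    run was labeled ORGANIZATION and the following token is capitalized.
    Otherwise the connector is an ordinary word and closes the run.
    Runs whose first token is not in the gazetteer are discarded.
    """
    entities = []
    current, label = [], None
    for i, tok in enumerate(tokens):
        next_is_name = i + 1 < len(tokens) and is_name_token(tokens[i + 1])
        if is_name_token(tok):
            if not current:
                label = GAZETTEER.get(tok.lower())
            current.append(tok)
        elif (tok.lower() in CONNECTORS and current
              and label == "ORGANIZATION" and next_is_name):
            current.append(tok)  # connector is part of the complex name
        else:
            if current and label:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current and label:
        entities.append((" ".join(current), label))
    return entities


if __name__ == "__main__":
    print(chunk_entities("Beliau melawat Kementerian Sains dan Teknologi semalam".split()))
    print(chunk_entities("Encik Ali dan Puan Siti hadir".split()))
```

In the first sentence dan stays inside the organization name, while in the second it separates two person names. A real system would need richer rules and a larger gazetteer than this sketch assumes.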

Original language: English
Pages (from-to): 2968-2971
Number of pages: 4
Journal: Advanced Science Letters
Volume: 22
Issue number: 10
DOI: 10.1166/asl.2016.7124
Publication status: Published - 1 Oct 2016

Fingerprint

Named Entity Recognition
Names
Resources
Glossaries
Information retrieval
Preprocessing
Enhancement
Classify
Information Storage and Retrieval
Research
dictionary
symbol
Text
organization

Keywords

  • Information retrieval
  • Name entity recognition
  • Semantic class

ASJC Scopus subject areas

  • Health (social science)
  • Computer Science (all)
  • Education
  • Mathematics (all)
  • Environmental Science (all)
  • Engineering (all)
  • Energy (all)

Cite this

Malay name entity recognition using limited resources. / Noor, Noorhuzaimi Mohd; Sulaiman, Junaida; Mohd Noah, Shahrul Azman.

In: Advanced Science Letters, Vol. 22, No. 10, 01.10.2016, p. 2968-2971.

Noor, Noorhuzaimi Mohd ; Sulaiman, Junaida ; Mohd Noah, Shahrul Azman. / Malay name entity recognition using limited resources. In: Advanced Science Letters. 2016 ; Vol. 22, No. 10. pp. 2968-2971.
@article{0065ac829c5b4b0ab6eb079fccc06d67,
title = "Malay name entity recognition using limited resources",
abstract = "Named entity recognition (NER) is a research area within the field of information retrieval (IR). NER is the task of recognizing information in unstructured text and classifying it into categories that can be understood. This paper presents an NER approach for Malay text that uses limited resources, known as Malay Name Entity Recognition and Classification (MNERC). Because annotated Malay corpora are scarce and existing resources are not shared, the approach relies on a limited gazetteer containing parts of names for people, organizations, locations, and positions. Complex names are handled with a rule-based method. A complex name contains a symbol or the word dan (English: and), so extra rules are needed to determine whether it is part of an entity name or an ordinary word. Complex names are processed during text preprocessing using both the rules and a simple gazetteer (a simple dictionary consisting of parts of names). The results are compared with previous research regardless of the data used and its quantity. The proposed solution achieves an F-measure of 97{\%}, an improvement in NER performance over previous research despite the limited resources.",
keywords = "Information retrieval, Name entity recognition, Semantic class",
author = "Noor, {Noorhuzaimi Mohd} and Junaida Sulaiman and {Mohd Noah}, {Shahrul Azman}",
year = "2016",
month = "10",
day = "1",
doi = "10.1166/asl.2016.7124",
language = "English",
volume = "22",
pages = "2968--2971",
journal = "Advanced Science Letters",
issn = "1936-6612",
publisher = "American Scientific Publishers",
number = "10",

}

TY - JOUR

T1 - Malay name entity recognition using limited resources

AU - Noor, Noorhuzaimi Mohd

AU - Sulaiman, Junaida

AU - Mohd Noah, Shahrul Azman

PY - 2016/10/1

Y1 - 2016/10/1

N2 - Named entity recognition (NER) is a research area within the field of information retrieval (IR). NER is the task of recognizing information in unstructured text and classifying it into categories that can be understood. This paper presents an NER approach for Malay text that uses limited resources, known as Malay Name Entity Recognition and Classification (MNERC). Because annotated Malay corpora are scarce and existing resources are not shared, the approach relies on a limited gazetteer containing parts of names for people, organizations, locations, and positions. Complex names are handled with a rule-based method. A complex name contains a symbol or the word dan (English: and), so extra rules are needed to determine whether it is part of an entity name or an ordinary word. Complex names are processed during text preprocessing using both the rules and a simple gazetteer (a simple dictionary consisting of parts of names). The results are compared with previous research regardless of the data used and its quantity. The proposed solution achieves an F-measure of 97%, an improvement in NER performance over previous research despite the limited resources.

KW - Information retrieval

KW - Name entity recognition

KW - Semantic class

UR - http://www.scopus.com/inward/record.url?scp=85009062521&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009062521&partnerID=8YFLogxK

U2 - 10.1166/asl.2016.7124

DO - 10.1166/asl.2016.7124

M3 - Article

AN - SCOPUS:85009062521

VL - 22

SP - 2968

EP - 2971

JO - Advanced Science Letters

JF - Advanced Science Letters

SN - 1936-6612

IS - 10

ER -