Implementation of Kadazan Tagger Based on Brill's Method

Research output: Contribution to journal › Article

Abstract

We present and evaluate an implementation of part-of-speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for the Kadazan language, which had never been developed before. A POS tagger can tag a Kadazan corpus automatically and can help address the disambiguation problem of this language. The approach is applied in this study with the aim of achieving accuracy that is higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. The approach transforms tags based on a prescribed set of rules. A number of objectives were set in order to achieve the main purpose of this study: first, to apply lexical and contextual rules for this language; second, to implement Brill's algorithm based on the set of rules; and finally, to determine the effectiveness of Kadazan part-of-speech tagging using this approach. The tagging system was trained using four Kadazan corpora containing 5663 words in all. Based on the evaluation results, the tagging system achieved around 93% accuracy.
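To make the idea of transforming tags with contextual rules concrete, the following is a minimal Python sketch of transformation-based tagging in the style of Brill's method. It is not the authors' implementation: the lexicon, the Penn-style tag names, the single rule, and the English example sentence are all hypothetical stand-ins, and the names `ContextRule`, `initial_tags`, `apply_rules`, and `accuracy` are invented for illustration. The sketch assumes an initial tagger that assigns each word its lexicon tag, followed by an ordered list of rules of the form "change tag A to tag B when the previous tag is C".

```python
from typing import NamedTuple


class ContextRule(NamedTuple):
    """Hypothetical rule: change from_tag to to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(words, lexicon, default="NN"):
    """Initial-state tagging: look each word up in the lexicon, else use a default tag."""
    return [lexicon.get(word.lower(), default) for word in words]


def apply_rules(tags, rules):
    """Apply the contextual rules in order, each one over the whole tag sequence."""
    tags = list(tags)
    for rule in rules:
        # Match against a snapshot so a rule's own changes do not trigger it again
        # (an assumption of this sketch; implementations differ on this point).
        snapshot = list(tags)
        for i in range(1, len(snapshot)):
            if snapshot[i] == rule.from_tag and snapshot[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy data, entirely hypothetical (English words, not Kadazan).
lexicon = {"they": "PRP", "want": "VBP", "to": "TO", "run": "NN"}
rules = [ContextRule(from_tag="NN", to_tag="VB", prev_tag="TO")]
words = ["they", "want", "to", "run"]
gold = ["PRP", "VBP", "TO", "VB"]

initial = initial_tags(words, lexicon)
final = apply_rules(initial, rules)
print(initial, accuracy(initial, gold))
print(final, accuracy(final, gold))
```

In this toy run the initial tagger mislabels "run" as a noun, and the single contextual rule corrects it because the preceding tag is `TO`. Accuracy figures such as the one reported in the abstract are computed in the same way as `accuracy` here, as the share of tokens whose tag matches a hand-annotated reference.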

Original language: English
Pages (from-to): 177-190
Number of pages: 14
Journal: Journal of ICT Research and Applications
Volume: 7
Issue number: 3
DOIs: 10.5614/itbj.ict.res.appl.2013.7.3.1
Publication status: Published - 2013

Fingerprint

Tag
Tagging
Language
Evaluation
Rule-based

Keywords

  • Brill's tagger
  • Kadazan language
  • Part of Speech tagger
  • Rule-based
  • Statistical
  • Transformation-based

ASJC Scopus subject areas

  • Computer Science (all)
  • Electrical and Electronic Engineering
  • Information Systems and Management

Cite this

Implementation of Kadazan Tagger Based on Brill's Method. / Alex, Marylyn; Zakaria, Lailatul Qadri.

In: Journal of ICT Research and Applications, Vol. 7, No. 3, 2013, p. 177-190.

@article{04295123928c4c29acd88d00942e61bb,
title = "Implementation of Kadazan Tagger Based on Brill's Method",
abstract = "We present and evaluate an implementation of part-of-speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for the Kadazan language, which had never been developed before. A POS tagger can tag a Kadazan corpus automatically and can help address the disambiguation problem of this language. The approach is applied in this study with the aim of achieving accuracy that is higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. The approach transforms tags based on a prescribed set of rules. A number of objectives were set in order to achieve the main purpose of this study: first, to apply lexical and contextual rules for this language; second, to implement Brill's algorithm based on the set of rules; and finally, to determine the effectiveness of Kadazan part-of-speech tagging using this approach. The tagging system was trained using four Kadazan corpora containing 5663 words in all. Based on the evaluation results, the tagging system achieved around 93{\%} accuracy.",
keywords = "Brill's tagger, Kadazan language, Part of Speech tagger, Rule-based, Statistical, Transformation-based",
author = "Alex, Marylyn and Zakaria, {Lailatul Qadri}",
year = "2013",
doi = "10.5614/itbj.ict.res.appl.2013.7.3.1",
language = "English",
volume = "7",
pages = "177--190",
journal = "Journal of ICT Research and Applications",
issn = "2337-5787",
publisher = "Institut Teknologi Bandung (ITB)",
number = "3",

}

TY - JOUR

T1 - Implementation of Kadazan Tagger Based on Brill's Method

AU - Alex, Marylyn

AU - Zakaria, Lailatul Qadri

PY - 2013

Y1 - 2013

N2 - We present and evaluate an implementation of part-of-speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for the Kadazan language, which had never been developed before. A POS tagger can tag a Kadazan corpus automatically and can help address the disambiguation problem of this language. The approach is applied in this study with the aim of achieving accuracy that is higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. The approach transforms tags based on a prescribed set of rules. A number of objectives were set in order to achieve the main purpose of this study: first, to apply lexical and contextual rules for this language; second, to implement Brill's algorithm based on the set of rules; and finally, to determine the effectiveness of Kadazan part-of-speech tagging using this approach. The tagging system was trained using four Kadazan corpora containing 5663 words in all. Based on the evaluation results, the tagging system achieved around 93% accuracy.

KW - Brill's tagger

KW - Kadazan language

KW - Part of Speech tagger

KW - Rule-based

KW - Statistical

KW - Transformation-based

UR - http://www.scopus.com/inward/record.url?scp=84901756635&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901756635&partnerID=8YFLogxK

U2 - 10.5614/itbj.ict.res.appl.2013.7.3.1

DO - 10.5614/itbj.ict.res.appl.2013.7.3.1

M3 - Article

VL - 7

SP - 177

EP - 190

JO - Journal of ICT Research and Applications

JF - Journal of ICT Research and Applications

SN - 2337-5787

IS - 3

ER -