Adaptive learning for lemmatization in morphology analysis

Mary Ting, Abdul Kadir Rabiah, Tengku Mohd Tengku Sembok, Fatimah Ahmad, Azreen Azman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Morphological analysis studies the internal structure of words; in NLP systems it reduces the size of the vocabulary while retaining the semantic meaning of the knowledge represented. Most existing algorithms focus on stemming rather than lemmatization. Despite advances in technology, none of the available lemmatization algorithms produces 100% accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original words were meant to convey, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, which improves the semantic knowledge represented and stored in the knowledge base.
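
The abstract names three layers but does not say how they interact, so the following Python sketch is only one possible reading and not the authors' implementation. A toy suffix-rule generator stands in for the Stanford parser, a small word set stands in for WordNet, and a dictionary of corrections stands in for the adaptive learning layer. All names here (`SUFFIX_RULES`, `TOY_LEXICON`, `candidate_lemmas`, `AdaptiveLemmatizer`) are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-ins: a real system would query the Stanford parser and
# WordNet here. These toy tables only make the sketch self-contained.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]
TOY_LEXICON: Set[str] = {"study", "run", "walk", "word", "box"}


def candidate_lemmas(word: str) -> List[str]:
    """Layer 1 (assumed): propose candidate base forms for a word."""
    candidates = [word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)] + replacement
            candidates.append(stem)
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(stem) > 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])
    return candidates


class AdaptiveLemmatizer:
    """Cascade of candidate generation, lexicon validation, learned fallback."""

    def __init__(self, lexicon: Iterable[str]) -> None:
        self.lexicon: Set[str] = set(lexicon)
        self.learned: Dict[str, str] = {}  # corrections supplied as feedback

    def lemmatize(self, word: str) -> str:
        word = word.lower()
        for candidate in candidate_lemmas(word):   # layer 1: propose
            if candidate in self.lexicon:          # layer 2: validate
                return candidate
        # Layer 3 (assumed): fall back to a learned correction, if any;
        # otherwise return the word unchanged.
        return self.learned.get(word, word)

    def learn(self, word: str, lemma: str) -> None:
        """Store a correction so later lookups of the same word succeed."""
        self.learned[word.lower()] = lemma.lower()
        self.lexicon.add(lemma.lower())


lemmatizer = AdaptiveLemmatizer(TOY_LEXICON)
print(lemmatizer.lemmatize("studies"))
print(lemmatizer.lemmatize("running"))
print(lemmatizer.lemmatize("went"))
lemmatizer.learn("went", "go")
print(lemmatizer.lemmatize("went"))
```

In this sketch, an irregular form such as "went" passes through unchanged until a correction is recorded with `learn`, after which the same lookup returns the stored lemma. That is the sense in which the fallback layer is adaptive here; the paper's actual learning technique may differ.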

Original language: English
Title of host publication: Communications in Computer and Information Science
Publisher: Springer Verlag
Pages: 343-357
Number of pages: 15
Volume: 513
ISBN (Print): 9783319175294
DOIs: https://doi.org/10.1007/978-3-319-17530-0_24
Publication status: Published - 2015
Event: 13th International Conference on New Trends in Intelligent Software Methodology Tools and Techniques, SoMeT 2014 - Langkawi, Malaysia
Duration: 22 Sep 2014 to 24 Sep 2014

Publication series

Name: Communications in Computer and Information Science
Volume: 513
ISSN (Print): 1865-0929

Other

Other: 13th International Conference on New Trends in Intelligent Software Methodology Tools and Techniques, SoMeT 2014
Country: Malaysia
City: Langkawi
Period: 22/9/14 to 24/9/14

Fingerprint

Semantics
Application programming interfaces (API)

Keywords

  • Adaptive learning
  • Lemmatization
  • Morphology analysis
  • Natural language processing
  • Semi-supervised learning

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Ting, M., Rabiah, A. K., Sembok, T. M. T., Ahmad, F., & Azman, A. (2015). Adaptive learning for lemmatization in morphology analysis. In Communications in Computer and Information Science (Vol. 513, pp. 343-357). (Communications in Computer and Information Science; Vol. 513). Springer Verlag. https://doi.org/10.1007/978-3-319-17530-0_24

Adaptive learning for lemmatization in morphology analysis. / Ting, Mary; Rabiah, Abdul Kadir; Sembok, Tengku Mohd Tengku; Ahmad, Fatimah; Azman, Azreen.

Communications in Computer and Information Science. Vol. 513 Springer Verlag, 2015. p. 343-357 (Communications in Computer and Information Science; Vol. 513).

Ting, M, Rabiah, AK, Sembok, TMT, Ahmad, F & Azman, A 2015, Adaptive learning for lemmatization in morphology analysis. in Communications in Computer and Information Science. vol. 513, Communications in Computer and Information Science, vol. 513, Springer Verlag, pp. 343-357, 13th International Conference on New Trends in Intelligent Software Methodology Tools and Techniques, SoMeT 2014, Langkawi, Malaysia, 22/9/14. https://doi.org/10.1007/978-3-319-17530-0_24
Ting M, Rabiah AK, Sembok TMT, Ahmad F, Azman A. Adaptive learning for lemmatization in morphology analysis. In Communications in Computer and Information Science. Vol. 513. Springer Verlag. 2015. p. 343-357. (Communications in Computer and Information Science). https://doi.org/10.1007/978-3-319-17530-0_24
Ting, Mary ; Rabiah, Abdul Kadir ; Sembok, Tengku Mohd Tengku ; Ahmad, Fatimah ; Azman, Azreen. / Adaptive learning for lemmatization in morphology analysis. Communications in Computer and Information Science. Vol. 513 Springer Verlag, 2015. pp. 343-357 (Communications in Computer and Information Science).
@inproceedings{30629017c95d48a1a84a06b0353f2096,
title = "Adaptive learning for lemmatization in morphology analysis",
abstract = "Morphological analysis studies the internal structure of words; in NLP systems it reduces the size of the vocabulary while retaining the semantic meaning of the knowledge represented. Most existing algorithms focus on stemming rather than lemmatization. Despite advances in technology, none of the available lemmatization algorithms produces 100{\%} accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original words were meant to convey, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, which improves the semantic knowledge represented and stored in the knowledge base.",
keywords = "Adaptive learning, Lemmatization, Morphology analysis, Natural language processing, Semi-supervised learning",
author = "Mary Ting and Rabiah, {Abdul Kadir} and Sembok, {Tengku Mohd Tengku} and Fatimah Ahmad and Azreen Azman",
year = "2015",
doi = "10.1007/978-3-319-17530-0_24",
language = "English",
isbn = "9783319175294",
volume = "513",
series = "Communications in Computer and Information Science",
publisher = "Springer Verlag",
pages = "343--357",
booktitle = "Communications in Computer and Information Science",

}

TY - GEN

T1 - Adaptive learning for lemmatization in morphology analysis

AU - Ting, Mary

AU - Rabiah, Abdul Kadir

AU - Sembok, Tengku Mohd Tengku

AU - Ahmad, Fatimah

AU - Azman, Azreen

PY - 2015

Y1 - 2015

N2 - Morphological analysis studies the internal structure of words; in NLP systems it reduces the size of the vocabulary while retaining the semantic meaning of the knowledge represented. Most existing algorithms focus on stemming rather than lemmatization. Despite advances in technology, none of the available lemmatization algorithms produces 100% accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original words were meant to convey, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, which improves the semantic knowledge represented and stored in the knowledge base.

AB - Morphological analysis studies the internal structure of words; in NLP systems it reduces the size of the vocabulary while retaining the semantic meaning of the knowledge represented. Most existing algorithms focus on stemming rather than lemmatization. Despite advances in technology, none of the available lemmatization algorithms produces 100% accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original words were meant to convey, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, which improves the semantic knowledge represented and stored in the knowledge base.

KW - Adaptive learning

KW - Lemmatization

KW - Morphology analysis

KW - Natural language processing

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=84942626557&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942626557&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-17530-0_24

DO - 10.1007/978-3-319-17530-0_24

M3 - Conference contribution

SN - 9783319175294

VL - 513

T3 - Communications in Computer and Information Science

SP - 343

EP - 357

BT - Communications in Computer and Information Science

PB - Springer Verlag

ER -