A Malay stemmer for Jawi characters

Suliana Sulaiman, Khairuddin Omar, Nazlia Omar, Mohd. Zamri Murah, Hamdan Abdul Rahman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

The Malay language may be written using either Roman or Jawi characters. Most Malay stemmers cover only Roman (Rumi) affixes. This paper proposes a stemmer for Jawi characters using two sets of rules in Jawi: one set of rules is used to stem various forms of derived words, and another set is used to replace the use of a dictionary by producing the root word for each derivative. This stemmer has been tested using 1185 derived words consisting of prefix, circumfix, suffix, and infix. The results show that 84.89% of Jawi root words have been successfully stemmed.
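The two-stage design described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the affix tables, the restoration table, the minimum stem length, and all function names are assumptions introduced here, and the Jawi spellings are chosen only as plausible examples. The first stage strips affixes by rule; the second stage applies a rewrite rule to recover a root letter without consulting a dictionary.

```python
# Illustrative sketch only: the rule tables below are hypothetical examples,
# not the rule sets published in the paper.

# Rule set 1 (assumed form): affixes to strip, longer suffixes listed first.
PREFIXES = ["بر", "من", "د"]   # roughly ber-, men-, di-
SUFFIXES = ["كن", "ن", "ي"]    # roughly -kan, -an, -i
MIN_STEM_LEN = 4               # assumed guard against over-stemming

# Rule set 2 (assumed form): a letter to restore after a given prefix is
# removed, standing in for a dictionary lookup. Toy rule: men- + t... drops t.
RESTORE_AFTER_PREFIX = {"من": "ت"}


def strip_affixes(word):
    """Strip at most one prefix and one suffix; return (stem, prefix removed)."""
    prefix = ""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM_LEN:
            prefix, word = p, word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM_LEN:
            word = word[:-len(s)]
            break
    return word, prefix


def stem(word):
    """Apply rule set 1, then rule set 2, to produce a candidate root."""
    stripped, prefix = strip_affixes(word)
    return RESTORE_AFTER_PREFIX.get(prefix, "") + stripped


def accuracy(pairs):
    """Percentage of (derived, expected_root) pairs stemmed correctly."""
    correct = sum(1 for derived, root in pairs if stem(derived) == root)
    return 100.0 * correct / len(pairs)
```

With these toy tables, `stem("برجالن")` strips the prefix and keeps the final letter because removing it would leave fewer than `MIN_STEM_LEN` characters. A real rule set would need far more cases, including the circumfix and infix forms the paper mentions. As a rough check on the reported figure, and assuming the percentage is taken over all 1185 test words, 84.89% corresponds to about 1006 correctly stemmed words.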

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 668-676
Number of pages: 9
Volume: 7106 LNAI
DOIs: https://doi.org/10.1007/978-3-642-25832-9_68
Publication status: Published - 2011
Event: 24th Australasian Joint Conference on Artificial Intelligence, AI 2011 - Perth, WA
Duration: 5 Dec 2011 to 8 Dec 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7106 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th Australasian Joint Conference on Artificial Intelligence, AI 2011
City: Perth, WA
Period: 5/12/11 to 8/12/11

Fingerprint

Glossaries
Derivatives
Roots
Suffix
Prefix
Cover
Derivative
Character

Keywords

  • Jawi
  • Malay Stemmer
  • Stemmer

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Sulaiman, S., Omar, K., Omar, N., Murah, M. Z., & Abdul Rahman, H. (2011). A Malay stemmer for Jawi characters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7106 LNAI, pp. 668-676). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7106 LNAI). https://doi.org/10.1007/978-3-642-25832-9_68

@inproceedings{9d318cc83de84d268458a7b423e45ae8,
title = "A Malay stemmer for Jawi characters",
abstract = "The Malay language may be written using either Roman or Jawi characters. Most Malay stemmers cover only Roman (Rumi) affixes. This paper proposes a stemmer for Jawi characters using two sets of rules in Jawi: one set of rules is used to stem various forms of derived words, and another set is used to replace the use of a dictionary by producing the root word for each derivative. This stemmer has been tested using 1185 derived words consisting of prefix, circumfix, suffix, and infix. The results show that 84.89{\%} of Jawi root words have been successfully stemmed.",
keywords = "Jawi, Malay Stemmer, Stemmer",
author = "Suliana Sulaiman and Khairuddin Omar and Nazlia Omar and Murah, {Mohd. Zamri} and {Abdul Rahman}, Hamdan",
year = "2011",
doi = "10.1007/978-3-642-25832-9_68",
language = "English",
isbn = "9783642258312",
volume = "7106 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "668--676",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A Malay stemmer for Jawi characters

AU - Sulaiman, Suliana

AU - Omar, Khairuddin

AU - Omar, Nazlia

AU - Murah, Mohd. Zamri

AU - Abdul Rahman, Hamdan

PY - 2011

Y1 - 2011

N2 - The Malay language may be written using either Roman or Jawi characters. Most Malay stemmers cover only Roman (Rumi) affixes. This paper proposes a stemmer for Jawi characters using two sets of rules in Jawi: one set of rules is used to stem various forms of derived words, and another set is used to replace the use of a dictionary by producing the root word for each derivative. This stemmer has been tested using 1185 derived words consisting of prefix, circumfix, suffix, and infix. The results show that 84.89% of Jawi root words have been successfully stemmed.

KW - Jawi

KW - Malay Stemmer

KW - Stemmer

UR - http://www.scopus.com/inward/record.url?scp=83755173670&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=83755173670&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-25832-9_68

DO - 10.1007/978-3-642-25832-9_68

M3 - Conference contribution

SN - 9783642258312

VL - 7106 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 668

EP - 676

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -