Implementation of Buckwalter transliteration to Malay corpora

Juhaida Abu Bakar, Khairuddin Omar, Mohammad Faidzul Nasrudin, Mohd. Zamri Murah, Che Wan Shamsul Bahri C W Ahmad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Assigning lexical categories to words is an important step in the automated analysis of a text. Modern Natural Language Processing (NLP) algorithms are based on machine learning: they learn rules automatically by analyzing large corpora of typical real-world examples. The Buckwalter transliteration has become a standard in the NLP research community working on Arabic. In this paper, we discuss the encoding of a Malay-language corpus written in Jawi. The purpose of this work is to make the corpora conform to a single standard encoding for the characters that Jawi shares with Arabic. Four letters that differ from the Arabic alphabet were identified, and newly defined Buckwalter symbols were assigned to them. A collection of the 114 chapters of the al-Quran translated into Jawi was used as the corpus. The similarity between the Jawi and Arabic corpora will be exploited to investigate the out-of-vocabulary (OOV) problem in POS tagging.
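The transliteration step described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: the core table follows the standard Buckwalter mapping for Arabic letters and common diacritics, while the four Jawi entries (ca, nga, pa, nya) and their symbols `C`, `Q`, `V`, `M` are hypothetical placeholders, because the abstract does not state which four letters or which symbols the paper defines. The helper `oov_tokens` and the toy vocabulary are likewise assumptions, showing one simple way to flag out-of-vocabulary tokens once Jawi and Arabic text share an encoding.

```python
"""Sketch: Buckwalter transliteration extended with placeholder Jawi symbols."""

# Standard Buckwalter mapping for Arabic letters and common diacritics.
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",
    "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064A": "y",
    # Diacritics (tanwin, short vowels, shadda, sukun, dagger alif, wasla).
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`", "\u0671": "{",
}

# HYPOTHETICAL placeholder symbols for four Jawi-specific letters.
# The paper's actual choice of letters and symbols is not reproduced here.
JAWI_EXTRA = {
    "\u0686": "C",  # ca
    "\u06A0": "Q",  # nga
    "\u06A4": "V",  # pa
    "\u06BD": "M",  # nya
}


def build_table(base, extra):
    """Merge the tables, refusing any Jawi entry that reuses a letter or symbol."""
    clashes = set(extra.values()) & set(base.values())
    overlap = set(extra) & set(base)
    if clashes or overlap:
        raise ValueError(f"conflicting entries: {sorted(clashes | overlap)}")
    return {**base, **extra}


TABLE = build_table(BUCKWALTER, JAWI_EXTRA)
# The mapping is one-to-one, so it can be inverted for round trips.
REVERSE = {sym: ch for ch, sym in TABLE.items()}


def to_buckwalter(text):
    """Transliterate Arabic-script text; characters not in the table pass through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def from_buckwalter(text):
    """Invert to_buckwalter for text produced with the same table."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def oov_tokens(tokens, vocabulary):
    """Return the tokens that do not occur in the given vocabulary."""
    return [tok for tok in tokens if tok not in vocabulary]


if __name__ == "__main__":
    words = [
        "\u0643\u062A\u0627\u0628",       # kitab (Arabic-script spelling)
        "\u06A4\u0648\u0686\u0648\u0642",  # a word using the Jawi letters pa and ca
    ]
    translit = [to_buckwalter(w) for w in words]
    print(translit)                        # ['ktAb', 'VwCwq']
    print(oov_tokens(translit, {"ktAb"}))  # ['VwCwq']
```

Keeping the extended table a strict superset of the standard one means words written identically in Arabic and Jawi transliterate exactly as they would in an Arabic corpus, while words containing Jawi-only letters surface as OOV against a vocabulary built from Arabic text alone.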

Original language: English
Title of host publication: International Conference on Intelligent Systems Design and Applications, ISDA
Publisher: IEEE Computer Society
Pages: 213-218
Number of pages: 6
ISBN (Print): 9781479935161
DOI: https://doi.org/10.1109/ISDA.2013.6920737
Publication status: Published - 10 Oct 2014
Event: 2013 13th International Conference on Intelligent Systems Design and Applications, ISDA 2013 - Selangor
Duration: 8 Dec 2013 to 10 Dec 2013

Keywords

  • Buckwalter transliteration
  • Corpora
  • Encoding
  • Jawi script
  • POS-tags

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Signal Processing
  • Control and Systems Engineering

Cite this

Bakar, J. A., Omar, K., Nasrudin, M. F., Murah, M. Z., & Ahmad, C. W. S. B. C. W. (2014). Implementation of Buckwalter transliteration to Malay corpora. In International Conference on Intelligent Systems Design and Applications, ISDA (pp. 213-218). [6920737] IEEE Computer Society. https://doi.org/10.1109/ISDA.2013.6920737

Implementation of Buckwalter transliteration to Malay corpora. / Bakar, Juhaida Abu; Omar, Khairuddin; Nasrudin, Mohammad Faidzul; Murah, Mohd. Zamri; Ahmad, Che Wan Shamsul Bahri C W.

International Conference on Intelligent Systems Design and Applications, ISDA. IEEE Computer Society, 2014. p. 213-218, 6920737.


Bakar, JA, Omar, K, Nasrudin, MF, Murah, MZ & Ahmad, CWSBCW 2014, Implementation of Buckwalter transliteration to Malay corpora. in International Conference on Intelligent Systems Design and Applications, ISDA, 6920737, IEEE Computer Society, pp. 213-218, 2013 13th International Conference on Intelligent Systems Design and Applications, ISDA 2013, Selangor, 8/12/13. https://doi.org/10.1109/ISDA.2013.6920737
Bakar JA, Omar K, Nasrudin MF, Murah MZ, Ahmad CWSBCW. Implementation of Buckwalter transliteration to Malay corpora. In International Conference on Intelligent Systems Design and Applications, ISDA. IEEE Computer Society. 2014. p. 213-218. 6920737 https://doi.org/10.1109/ISDA.2013.6920737
Bakar, Juhaida Abu ; Omar, Khairuddin ; Nasrudin, Mohammad Faidzul ; Murah, Mohd. Zamri ; Ahmad, Che Wan Shamsul Bahri C W. / Implementation of Buckwalter transliteration to Malay corpora. International Conference on Intelligent Systems Design and Applications, ISDA. IEEE Computer Society, 2014. pp. 213-218
@inproceedings{e68659da6ba44f58b7f6e79b84fd1a8f,
title = "Implementation of Buckwalter transliteration to Malay corpora",
keywords = "Buckwalter transliteration, Corpora, Encoding, Jawi script, POS-tags",
author = "Bakar, {Juhaida Abu} and Omar, Khairuddin and Nasrudin, {Mohammad Faidzul} and Murah, {Mohd. Zamri} and Ahmad, {Che Wan Shamsul Bahri C W}",
year = "2014",
month = "10",
day = "10",
doi = "10.1109/ISDA.2013.6920737",
language = "English",
isbn = "9781479935161",
pages = "213--218",
booktitle = "International Conference on Intelligent Systems Design and Applications, ISDA",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Implementation of Buckwalter transliteration to Malay corpora

AU - Bakar, Juhaida Abu

AU - Omar, Khairuddin

AU - Nasrudin, Mohammad Faidzul

AU - Murah, Mohd. Zamri

AU - Ahmad, Che Wan Shamsul Bahri C W

PY - 2014/10/10

Y1 - 2014/10/10

KW - Buckwalter transliteration

KW - Corpora

KW - Encoding

KW - Jawi script

KW - POS-tags

UR - http://www.scopus.com/inward/record.url?scp=84908220890&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908220890&partnerID=8YFLogxK

U2 - 10.1109/ISDA.2013.6920737

DO - 10.1109/ISDA.2013.6920737

M3 - Conference contribution

SN - 9781479935161

SP - 213

EP - 218

BT - International Conference on Intelligent Systems Design and Applications, ISDA

PB - IEEE Computer Society

ER -