DGMS: Dataset Generator Based on Malay Stemmer Algorithm

Zailani Abdullah, Siti Zaharah Mohamad, Norul Syazawini Zulkifli, Tutut Herawan, Abdul Razak Hamdan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. Text mining analysis is more complicated than data mining because it involves unstructured and fuzzy data. Moreover, tools for generating datasets from text documents are still not available. Therefore, in this study we propose a model and a tool called Dataset Generator Based on Malay Stemmer Algorithm (DGMS), and evaluate it on news articles from the Malaysian National News Agency (Bernama). The results show that the DGMS tool can be used to extract features and generate the desired dataset.
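The abstract does not describe how DGMS works internally, so the following is only a minimal sketch of the general idea it names: stem Malay words, treat the stems as features, and emit a document-term dataset. The affix lists, the stopword set, and the function names `stem`, `tokenize` and `build_dataset` are all assumptions made for illustration; they are not taken from the paper, and a practical Malay stemmer would need a root-word dictionary and far more rules.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny affix and stopword lists (assumptions,
# not the rules used by DGMS).
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")
STOPWORDS = {"yang", "dan", "di", "ke", "itu", "ini", "untuk"}


def stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_dataset(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, rows): one row of stem counts per document."""
    counts = [Counter(stem(w) for w in tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows
```

Calling `build_dataset(["Polis membantu mangsa banjir", "Bantuan makanan untuk mangsa banjir"])` maps both "membantu" and "Bantuan" to the shared feature "bantu". Naive affix stripping of this kind over-stems many words (for example, "mereka" would lose its first two letters), which is why dictionary-backed stemmers are normally preferred.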

Original language: English
Title of host publication: Proceedings of the International Conference on Data Engineering, DaEng 2015
Editors: Jemal H. Abawajy, Rozaida Ghazali, Mustafa Mat Deris, Hairulnizam Mahdin, Tutut Herawan, Mohamed Othman
Publisher: Springer Verlag
Pages: 51-60
Number of pages: 10
ISBN (Print): 9789811317972
DOIs: https://doi.org/10.1007/978-981-13-1799-6_6
Publication status: Published - 1 Jan 2019
Event: 2nd International Conference on Advanced Data and Information Engineering, DaEng 2015 - Bali, Indonesia
Duration: 25 Apr 2015 to 26 Apr 2015

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 520
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 2nd International Conference on Advanced Data and Information Engineering, DaEng 2015
Country: Indonesia
City: Bali
Period: 25/4/15 to 26/4/15

Fingerprint

Data mining
Computational linguistics
Information retrieval
Learning systems
Statistics

Keywords

  • Dataset
  • Malay
  • Stemmer
  • Text mining

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Cite this

Abdullah, Z., Mohamad, S. Z., Zulkifli, N. S., Herawan, T., & Hamdan, A. R. (2019). DGMS: Dataset Generator Based on Malay Stemmer Algorithm. In J. H. Abawajy, R. Ghazali, M. M. Deris, H. Mahdin, T. Herawan, & M. Othman (Eds.), Proceedings of the International Conference on Data Engineering, DaEng 2015 (pp. 51-60). (Lecture Notes in Electrical Engineering; Vol. 520). Springer Verlag. https://doi.org/10.1007/978-981-13-1799-6_6
