Experiments on Malay short text classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this study, experiments are conducted on Malay short text using three different types of classifiers: k-nearest neighbours (KNN), support vector machine (SVM) and naive Bayes (NB). The classifiers were used to test bag-of-words (BOW) features and variants of TF-IDF weighting: TF-IDF, smoothed TF-IDF and ITC. A Malay short text dataset was built from tweets collected from Twitter and labelled with two classes. The experiments were run with test sets of 50% and 20% of the data. The most consistent results were achieved by the SVM classifier with ITC features, for which Precision, Recall and F1-Score all reached 95%.
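The three term-weighting variants named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the formulas, so the smoothed form (the add-one IDF used by common toolkits) and the ITC form (log-scaled term frequency times IDF, normalized to unit length) are assumptions, and the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """Count, for each term, how many tokenized documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

# All three functions assume every token of the document occurs in the
# corpus that produced `df` (otherwise df[t] would be zero).

def tfidf(tokens, df, n_docs):
    """Plain TF-IDF: raw term count times log(N / df)."""
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}

def smoothed_tfidf(tokens, df, n_docs):
    """Smoothed TF-IDF (assumed form): idf = log((1 + N) / (1 + df)) + 1."""
    tf = Counter(tokens)
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()}

def itc(tokens, df, n_docs):
    """ITC (assumed form): log(tf + 1) * log(N / df), scaled to unit length."""
    tf = Counter(tokens)
    raw = {t: math.log(c + 1) * math.log(n_docs / df[t]) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm else 0.0) for t, w in raw.items()}

# Hypothetical toy corpus of tokenized Malay tweets (not the paper's data).
docs = [
    ["saya", "suka", "makan", "nasi"],
    ["saya", "tidak", "suka", "hujan"],
    ["hujan", "lebat", "hari", "ini"],
]
df = document_frequencies(docs)
n_docs = len(docs)
weights = itc(docs[0], df, n_docs)
```

Under these assumptions, terms that occur in fewer documents (here "makan" and "nasi") receive larger weights than terms shared across tweets, and the ITC vector has unit length, which keeps tweets of different lengths comparable before they are passed to a classifier.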

Original language: English
Title of host publication: Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics
Subtitle of host publication: Sustainable Society Through Digital Innovation, ICEEI 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-4
Number of pages: 4
Volume: 2017-November
ISBN (Electronic): 9781538604755
DOI: https://doi.org/10.1109/ICEEI.2017.8312371
Publication status: Published - 9 Mar 2018
Event: 6th International Conference on Electrical Engineering and Informatics, ICEEI 2017 - Langkawi, Malaysia
Duration: 25 Nov 2017 to 27 Nov 2017

Other

Other: 6th International Conference on Electrical Engineering and Informatics, ICEEI 2017
Country: Malaysia
City: Langkawi
Period: 25/11/17 to 27/11/17

Fingerprint

Text Classification
TF-IDF
Classifiers
Experiments
Datasets
Text

Keywords

  • Malay text
  • Short text classification
  • tweets
  • Twitter data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Optimization
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Software
  • Electrical and Electronic Engineering
  • Health Informatics

Cite this

Tiun, S. (2018). Experiments on Malay short text classification. In Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics: Sustainable Society Through Digital Innovation, ICEEI 2017 (Vol. 2017-November, pp. 1-4). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICEEI.2017.8312371

@inproceedings{3a88d6a4e6d048979efba4b8288b6480,
title = "Experiments on Malay short text classification",
abstract = "In this study, experiments are conducted on Malay short text using three different types of classifiers: k-nearest neighbours (KNN), support vector machine (SVM) and naive Bayes (NB). The classifiers were used to test bag-of-words (BOW) features and variants of TF-IDF weighting: TF-IDF, smoothed TF-IDF and ITC. A Malay short text dataset was built from tweets collected from Twitter and labelled with two classes. The experiments were run with test sets of 50{\%} and 20{\%} of the data. The most consistent results were achieved by the SVM classifier with ITC features, for which Precision, Recall and F1-Score all reached 95{\%}.",
keywords = "Malay text, Short text classification, tweets, Twitter data",
author = "Sabrina Tiun",
year = "2018",
month = "3",
day = "9",
doi = "10.1109/ICEEI.2017.8312371",
language = "English",
volume = "2017-November",
pages = "1--4",
booktitle = "Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Experiments on Malay short text classification

AU - Tiun, Sabrina

PY - 2018/3/9

Y1 - 2018/3/9

N2 - In this study, experiments are conducted on Malay short text using three different types of classifiers: k-nearest neighbours (KNN), support vector machine (SVM) and naive Bayes (NB). The classifiers were used to test bag-of-words (BOW) features and variants of TF-IDF weighting: TF-IDF, smoothed TF-IDF and ITC. A Malay short text dataset was built from tweets collected from Twitter and labelled with two classes. The experiments were run with test sets of 50% and 20% of the data. The most consistent results were achieved by the SVM classifier with ITC features, for which Precision, Recall and F1-Score all reached 95%.

AB - In this study, experiments are conducted on Malay short text using three different types of classifiers: k-nearest neighbours (KNN), support vector machine (SVM) and naive Bayes (NB). The classifiers were used to test bag-of-words (BOW) features and variants of TF-IDF weighting: TF-IDF, smoothed TF-IDF and ITC. A Malay short text dataset was built from tweets collected from Twitter and labelled with two classes. The experiments were run with test sets of 50% and 20% of the data. The most consistent results were achieved by the SVM classifier with ITC features, for which Precision, Recall and F1-Score all reached 95%.

KW - Malay text

KW - Short text classification

KW - tweets

KW - Twitter data

UR - http://www.scopus.com/inward/record.url?scp=85050823636&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050823636&partnerID=8YFLogxK

U2 - 10.1109/ICEEI.2017.8312371

DO - 10.1109/ICEEI.2017.8312371

M3 - Conference contribution

AN - SCOPUS:85050823636

VL - 2017-November

SP - 1

EP - 4

BT - Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics

PB - Institute of Electrical and Electronics Engineers Inc.

ER -