Normalization of noisy texts in Malaysian online reviews

Norlela Samsudin, Mazidah Puteh, Abdul Razak Hamdan, Mohd Zakree Ahmad Nazri

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

The gathering of useful information from online messages has increased as more and more people use the Internet and online applications such as Facebook and Twitter to communicate with each other. One of the problems in processing online messages is the high number of noisy texts in these messages. A few studies have shown that noisy texts degrade the results of text mining activities. However, very few works have investigated the patterns of noisy texts created by Malaysians. In this study, a common noisy terms list and an artificial abbreviations list were created using specific rules and were used to select candidate correct words for a noisy term. The correct term was then selected based on a bi-gram word index. The experiments used online messages created by Malaysians. The results show that normalization of noisy texts using the artificial abbreviations list complements the use of the common noisy terms list.
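The two-stage approach the abstract describes (collect candidates from two lists, then pick one with a bi-gram index) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the rules used to build either list, so the vowel-dropping rule, the entries in `COMMON_NOISY_TERMS`, the reference sentences, and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical common noisy terms list (illustrative entries, not from the paper).
COMMON_NOISY_TERMS = {"sy": ["saya"], "yg": ["yang"], "x": ["tidak"]}

VOWELS = set("aeiou")


def artificial_abbreviation(word):
    """Assumed rule: keep the first letter, drop every later vowel."""
    return word[0] + "".join(ch for ch in word[1:] if ch not in VOWELS)


def build_abbreviation_index(vocabulary):
    """Map each artificial abbreviation to the words that produce it."""
    index = {}
    for word in sorted(vocabulary):
        index.setdefault(artificial_abbreviation(word), []).append(word)
    return index


def build_bigram_index(sentences):
    """Count adjacent word pairs in clean reference sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def normalize(tokens, vocabulary, abbreviation_index, bigram_index):
    """Replace each out-of-vocabulary token with its best-scoring candidate."""
    output = []
    for token in tokens:
        if token in vocabulary:
            output.append(token)
            continue
        # Candidates come from both lists; dict.fromkeys removes duplicates.
        candidates = list(dict.fromkeys(
            COMMON_NOISY_TERMS.get(token, []) + abbreviation_index.get(token, [])
        ))
        if not candidates:
            output.append(token)  # nothing known about this token, keep it
            continue
        previous = output[-1] if output else None
        # Pick the candidate seen most often after the previous word.
        output.append(max(candidates, key=lambda c: bigram_index[(previous, c)]))
    return output


vocabulary = {"saya", "main", "bola", "bila", "makan"}
abbreviation_index = build_abbreviation_index(vocabulary)
bigram_index = build_bigram_index(["saya main bola", "bila saya makan"])
result = normalize("sy main bl".split(), vocabulary, abbreviation_index, bigram_index)
print(result)
```

In this toy setup "bl" is ambiguous between "bila" and "bola" under the assumed abbreviation rule, and the bi-gram count for ("main", "bola") is what resolves it. A real system would need a much larger reference corpus and a tie-breaking policy for candidates with equal counts.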

Original language: English
Pages (from-to): 147-159
Number of pages: 13
Journal: Journal of Information and Communication Technology
Volume: 12
Issue number: 1
Publication status: Published - 2013

Keywords

  • Artificial abbreviation
  • Noisy texts
  • Normalization of noisy texts

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Normalization of noisy texts in Malaysian online reviews. / Samsudin, Norlela; Puteh, Mazidah; Hamdan, Abdul Razak; Ahmad Nazri, Mohd Zakree.

In: Journal of Information and Communication Technology, Vol. 12, No. 1, 2013, p. 147-159.

@article{889879e113764bb99bb6330d87cf2c92,
title = "Normalization of noisy texts in Malaysian online reviews",
abstract = "The gathering of useful information from online messages has increased as more and more people use the Internet and online applications such as Facebook and Twitter to communicate with each other. One of the problems in processing online messages is the high number of noisy texts in these messages. A few studies have shown that noisy texts degrade the results of text mining activities. However, very few works have investigated the patterns of noisy texts created by Malaysians. In this study, a common noisy terms list and an artificial abbreviations list were created using specific rules and were used to select candidate correct words for a noisy term. The correct term was then selected based on a bi-gram word index. The experiments used online messages created by Malaysians. The results show that normalization of noisy texts using the artificial abbreviations list complements the use of the common noisy terms list.",
keywords = "Artificial abbreviation, Noisy texts, Normalization of noisy texts",
author = "Norlela Samsudin and Mazidah Puteh and Hamdan, {Abdul Razak} and {Ahmad Nazri}, {Mohd Zakree}",
year = "2013",
language = "English",
volume = "12",
pages = "147--159",
journal = "Journal of Information and Communication Technology",
issn = "1675-414X",
publisher = "Universiti Utara Malaysia Press",
number = "1",
}

TY - JOUR

T1 - Normalization of noisy texts in Malaysian online reviews

AU - Samsudin, Norlela

AU - Puteh, Mazidah

AU - Hamdan, Abdul Razak

AU - Ahmad Nazri, Mohd Zakree

PY - 2013

Y1 - 2013

AB - The gathering of useful information from online messages has increased as more and more people use the Internet and online applications such as Facebook and Twitter to communicate with each other. One of the problems in processing online messages is the high number of noisy texts in these messages. A few studies have shown that noisy texts degrade the results of text mining activities. However, very few works have investigated the patterns of noisy texts created by Malaysians. In this study, a common noisy terms list and an artificial abbreviations list were created using specific rules and were used to select candidate correct words for a noisy term. The correct term was then selected based on a bi-gram word index. The experiments used online messages created by Malaysians. The results show that normalization of noisy texts using the artificial abbreviations list complements the use of the common noisy terms list.

KW - Artificial abbreviation

KW - Noisy texts

KW - Normalization of noisy texts

UR - http://www.scopus.com/inward/record.url?scp=84893018570&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893018570&partnerID=8YFLogxK

M3 - Article

VL - 12

SP - 147

EP - 159

JO - Journal of Information and Communication Technology

JF - Journal of Information and Communication Technology

SN - 1675-414X

IS - 1

ER -