Enhanced feature for short document classification

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Nowadays, the use of short text has increased dramatically, and many applications rely on it, such as mobile messaging, breaking news, social media and queries. The key challenge of short text lies in the difficulty of acquiring context information from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for classical text, such as bag-of-words, seem to be insufficient because too little information can be extracted from short text. This leads to the loss of semantic knowledge and of the semantic relations between the words within the short text. Hence, this study proposes a new feature selection method based on Interesting Term Count (ITC), combined with external knowledge from WordNet and a new weight (di) that identifies the variation between classes on the basis of ITC. The proposed feature selection approach aims to identify frequent terms without losing their semantics: WordNet is used to provide the semantic correspondences among the words within the short text. Three classification methods were used: support vector machine, J48 and Naive Bayes. The evaluation was performed by applying the three classifiers with and without the proposed feature selection method. Experimental results showed that the classifiers performed better with the proposed feature selection method. This suggests that the proposed ITC with external knowledge is effective for short text classification.
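
The abstract does not define ITC or the weight di, so the following is only a minimal sketch of the general idea: merge synonymous words into one concept, count terms per class, and keep the terms whose frequency varies most between classes. The `SYNONYMS` map is a hypothetical, hand-written stand-in for a WordNet lookup, the spread score is an assumed placeholder for the paper's weight, and all function names and the toy `DOCS` data are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for WordNet: maps a word to a shared concept label.
SYNONYMS = {
    "film": "movie",
    "picture": "movie",
    "match": "game",
    "fixture": "game",
}

def normalize(text):
    """Lowercase, split on whitespace, and map each token to its concept label."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]

def class_term_counts(docs):
    """Count, per class, how many documents contain each concept term."""
    counts = defaultdict(Counter)
    for text, label in docs:
        counts[label].update(set(normalize(text)))
    return counts

def spread_scores(counts, class_sizes):
    """Assumed score (not the paper's di): gap between the highest and
    lowest per-class document rate of each term."""
    terms = set().union(*counts.values())
    scores = {}
    for term in terms:
        rates = [counts[c][term] / class_sizes[c] for c in class_sizes]
        scores[term] = max(rates) - min(rates)
    return scores

def select_features(docs, k):
    """Return the k terms with the largest between-class spread."""
    counts = class_term_counts(docs)
    class_sizes = Counter(label for _, label in docs)
    scores = spread_scores(counts, class_sizes)
    # Sort by score (descending), then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

# Toy labelled short texts, invented for this example.
DOCS = [
    ("great film tonight", "cinema"),
    ("new picture released", "cinema"),
    ("big match tonight", "sport"),
    ("fixture postponed again", "sport"),
]

print(select_features(DOCS, 2))
```

In this toy data, "film" and "picture" are counted as one concept, so the merged term appears in every cinema text, while "tonight" occurs equally often in both classes and receives a spread of zero. The selected terms could then be passed as features to any classifier, such as the three named in the abstract.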

Original language: English
Pages (from-to): 3534-3540
Number of pages: 7
Journal: Journal of Engineering and Applied Sciences
Volume: 12
Issue number: 13
DOI: 10.3923/jeasci.2017.3534.3540
Publication status: Published - 2017

Fingerprint

Feature extraction
Semantics
Classifiers
Support vector machines

Keywords

  • Feature selection
  • ITC
  • J48
  • NB
  • Short text
  • SVM
  • Text classification
  • WordNet

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Enhanced feature for short document classification. / Hasan, Ali Abdulkadhim; Tiun, Sabrina; Mohd. Yusof, Maryati; Mokhtar, Umi Asma`; Ambari, Dian Indrayani.

In: Journal of Engineering and Applied Sciences, Vol. 12, No. 13, 2017, p. 3534-3540.


@article{0f6f39d65bdf4860b7e406405225c8f8,
title = "Enhanced feature for short document classification",
abstract = "Nowadays, the use of short text has increased dramatically, and many applications rely on it, such as mobile messaging, breaking news, social media and queries. The key challenge of short text lies in the difficulty of acquiring context information from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for classical text, such as bag-of-words, seem to be insufficient because too little information can be extracted from short text. This leads to the loss of semantic knowledge and of the semantic relations between the words within the short text. Hence, this study proposes a new feature selection method based on Interesting Term Count (ITC), combined with external knowledge from WordNet and a new weight (di) that identifies the variation between classes on the basis of ITC. The proposed feature selection approach aims to identify frequent terms without losing their semantics: WordNet is used to provide the semantic correspondences among the words within the short text. Three classification methods were used: support vector machine, J48 and Naive Bayes. The evaluation was performed by applying the three classifiers with and without the proposed feature selection method. Experimental results showed that the classifiers performed better with the proposed feature selection method. This suggests that the proposed ITC with external knowledge is effective for short text classification.",
keywords = "Feature selection, ITC, J48, NB, Short text, SVM, Text classification, WordNet",
author = "Hasan, {Ali Abdulkadhim} and Tiun, Sabrina and {Mohd. Yusof}, Maryati and Mokhtar, {Umi Asma`} and Ambari, {Dian Indrayani}",
year = "2017",
doi = "10.3923/jeasci.2017.3534.3540",
language = "English",
volume = "12",
pages = "3534--3540",
journal = "Journal of Engineering and Applied Sciences",
issn = "1816-949X",
publisher = "Medwell Journals",
number = "13",

}

TY - JOUR

T1 - Enhanced feature for short document classification

AU - Hasan, Ali Abdulkadhim

AU - Tiun, Sabrina

AU - Mohd. Yusof, Maryati

AU - Mokhtar, Umi Asma`

AU - Ambari, Dian Indrayani

PY - 2017

Y1 - 2017

AB - Nowadays, the use of short text has increased dramatically, and many applications rely on it, such as mobile messaging, breaking news, social media and queries. The key challenge of short text lies in the difficulty of acquiring context information from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for classical text, such as bag-of-words, seem to be insufficient because too little information can be extracted from short text. This leads to the loss of semantic knowledge and of the semantic relations between the words within the short text. Hence, this study proposes a new feature selection method based on Interesting Term Count (ITC), combined with external knowledge from WordNet and a new weight (di) that identifies the variation between classes on the basis of ITC. The proposed feature selection approach aims to identify frequent terms without losing their semantics: WordNet is used to provide the semantic correspondences among the words within the short text. Three classification methods were used: support vector machine, J48 and Naive Bayes. The evaluation was performed by applying the three classifiers with and without the proposed feature selection method. Experimental results showed that the classifiers performed better with the proposed feature selection method. This suggests that the proposed ITC with external knowledge is effective for short text classification.

KW - Feature selection

KW - ITC

KW - J48

KW - NB

KW - Short text

KW - SVM

KW - Text classification

KW - WordNet

UR - http://www.scopus.com/inward/record.url?scp=85028517253&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028517253&partnerID=8YFLogxK

U2 - 10.3923/jeasci.2017.3534.3540

DO - 10.3923/jeasci.2017.3534.3540

M3 - Article

AN - SCOPUS:85028517253

VL - 12

SP - 3534

EP - 3540

JO - Journal of Engineering and Applied Sciences

JF - Journal of Engineering and Applied Sciences

SN - 1816-949X

IS - 13

ER -