Ant colony optimization for text feature selection in sentiment analysis

Research output: Contribution to journal, Article

Abstract

In sentiment analysis, the high dimensionality of the feature vector is a key problem because it can decrease the accuracy of sentiment classification and make it difficult to obtain the optimum subset of features. To solve this problem, this study proposes a new text feature selection method that uses a wrapper approach integrated with ant colony optimization (ACO) to guide the feature selection process. The method uses k-nearest neighbour (KNN) as the classifier to evaluate and generate candidate subsets of optimum features. To test the selected subset of features, a dependency relations algorithm was used to find the relationship between each feature and the sentiment word in customer reviews. The feature subset derived using the proposed ACO-KNN algorithm was used as input to identify and extract sentiment words from sentences in customer reviews. The resulting relationships between features and sentiment words were evaluated for accuracy in terms of precision, recall, and F-score. The performance of the proposed ACO-KNN algorithm on customer review datasets was compared with that of two hybrid algorithms from the literature, namely the genetic algorithm with information gain and information gain with rough set attribute reduction. The experimental results showed that the proposed ACO-KNN algorithm was able to obtain the optimum subset of features and could improve the accuracy of sentiment classification.
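The wrapper idea in the abstract can be illustrated with a minimal sketch. This is not the authors' ACO-KNN algorithm: it is a simplified binary ACO variant in which each ant includes a feature with a probability derived from that feature's pheromone level, and a small hand-written k-NN (leave-one-out accuracy) scores each candidate subset. The function names, the toy data, and all parameters (`n_ants`, `n_iters`, `rho`, `size_penalty`) are assumptions made for illustration only.

```python
import random


def knn_loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN using only the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared Euclidean distance to every other sample on the selected features.
        dists = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in features), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == yi:
            correct += 1
    return correct / len(X)


def aco_knn_select(X, y, n_ants=8, n_iters=15, rho=0.2, size_penalty=0.01, seed=0):
    """Simplified ACO wrapper: pheromone per feature, k-NN accuracy as fitness."""
    rng = random.Random(seed)
    n_features = len(X[0])
    tau = [1.0] * n_features  # initial pheromone gives each feature a 50% chance
    best_subset, best_score = [], -1.0
    for _ in range(n_iters):
        results = []
        for _ in range(n_ants):
            # An ant builds a subset by sampling each feature independently.
            subset = [f for f in range(n_features)
                      if rng.random() < tau[f] / (tau[f] + 1.0)]
            # Fitness rewards accuracy and mildly penalises large subsets.
            score = knn_loo_accuracy(X, y, subset) - size_penalty * len(subset)
            results.append((score, subset))
            if score > best_score:
                best_score, best_subset = score, subset
        # Evaporation, then deposit proportional to each ant's fitness.
        tau = [(1.0 - rho) * t for t in tau]
        for score, subset in results:
            for f in subset:
                tau[f] += max(score, 0.0) / n_ants
        tau = [max(t, 0.05) for t in tau]  # floor keeps every feature reachable
    return best_subset, best_score


def make_toy_data(n=40, seed=1):
    """Hypothetical data: feature 0 is informative, features 1 to 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.1), rng.random(), rng.random(), rng.random()])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    subset, score = aco_knn_select(X, y)
    print("selected features:", subset, "fitness:", round(score, 3))
```

In a text setting the columns of `X` would be term features (for example, term frequencies from customer reviews); the size penalty is one simple way to push the search toward smaller subsets and is not taken from the paper.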

Original language: English
Pages (from-to): 133-158
Number of pages: 26
Journal: Intelligent Data Analysis
Volume: 23
Issue number: 1
DOIs: 10.3233/IDA-173740
Publication status: Published - 1 Jan 2019

Fingerprint

Sentiment Analysis
Ant colony optimization
Feature Selection
Feature extraction
Nearest Neighbor
Set theory
Subset
Information Gain
Customers
Attribute Reduction
Wrapper
Hybrid Algorithm
Feature Vector
Rough Set
Dimensionality
Classifiers
Genetic algorithms
Classifier
Text
Genetic Algorithm

Keywords

  • ant colony optimization
  • k-nearest neighbour
  • metaheuristic algorithm
  • Sentiment analysis
  • text feature selection

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Ant colony optimization for text feature selection in sentiment analysis. / Ahmad, Siti Rohaidah; Abu Bakar, Azuraliza; Yaakub, Mohd Ridzwan.

In: Intelligent Data Analysis, Vol. 23, No. 1, 01.01.2019, p. 133-158.

@article{4f13e7976e3d4baa8a3d92a2a2f0044d,
title = "Ant colony optimization for text feature selection in sentiment analysis",
keywords = "ant colony optimization, k-nearest neighbour, metaheuristic algorithm, Sentiment analysis, text feature selection",
author = "Ahmad, {Siti Rohaidah} and {Abu Bakar}, Azuraliza and Yaakub, {Mohd Ridzwan}",
year = "2019",
month = "1",
day = "1",
doi = "10.3233/IDA-173740",
language = "English",
volume = "23",
pages = "133--158",
journal = "Intelligent Data Analysis",
issn = "1088-467X",
publisher = "IOS Press",
number = "1",

}

TY - JOUR

T1 - Ant colony optimization for text feature selection in sentiment analysis

AU - Ahmad, Siti Rohaidah

AU - Abu Bakar, Azuraliza

AU - Yaakub, Mohd Ridzwan

PY - 2019/1/1

Y1 - 2019/1/1

KW - ant colony optimization

KW - k-nearest neighbour

KW - metaheuristic algorithm

KW - Sentiment analysis

KW - text feature selection

UR - http://www.scopus.com/inward/record.url?scp=85062223766&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062223766&partnerID=8YFLogxK

U2 - 10.3233/IDA-173740

DO - 10.3233/IDA-173740

M3 - Article

VL - 23

SP - 133

EP - 158

JO - Intelligent Data Analysis

JF - Intelligent Data Analysis

SN - 1088-467X

IS - 1

ER -