Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging

Yongyos Kaewpitakkun, Kiyoaki Shirai, Masnizah Mohd

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Sentiment analysis has become an important classification task because large amounts of user-generated content are published on the Internet. Sentiment lexicons have been used successfully to classify the sentiment of user review datasets. More recently, microblogging services such as Twitter have become a popular data source for sentiment analysis. However, analyzing sentiment in tweets remains difficult because tweets are very short and contain slang, informal expressions, emoticons, misspellings and many words not found in a dictionary. In addition, more than 90 percent of the words in public sentiment lexicons such as SentiWordNet are objective words, which are often considered less important for classification. In this paper, we introduce a hybrid method that incorporates sentiment lexicons into a machine learning classifier to improve sentiment classification of tweets. We automatically construct an Add-on lexicon containing polarity scores, estimated from tweet corpora, for objective and out-of-vocabulary (OOV) words. We also introduce a novel feature weighting method that interpolates sentiment lexicon scores into the unigram feature vectors of a Support Vector Machine (SVM) classifier. Experimental results show that our method is effective and significantly improves sentiment classification accuracy compared to a baseline unigram model.
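The abstract does not give the scoring or weighting formulas, so the following Python sketch only illustrates the two ideas under stated assumptions; it is not the authors' implementation. The document-frequency polarity score, the helper names `estimate_polarity`, `build_add_on_lexicon` and `interpolated_features`, the `min_count` threshold, the weight `alpha` and the toy data are all hypothetical. A word counts as objective when the base lexicon scores it 0 and as out-of-vocabulary when the base lexicon lacks it.

```python
from collections import Counter


def estimate_polarity(labeled_tweets, min_count=2):
    """Estimate a polarity score in [-1, 1] for each word of a labeled tweet corpus.

    labeled_tweets: iterable of (tokens, label) pairs, label "positive" or "negative".
    Hypothetical rule (not taken from the paper): compare the share of positive
    tweets containing the word with the share of negative tweets containing it.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in labeled_tweets:
        if label == "positive":
            n_pos += 1
            pos_df.update(set(tokens))  # document frequency, not raw counts
        elif label == "negative":
            n_neg += 1
            neg_df.update(set(tokens))
        # any other label (e.g. "neutral") is ignored in this sketch
    scores = {}
    for word in pos_df.keys() | neg_df.keys():
        if pos_df[word] + neg_df[word] < min_count:
            continue  # too rare to estimate reliably
        p = pos_df[word] / n_pos if n_pos else 0.0
        q = neg_df[word] / n_neg if n_neg else 0.0
        scores[word] = (p - q) / (p + q)  # p + q > 0 since the word occurs somewhere
    return scores


def build_add_on_lexicon(corpus_scores, base_lexicon):
    """Keep corpus estimates only for words the base lexicon scores as 0
    (objective) or does not list at all (out-of-vocabulary)."""
    return {w: s for w, s in corpus_scores.items() if base_lexicon.get(w, 0.0) == 0.0}


def interpolated_features(tokens, lexicon, alpha=0.5):
    """Unigram feature dict whose values blend binary presence with a lexicon score.

    Hypothetical weighting: (1 - alpha) * 1 + alpha * (1 + s) = 1 + alpha * s,
    where s is the word's lexicon score (0 if absent). With 0 <= alpha < 1 and
    s in [-1, 1], every present word keeps a strictly positive feature value.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return {w: 1.0 + alpha * lexicon.get(w, 0.0) for w in set(tokens)}


if __name__ == "__main__":
    base = {"good": 0.75, "bad": -0.625, "phone": 0.0}  # toy lexicon scores
    tweets = [
        (["new", "phone", "lol"], "positive"),
        (["phone", "lol", "good"], "positive"),
        (["phone", "broke", "bad"], "negative"),
        (["broke", "again", "ugh"], "negative"),
    ]
    add_on = build_add_on_lexicon(estimate_polarity(tweets), base)
    lexicon = {**base, **add_on}  # base scores for subjective words, add-on for the rest
    print(sorted(add_on.items()))
    print(interpolated_features(["lol", "broke", "again"], lexicon))
```

With the toy data, the out-of-vocabulary words `lol` and `broke` receive scores of 1.0 and -1.0 and the objective word `phone` about 0.33, so with `alpha = 0.5` the tweet `lol broke again` maps to feature values 1.5, 0.5 and 1.0. Feature dicts of this kind could then be vectorized with scikit-learn's `DictVectorizer` and passed to a linear SVM such as `LinearSVC`; the abstract does not say which SVM implementation or kernel the paper used.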

Original language: English
Title of host publication: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Publisher: Faculty of Pharmaceutical Sciences, Chulalongkorn University
Pages: 204-213
Number of pages: 10
ISBN (Electronic): 9786165518871
Publication status: Published - 2014
Externally published: Yes
Event: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 - Phuket, Thailand
Duration: 12 Dec 2014 to 14 Dec 2014

Other

Other: 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
Country: Thailand
City: Phuket
Period: 12/12/14 to 14/12/14

Fingerprint

Interpolation
Glossaries
Support vector machines
Learning systems
Internet
Polarity
Lexicon
Sentiment
Experiments

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)

Cite this

Kaewpitakkun, Y., Shirai, K., & Mohd, M. (2014). Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 (pp. 204-213). Faculty of Pharmaceutical Sciences, Chulalongkorn University.

Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging. / Kaewpitakkun, Yongyos; Shirai, Kiyoaki; Mohd, Masnizah.

Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014. Faculty of Pharmaceutical Sciences, Chulalongkorn University, 2014. p. 204-213.

Kaewpitakkun, Y, Shirai, K & Mohd, M 2014, Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging. in Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014. Faculty of Pharmaceutical Sciences, Chulalongkorn University, pp. 204-213, 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014, Phuket, Thailand, 12/12/14.
Kaewpitakkun Y, Shirai K, Mohd M. Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014. Faculty of Pharmaceutical Sciences, Chulalongkorn University. 2014. p. 204-213
Kaewpitakkun, Yongyos ; Shirai, Kiyoaki ; Mohd, Masnizah. / Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging. Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014. Faculty of Pharmaceutical Sciences, Chulalongkorn University, 2014. pp. 204-213
@inproceedings{02b75936999548c1bb7a58df2c422fae,
title = "Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging",
author = "Yongyos Kaewpitakkun and Kiyoaki Shirai and Masnizah Mohd",
year = "2014",
language = "English",
pages = "204--213",
booktitle = "Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014",
publisher = "Faculty of Pharmaceutical Sciences, Chulalongkorn University",

}

TY  - GEN
T1  - Sentiment lexicon interpolation and polarity estimation of objective and out-of-vocabulary words to improve sentiment classification on microblogging
AU  - Kaewpitakkun, Yongyos
AU  - Shirai, Kiyoaki
AU  - Mohd, Masnizah
PY  - 2014
Y1  - 2014
UR  - http://www.scopus.com/inward/record.url?scp=84994076453&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994076453&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84994076453
SP  - 204
EP  - 213
BT  - Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014
PB  - Faculty of Pharmaceutical Sciences, Chulalongkorn University
ER  - 