Hybrid N-gram model using Naïve Bayes for classification of political sentiments on Twitter

Research output: Contribution to journal › Article

Abstract

Twitter, an online micro-blogging and social networking service, lets registered users write anything they wish in 140 characters, giving them the opportunity to express their opinions and sentiments on events taking place. Politically sentimental tweets are among the top-trending tweets; whenever an election is near, users tweet about their favorite candidates or political parties and at times give their reasons. In this study, we hybridize two n-gram models, a unigram model and an n-gram model, and apply Laplace smoothing to the Naïve Bayesian classifier and Katz back-off to the model. (Throughout this study, "unigram" refers to the lowest-order n-gram, and "n-gram" refers to the highest-order n-gram, i.e., the full sentence or tweet level.) This was done to smooth the models and to address the limited accuracy, in terms of precision and recall, of n-gram models caused by the ‘zero count problem.’ Results from our baseline model show an increase of 6.05% in average F-Harmonic accuracy compared with the n-gram model and an increase of 1.75% compared with the semantic-topic model proposed in a previous study on the same dataset, i.e., the Obama–McCain dataset.
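The zero-count problem and the two levels of n-gram described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HybridNgramNB`, the whitespace tokenizer, and the parameter `alpha` are hypothetical, and the fallback from a tweet-level match to the unigram model is a simplified stand-in for Katz back-off (no discounting is applied).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class HybridNgramNB:
    """Illustrative hybrid: tweet-level lookup with unigram Naive Bayes fallback."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace (add-alpha) constant
        self.class_docs = Counter()               # label -> number of tweets
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.tweet_counts = defaultdict(Counter)  # token tuple -> label -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.tweet_counts[tuple(tokens)][label] += 1
            self.vocab.update(tokens)
        return self

    def unigram_log_scores(self, tokens):
        """Log prior plus Laplace-smoothed log likelihood for each label."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Adding alpha keeps unseen words from zeroing the product.
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        tokens = tokenize(text)
        seen = self.tweet_counts.get(tuple(tokens))
        if seen:
            # Highest-order case: the whole tweet was observed in training.
            return seen.most_common(1)[0][0]
        # Otherwise fall back to the smoothed unigram model.
        scores = self.unigram_log_scores(tokens)
        return max(scores, key=scores.get)
```

Without the `alpha` term, a single word never seen with a label would drive that label's likelihood to zero regardless of the other words in the tweet, which is the failure that smoothing is meant to prevent.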

Original language: English
Journal: Neural Computing and Applications
DOI: 10.1007/s00521-019-04248-z
Publication status: Published - 1 Jan 2019

Fingerprint

Classifiers
Semantics

Keywords

  • Data mining
  • n-gram
  • Naïve Bayes
  • Sentiment analysis
  • Social network

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

@article{297a9287740c4dca9fb10bb816187126,
title = "Hybrid N-gram model using Na{\"i}ve Bayes for classification of political sentiments on Twitter",
keywords = "Data mining, n-gram, Na{\"i}ve Bayes, Sentiment analysis, Social network",
author = "Jamilu Awwalu and {Abu Bakar}, Azuraliza and Yaakub, {Mohd Ridzwan}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s00521-019-04248-z",
language = "English",
journal = "Neural Computing and Applications",
issn = "0941-0643",
publisher = "Springer London",

}

TY - JOUR

T1 - Hybrid N-gram model using Naïve Bayes for classification of political sentiments on Twitter

AU - Awwalu, Jamilu

AU - Abu Bakar, Azuraliza

AU - Yaakub, Mohd Ridzwan

PY - 2019/1/1

Y1 - 2019/1/1

KW - Data mining

KW - n-gram

KW - Naïve Bayes

KW - Sentiment analysis

KW - Social network

UR - http://www.scopus.com/inward/record.url?scp=85066135733&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066135733&partnerID=8YFLogxK

U2 - 10.1007/s00521-019-04248-z

DO - 10.1007/s00521-019-04248-z

M3 - Article

AN - SCOPUS:85066135733

JO - Neural Computing and Applications

JF - Neural Computing and Applications

SN - 0941-0643

ER -