Comparison of machine learning approaches on Arabic twitter sentiment analysis

Merfat M. Altawaier, Sabrina Tiun

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

With the rapid growth of information on the internet, users around the world express their opinions daily on social networks such as Facebook and Twitter. Large corporations now invest in analyzing these opinions to assess their products and services through people's feedback. The process of determining whether users' opinions toward a particular product or service are positive or negative is called sentiment analysis. Arabic is one of the widely used languages that has been addressed in sentiment analysis research. Several approaches have been proposed in the literature for Arabic sentiment analysis, and most of them use machine learning techniques. These techniques vary and differ in performance. This study therefore aims to identify a simple but workable machine learning approach for Arabic sentiment analysis on Twitter. Three techniques are compared: Naïve Bayes, Decision Tree (DT), and Support Vector Machine (SVM). In addition, two simple pre-processing sub-tasks are used, Term Frequency-Inverse Document Frequency (TF-IDF) weighting and Arabic stemming, to obtain the heaviest-weighted terms as features for tweet classification. TF-IDF aims to identify the most informative words by weighting a term's frequency in a tweet against how common it is across the corpus, whereas stemming reduces a word to its stem by removing inflectional affixes. The dataset used is the Modern Arabic Corpus, which consists of Arabic tweets. Classification performance is evaluated with the information retrieval metrics precision, recall, and F-measure. The experimental results show that DT outperformed the other techniques, obtaining an F-measure of 78%.
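As an illustration of two ideas named in the abstract, the sketch below shows one common way to compute TF-IDF weights and pick the heaviest-weighted term of a document, and how precision, recall, and F-measure are derived for one class. It is not the authors' implementation. The function names, the raw-count TF with log(N/df) IDF variant, and the toy English "tweets" are all assumptions made for the example, and no Arabic stemming is performed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term count for TF and log(N / df) for IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-measure (F1) for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up tokenized documents (placeholders, not taken from the corpus).
docs = [
    ["great", "phone", "great", "camera"],
    ["bad", "phone", "battery"],
    ["great", "battery"],
]
weights = tf_idf(docs)
# The heaviest-weighted term of the first document under this weighting.
heaviest = max(weights[0], key=weights[0].get)

# Made-up gold labels and predictions for four documents.
gold = ["pos", "neg", "pos", "neg"]
pred = ["pos", "pos", "pos", "neg"]
p, r, f1 = precision_recall_f1(gold, pred, positive="pos")
```

In this toy data, "great" occurs twice in the first document but also appears in another document, so the rarer term "camera" receives the larger weight. This is why TF-IDF is not simply a count of the most frequent words.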

Original language: English
Pages (from-to): 1067-1073
Number of pages: 7
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 6
Issue number: 6
DOI: 10.18517/ijaseit.6.6.1456
Publication status: Published - 2016

Fingerprint

Artificial intelligence
Learning systems
Decision trees
Information storage and retrieval
Information retrieval
Social support
Internet
Support vector machines
Industry
Language
Methodology
Social networks
Corporations
Feedback
Weights and measures
Processing
Machine learning
Stems

Keywords

  • Arabic sentiment analysis
  • Opinion mining
  • Twitter data

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Computer Science(all)
  • Engineering(all)

Cite this

Comparison of machine learning approaches on Arabic twitter sentiment analysis. / Altawaier, Merfat M.; Tiun, Sabrina.

In: International Journal on Advanced Science, Engineering and Information Technology, Vol. 6, No. 6, 2016, p. 1067-1073.

@article{ed5446832eb24baf819e880ffae93000,
title = "Comparison of machine learning approaches on Arabic twitter sentiment analysis",
abstract = "With the dramatic expansion of information over the internet, users around the world express their opinion daily on the social network such as Facebook and Twitter. Large corporations nowadays invest on analyzing these opinions in order to assess their products or services by knowing the people feedback toward such business. The process of knowing users' opinions toward particular product or services whether positive or negative is called sentiment analysis. Arabic is one of the common languages that have been addressed regarding sentiment analysis. In the literature, several approaches have been proposed for Arabic sentiment analysis and most of these approaches are using machine learning techniques. Machine learning techniques are various and have different performances. Therefore, in this study, we try to identify a simple, but workable approach for Arabic sentiment analysis on Twitter. Hence, this study aims to investigate the machine learning technique in terms of Arabic sentiment analysis on Twitter. Three techniques have been used including Na{\"i}ve Bayes, Decision Tree (DT) and Support Vector Machine (SVM). In addition, two simple sub-tasks pre-processing have been also used; Term Frequency-Inverse Document Frequency (TF-IDF) and Arabic stemming to get the heaviest weight term as the feature for tweet classification. TF-IDF aims to identify the most frequent words, whereas stemming aims to retrieve the stem of the word by removing the inflectional derivations. The dataset that has been used is Modern Arabic Corpus which consists of Arabic tweets. The performance of classification has been evaluated based on the information retrieval metrics precision, recall, and f-measure. The experimental results have shown that DT has outperformed the other techniques by obtaining 78{\%} of f-measure.",
keywords = "Arabic sentiment analysis, Opinion mining, Twitter data",
author = "Altawaier, Merfat M. and Tiun, Sabrina",
year = "2016",
doi = "10.18517/ijaseit.6.6.1456",
language = "English",
volume = "6",
pages = "1067--1073",
journal = "International Journal on Advanced Science, Engineering and Information Technology",
issn = "2088-5334",
publisher = "INSIGHT - Indonesian Society for Knowledge and Human Development",
number = "6",
}

TY - JOUR
T1 - Comparison of machine learning approaches on Arabic twitter sentiment analysis
AU - Altawaier, Merfat M.
AU - Tiun, Sabrina
PY - 2016
Y1 - 2016
AB - With the dramatic expansion of information over the internet, users around the world express their opinion daily on the social network such as Facebook and Twitter. Large corporations nowadays invest on analyzing these opinions in order to assess their products or services by knowing the people feedback toward such business. The process of knowing users' opinions toward particular product or services whether positive or negative is called sentiment analysis. Arabic is one of the common languages that have been addressed regarding sentiment analysis. In the literature, several approaches have been proposed for Arabic sentiment analysis and most of these approaches are using machine learning techniques. Machine learning techniques are various and have different performances. Therefore, in this study, we try to identify a simple, but workable approach for Arabic sentiment analysis on Twitter. Hence, this study aims to investigate the machine learning technique in terms of Arabic sentiment analysis on Twitter. Three techniques have been used including Naïve Bayes, Decision Tree (DT) and Support Vector Machine (SVM). In addition, two simple sub-tasks pre-processing have been also used; Term Frequency-Inverse Document Frequency (TF-IDF) and Arabic stemming to get the heaviest weight term as the feature for tweet classification. TF-IDF aims to identify the most frequent words, whereas stemming aims to retrieve the stem of the word by removing the inflectional derivations. The dataset that has been used is Modern Arabic Corpus which consists of Arabic tweets. The performance of classification has been evaluated based on the information retrieval metrics precision, recall, and f-measure. The experimental results have shown that DT has outperformed the other techniques by obtaining 78% of f-measure.

KW - Arabic sentiment analysis
KW - Opinion mining
KW - Twitter data
UR - http://www.scopus.com/inward/record.url?scp=85010211562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85010211562&partnerID=8YFLogxK
U2 - 10.18517/ijaseit.6.6.1456
DO - 10.18517/ijaseit.6.6.1456
M3 - Article
AN - SCOPUS:85010211562
VL - 6
SP - 1067
EP - 1073
JO - International Journal on Advanced Science, Engineering and Information Technology
JF - International Journal on Advanced Science, Engineering and Information Technology
SN - 2088-5334
IS - 6
ER -