Trend detection in the Arabic social media using voting combination

Ali Sabah Abdulameer, Saidah Saad, Lailatul Qadri Zakaria

Research output: Contribution to journal (Article)

Abstract

The amount of information has been increasing tremendously, especially with the use of social media applications such as Twitter, Facebook, and YouTube. Twitter is a common social application that enables users to share their current thoughts and actions, comment on breaking news, and engage in discussions. Trends are typically driven by emerging events, breaking news, and general topics that attract the attention of a large fraction of Twitter users. Thus, trend detection is highly valuable to news reporters and analysts because trends may point to fast-evolving news stories. Researchers have attempted to detect trends using machine-learning techniques, such as clustering methods, mostly for major languages (e.g., English, German, and French). Research on the Arabic language remains in its infancy, yet Arabic social media contribute a large amount of data because of significant events in the Middle East. The present research aims to detect trends in Arabic social media. To do so, it must overcome several issues, such as the processing of Arabic user-generated content and a lack of resources. To address these issues, this research presents a voting combination clustering approach divided into six phases: dataset collection from Twitter, text pre-processing, spam filtering using Naïve Bayes, feature selection based on term frequency–inverse document frequency and entropy, statistical analyses, and evaluation. Three statistical approaches for clustering are used: co-occurrence, k-means, and voting combination. The analyses classify the trends into three categories: Arabic nationality events, personal events, and other events. Experimental results indicate that the voting combination clustering achieved 93%, 87%, and 90% for precision, recall, and F-measure in trend detection, respectively. Finally, trend detection of events is important to companies, governments, national security agencies, and journalists in developing strategies to respond to them.
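
As a quick consistency check on the figures reported in the abstract, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract means the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for voting combination.
reported_precision = 0.93
reported_recall = 0.87

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.3f}")  # prints "F-measure: 0.899"
```

Under that assumption, the result rounds to 90%, which agrees with the F-measure quoted in the abstract.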

Original language: English
Pages (from-to): 432-443
Number of pages: 12
Journal: Journal of Theoretical and Applied Information Technology
Volume: 81
Issue number: 3
Publication status: Published - 30 Nov 2015

Fingerprint

Social Media
Voting
National security
Processing
Clustering
Learning systems
Feature extraction
Entropy
Spam
K-means
Bayes
Trends
Clustering Methods
Industry
Feature Selection
Preprocessing
Machine Learning
Filtering
Classify
Resources

Keywords

  • Arabic social media
  • K-means and voting combination
  • Term clustering
  • Trend detection

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Trend detection in the Arabic social media using voting combination. / Abdulameer, Ali Sabah; Saad, Saidah; Zakaria, Lailatul Qadri.

In: Journal of Theoretical and Applied Information Technology, Vol. 81, No. 3, 30.11.2015, p. 432-443.

@article{1573a380791f44e598ac52cf10bd1503,
title = "Trend detection in the {A}rabic social media using voting combination",
abstract = "The amount of information has been increasing tremendously, especially with the use of social media applications, such as Twitter, Facebook, and YouTube. Twitter is a common social application that enables users to share their current thoughts and actions, comment on breaking news, and engage in discussions. Trends are typically driven by emerging events, breaking news, and general topics that attract the attention of a large fraction of Twitter users. Thus, trend detection is highly valuable to news reporters and analysts because they may point to fast-evolving news stories. Researchers have been attempting to detect trends using machine-learning techniques, such as clustering method based on major languages (e.g., English, German, and French). The Arabic language remains in its infancy, but the Arabic social media have been contributing to a large amount of data because of the significant events in the Middle East. The present research aims to detect trends in the Arabic social media. However, this research must overcome several issues such as processing of Arabic user-generated content and lack of resources. To solve these issues, this research presents a voting combination clustering approach, which is divided into six phases, namely, dataset collection from Twitter, text pre-processing, spam filtering using Na{\"i}ve Bayes, feature selection based on term frequency–inverse document frequency and entropy, statistical analyses, and evaluation. Three statistical approaches for clustering are used, namely, co-occurrence, k-means, and voting combination. The analyses are performed to classify the trends into three categories, namely, Arabic nationality events, personal events, and other events. Experimental results indicate that the voting combination clustering achieved 93{\%}, 87{\%}, and 90{\%} for precision, recall, and f-measure in trend detection, respectively. 
Finally, trend detection of events is important to companies, governments, national security agencies, and journalists to develop strategies to rectify them.",
keywords = "Arabic social media, K-means and voting combination, Term clustering, Trend detection",
author = "Abdulameer, {Ali Sabah} and Saad, Saidah and Zakaria, {Lailatul Qadri}",
year = "2015",
month = "11",
day = "30",
language = "English",
volume = "81",
pages = "432--443",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "3",

}

TY - JOUR

T1 - Trend detection in the Arabic social media using voting combination

AU - Abdulameer, Ali Sabah

AU - Saad, Saidah

AU - Zakaria, Lailatul Qadri

PY - 2015/11/30

Y1 - 2015/11/30

AB - The amount of information has been increasing tremendously, especially with the use of social media applications, such as Twitter, Facebook, and YouTube. Twitter is a common social application that enables users to share their current thoughts and actions, comment on breaking news, and engage in discussions. Trends are typically driven by emerging events, breaking news, and general topics that attract the attention of a large fraction of Twitter users. Thus, trend detection is highly valuable to news reporters and analysts because they may point to fast-evolving news stories. Researchers have been attempting to detect trends using machine-learning techniques, such as clustering method based on major languages (e.g., English, German, and French). The Arabic language remains in its infancy, but the Arabic social media have been contributing to a large amount of data because of the significant events in the Middle East. The present research aims to detect trends in the Arabic social media. However, this research must overcome several issues such as processing of Arabic user-generated content and lack of resources. To solve these issues, this research presents a voting combination clustering approach, which is divided into six phases, namely, dataset collection from Twitter, text pre-processing, spam filtering using Naïve Bayes, feature selection based on term frequency–inverse document frequency and entropy, statistical analyses, and evaluation. Three statistical approaches for clustering are used, namely, co-occurrence, k-means, and voting combination. The analyses are performed to classify the trends into three categories, namely, Arabic nationality events, personal events, and other events. Experimental results indicate that the voting combination clustering achieved 93%, 87%, and 90% for precision, recall, and f-measure in trend detection, respectively. 
Finally, trend detection of events is important to companies, governments, national security agencies, and journalists to develop strategies to rectify them.

KW - Arabic social media

KW - K-means and voting combination

KW - Term clustering

KW - Trend detection

UR - http://www.scopus.com/inward/record.url?scp=84948954886&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948954886&partnerID=8YFLogxK

M3 - Article

VL - 81

SP - 432

EP - 443

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 3

ER -