Abstract
The amount of information has been increasing tremendously, especially with the use of social media applications such as Twitter, Facebook, and YouTube. Twitter is a widely used social application that enables users to share their current thoughts and actions, comment on breaking news, and engage in discussions. Trends are typically driven by emerging events, breaking news, and general topics that attract the attention of a large fraction of Twitter users. Trend detection is therefore highly valuable to news reporters and analysts, because trends may point to fast-evolving news stories. Researchers have been attempting to detect trends using machine-learning techniques, such as clustering methods, mostly for major languages (e.g., English, German, and French). Trend detection for the Arabic language remains in its infancy, although Arabic social media have been contributing a large amount of data because of significant events in the Middle East. The present research aims to detect trends in Arabic social media. To do so, it must overcome several issues, such as the processing of Arabic user-generated content and a lack of resources. To address these issues, this research presents a voting combination clustering approach, which is divided into six phases: dataset collection from Twitter, text pre-processing, spam filtering using Naïve Bayes, feature selection based on term frequency–inverse document frequency and entropy, statistical analyses, and evaluation. Three statistical approaches for clustering are used: co-occurrence, k-means, and voting combination. The analyses classify the trends into three categories: Arabic nationality events, personal events, and other events. Experimental results indicate that the voting combination clustering achieved a precision of 93%, a recall of 87%, and an F-measure of 90% in trend detection. Finally, trend detection of events is important to companies, governments, national security agencies, and journalists, who can use it to develop strategies to address these events.
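The three reported scores can be checked against one another. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the voting combination clustering.
reported_precision = 0.93
reported_recall = 0.87

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.3f}")
```

Under this assumption the result rounds to 0.90, which agrees with the 90% F-measure reported in the abstract.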
Original language | English |
---|---|
Pages (from-to) | 432-443 |
Number of pages | 12 |
Journal | Journal of Theoretical and Applied Information Technology |
Volume | 81 |
Issue number | 3 |
Publication status | Published - 30 Nov 2015 |
Keywords
- Arabic social media
- K-means and voting combination
- Term clustering
- Trend detection
ASJC Scopus subject areas
- Computer Science(all)
- Theoretical Computer Science
Cite this
Trend detection in the arabic social media using voting combination. / Abdulameer, Ali Sabah; Saad, Saidah; Zakaria, Lailatul Qadri.
In: Journal of Theoretical and Applied Information Technology, Vol. 81, No. 3, 30.11.2015, p. 432-443.
Research output: Contribution to journal › Article
TY - JOUR
T1 - Trend detection in the arabic social media using voting combination
AU - Abdulameer, Ali Sabah
AU - Saad, Saidah
AU - Zakaria, Lailatul Qadri
PY - 2015/11/30
Y1 - 2015/11/30
KW - Arabic social media
KW - K-means and voting combination
KW - Term clustering
KW - Trend detection
UR - http://www.scopus.com/inward/record.url?scp=84948954886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84948954886&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84948954886
VL - 81
SP - 432
EP - 443
JO - Journal of Theoretical and Applied Information Technology
JF - Journal of Theoretical and Applied Information Technology
SN - 1992-8645
IS - 3
ER -