Automatic Arabic text summarization using clustering and keyphrase extraction

Hamzah Noori Fejer, Nazlia Omar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

As the number of electronic documents increases rapidly, the need for faster techniques to assess the relevance of these documents emerges. A summary is a concise representation of the underlying text. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting them as a summary is the most common technique in automated text summarization. This paper proposes a hybrid clustering method (partitioning and hierarchical) to group many Arabic documents into several clusters. A keyphrase extraction module is then applied to extract important keyphrases from each cluster, which helps identify the most important sentences and find similar sentences based on several similarity algorithms. It is then applied to extract one sentence from each group of similar sentences while ignoring the other similar sentences (i.e., sentences whose similarity exceeds the predefined threshold). This model is designed for both single- and multi-document Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) matrix was used for the evaluation. For the summarization dataset, the Essex Arabic Summaries Corpus was used. It contains many topic-based articles with multiple human summaries. This model achieved an accuracy of 80% for single-document and 62% for multi-document summarization.
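
The redundancy step described in the abstract (keep one sentence from a group of similar sentences and ignore the rest once similarity exceeds a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the similarity algorithms or the threshold value, so the bag-of-words cosine similarity, the whitespace tokenization, the function names `cosine_similarity` and `drop_redundant`, and the default threshold of 0.7 are all assumptions made for illustration. Real Arabic text would likely also need normalization and stemming, which are omitted here.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two sentences over raw token counts.

    Assumption: whitespace tokenization stands in for a real Arabic tokenizer.
    """
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def drop_redundant(ranked_sentences: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a sentence only if no already-kept sentence is too similar to it.

    `ranked_sentences` is assumed to be ordered from most to least important
    (for example, by keyphrase score), so the first sentence of each group of
    similar sentences is the one that survives.
    """
    kept: list[str] = []
    for sentence in ranked_sentences:
        if all(cosine_similarity(sentence, k) <= threshold for k in kept):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    candidates = [
        "the cat sat on the mat",
        "the cat sat on a mat",   # near-duplicate of the first sentence
        "dogs bark loudly",
    ]
    print(drop_redundant(candidates))
```

In this sketch the second candidate is dropped because its similarity to the first exceeds the threshold, while the unrelated third sentence is kept. Raising the threshold keeps more near-duplicates; lowering it produces a shorter, more diverse summary.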

Original language: English
Title of host publication: Conference Proceedings - 6th International Conference on Information Technology and Multimedia at UNITEN: Cultivating Creativity and Enabling Technology Through the Internet of Things, ICIMU 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 293-298
Number of pages: 6
ISBN (Print): 9781479954230
DOIs: https://doi.org/10.1109/ICIMU.2014.7066647
Publication status: Published - 23 Mar 2015
Event: 6th International Conference on Information Technology and Multimedia, ICIMU 2014 - Putrajaya, Malaysia
Duration: 18 Nov 2014 to 20 Nov 2014

Other

Other: 6th International Conference on Information Technology and Multimedia, ICIMU 2014
Country: Malaysia
City: Putrajaya
Period: 18/11/14 to 20/11/14

Keywords

  • Clustering
  • Keyphrase Extraction
  • ROUGE Matrix
  • Similarity
  • Text Summarization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Fejer, H. N., & Omar, N. (2015). Automatic Arabic text summarization using clustering and keyphrase extraction. In Conference Proceedings - 6th International Conference on Information Technology and Multimedia at UNITEN: Cultivating Creativity and Enabling Technology Through the Internet of Things, ICIMU 2014 (pp. 293-298). [7066647] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICIMU.2014.7066647

@inproceedings{e55dc9d0b9534a488ea713ad5480e1b7,
title = "Automatic Arabic text summarization using clustering and keyphrase extraction",
keywords = "Clustering, Keyphrase Extraction, ROUGE Matrix, Similarity, Text Summarization",
author = "Fejer, {Hamzah Noori} and Nazlia Omar",
year = "2015",
month = "3",
day = "23",
doi = "10.1109/ICIMU.2014.7066647",
language = "English",
isbn = "9781479954230",
pages = "293--298",
booktitle = "Conference Proceedings - 6th International Conference on Information Technology and Multimedia at UNITEN: Cultivating Creativity and Enabling Technology Through the Internet of Things, ICIMU 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Automatic Arabic text summarization using clustering and keyphrase extraction

AU - Fejer, Hamzah Noori

AU - Omar, Nazlia

PY - 2015/3/23

Y1 - 2015/3/23

KW - Clustering

KW - Keyphrase Extraction

KW - ROUGE Matrix

KW - Similarity

KW - Text Summarization

UR - http://www.scopus.com/inward/record.url?scp=84937510517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937510517&partnerID=8YFLogxK

U2 - 10.1109/ICIMU.2014.7066647

DO - 10.1109/ICIMU.2014.7066647

M3 - Conference contribution

AN - SCOPUS:84937510517

SN - 9781479954230

SP - 293

EP - 298

BT - Conference Proceedings - 6th International Conference on Information Technology and Multimedia at UNITEN: Cultivating Creativity and Enabling Technology Through the Internet of Things, ICIMU 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -