Poetry classification using support vector machines

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Problem statement: The pantun, a traditional form of Malay poetry, is an art form that expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries a broad and deep meaning, which makes it difficult to interpret. Moreover, few efforts have been made toward the automatic classification of literary text such as poetry. Approach: This research is concerned with the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernels are applied to classify pantun by theme and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, are used as the training and testing datasets. We used tf-idf in both classification experiments and the shape feature only in the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of SVM for classifying poems into various categories, whereas previous approaches focused only on classifying prose.
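
As a rough illustration of the ingredients named in the abstract, the sketch below computes tf-idf vectors and evaluates the two kernel functions an SVM would compare. It is not the authors' implementation: the abstract does not give the tf-idf variant, the tokenization or the kernel parameters, so the function names, the whitespace tokenizer, the `gamma` default and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one tf-idf vector per document).

    Assumed variant: tf = relative term frequency, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Toy documents, made up for illustration only (not taken from the dataset).
toy_docs = ["bunga melati bunga", "bunga laut", "laut biru laut"]
toy_vocab, toy_vecs = tfidf_vectors(toy_docs)
print(toy_vocab)
print(linear_kernel(toy_vecs[0], toy_vecs[1]))
print(rbf_kernel(toy_vecs[0], toy_vecs[1]))
```

An SVM uses only these kernel values between pairs of documents, so switching between the linear and RBF variants changes the similarity measure while the tf-idf features stay the same.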

Original language: English
Pages (from-to): 1441-1446
Number of pages: 6
Journal: Journal of Computer Science
Volume: 8
Issue number: 9
DOI: 10.3844/jcssp.2012.1441.1446
Publication status: Published - 2012

Fingerprint

Support vector machines
Experiments
Testing

Keywords

  • Classify pantun
  • Express ideas
  • Malay poetry
  • Malaysian folklore
  • Radial Basis Function (RBF)
  • Support vector machines
  • Text classification

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Poetry classification using support vector machines. / Jamal, Noraini; Mohd, Masnizah; Mohd Noah, Shahrul Azman.

In: Journal of Computer Science, Vol. 8, No. 9, 2012, p. 1441-1446.


@article{c58e21b11cff46c492bae1fb389b2936,
title = "Poetry classification using support vector machines",
abstract = "Problem statement: The pantun, a traditional form of Malay poetry, is an art form that expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries a broad and deep meaning, which makes it difficult to interpret. Moreover, few efforts have been made toward the automatic classification of literary text such as poetry. Approach: This research is concerned with the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernels are applied to classify pantun by theme and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, are used as the training and testing datasets. We used tf-idf in both classification experiments and the shape feature only in the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of SVM for classifying poems into various categories, whereas previous approaches focused only on classifying prose.",
keywords = "Classify pantun, Express ideas, Malay poetry, Malaysian folklore, Radial Basis Function (RBF), Support vector machines, Text classification",
author = "Noraini Jamal and Masnizah Mohd and {Mohd Noah}, {Shahrul Azman}",
year = "2012",
doi = "10.3844/jcssp.2012.1441.1446",
language = "English",
volume = "8",
pages = "1441--1446",
journal = "Journal of Computer Science",
issn = "1549-3636",
publisher = "Science Publications",
number = "9",

}

TY - JOUR

T1 - Poetry classification using support vector machines

AU - Jamal, Noraini

AU - Mohd, Masnizah

AU - Mohd Noah, Shahrul Azman

PY - 2012

Y1 - 2012

N2 - Problem statement: The pantun, a traditional form of Malay poetry, is an art form that expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries a broad and deep meaning, which makes it difficult to interpret. Moreover, few efforts have been made toward the automatic classification of literary text such as poetry. Approach: This research is concerned with the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernels are applied to classify pantun by theme and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, are used as the training and testing datasets. We used tf-idf in both classification experiments and the shape feature only in the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of SVM for classifying poems into various categories, whereas previous approaches focused only on classifying prose.

AB - Problem statement: The pantun, a traditional form of Malay poetry, is an art form that expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries a broad and deep meaning, which makes it difficult to interpret. Moreover, few efforts have been made toward the automatic classification of literary text such as poetry. Approach: This research is concerned with the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernels are applied to classify pantun by theme and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, are used as the training and testing datasets. We used tf-idf in both classification experiments and the shape feature only in the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of SVM for classifying poems into various categories, whereas previous approaches focused only on classifying prose.

KW - Classify pantun

KW - Express ideas

KW - Malay poetry

KW - Malaysian folklore

KW - Radial Basis Function (RBF)

KW - Support vector machines

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=84866151454&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866151454&partnerID=8YFLogxK

U2 - 10.3844/jcssp.2012.1441.1446

DO - 10.3844/jcssp.2012.1441.1446

M3 - Article

AN - SCOPUS:84866151454

VL - 8

SP - 1441

EP - 1446

JO - Journal of Computer Science

JF - Journal of Computer Science

SN - 1549-3636

IS - 9

ER -