A two-step feature selection method for Quranic text classification

A. Adeleke, N. A. Samsudin, Z. A. Othman, S. K. Ahmad Khalid

Research output: Contribution to journal (Article)

Abstract

Feature selection (FS) is an integral phase in text classification problems. It is primarily applied when preprocessing text data prior to labeling. However, existing FS techniques have limitations: filter-based techniques tend to yield lower accuracy, while wrapper-based techniques are computationally expensive. In this paper, a two-step FS method is presented. In the first step, a chi-square (CH) filter-based technique is used to reduce the dimensionality of the feature set; in the second step, a wrapper correlation-based (CFS) technique is employed to select the most relevant features from the reduced feature set. The aim is to reduce computational runtime while achieving high classification accuracy. The proposed method was then applied to labeling instances of the input data (Quranic verses) using standard classifiers: naïve Bayes (NB), support vector machine (SVM), and decision trees (J48). The results show that the proposed method achieved an accuracy of 93.6% in 4.17 seconds.
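The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the use of binary term-presence features with 0/1 class labels, Pearson correlation as the correlation measure, and greedy forward search in the second step are all assumptions made for illustration.

```python
import math

def chi2_score(column, labels):
    """Chi-square statistic between one binary feature and the class labels."""
    n = len(labels)
    score = 0.0
    for value in (0, 1):
        for cls in set(labels):
            observed = sum(1 for x, y in zip(column, labels) if x == value and y == cls)
            expected = column.count(value) * labels.count(cls) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def cfs_merit(subset, columns, labels):
    """CFS-style merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def two_step_select(columns, labels, keep):
    """Step 1: keep the top `keep` features by chi-square. Step 2: greedy CFS search."""
    ranked = sorted(range(len(columns)),
                    key=lambda f: chi2_score(columns[f], labels), reverse=True)
    candidates = ranked[:keep]
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in candidates:
            if f in selected:
                continue
            merit = cfs_merit(selected + [f], columns, labels)
            if merit > best + 1e-12:  # accept only a strict improvement
                best, choice, improved = merit, f, True
        if improved:
            selected.append(choice)
    return selected

# Hypothetical toy data: each column is one term's presence across six verses.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0],  # feature 0: matches the class exactly
    [1, 1, 1, 0, 0, 0],  # feature 1: redundant copy of feature 0
    [1, 0, 1, 0, 1, 0],  # feature 2: mostly noise
    [1, 1, 0, 0, 0, 0],  # feature 3: partially informative
]
print(two_step_select(columns, labels, keep=3))
```

In this sketch the chi-square step cheaply discards the noisy feature, and the second step then searches only the reduced candidate set, which is where the runtime saving described in the abstract would come from. The redundant and weaker features are dropped in the second step because adding them does not raise the merit.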

Original language: English
Pages (from-to): 730-736
Number of pages: 7
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Volume: 16
Issue number: 2
DOI: 10.11591/ijeecs.v16.i2.pp730-736
Publication status: Published - 1 Jan 2019

Fingerprint

Text Classification
Feature Selection
Labeling
Feature extraction
Decision trees
Wrapper
Support vector machines
Classifiers
Bayes Classifier
Filter
Chi-square
Classification Problems
Dimensionality
Preprocessing

Keywords

  • Classifier
  • Feature selection
  • Holy Quran
  • Text classification

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Control and Optimization
  • Electrical and Electronic Engineering

Cite this

A two-step feature selection method for Quranic text classification. / Adeleke, A.; Samsudin, N. A.; Othman, Z. A.; Ahmad Khalid, S. K.

In: Indonesian Journal of Electrical Engineering and Computer Science, Vol. 16, No. 2, 01.01.2019, p. 730-736.


@article{30e0197c75b74d4593ff0fd807a104d9,
title = "A two-step feature selection method for {Q}uranic text classification",
abstract = "Feature selection (FS) is an integral phase in text classification problems. It is primarily applied when preprocessing text data prior to labeling. However, existing FS techniques have limitations: filter-based techniques tend to yield lower accuracy, while wrapper-based techniques are computationally expensive. In this paper, a two-step FS method is presented. In the first step, a chi-square (CH) filter-based technique is used to reduce the dimensionality of the feature set; in the second step, a wrapper correlation-based (CFS) technique is employed to select the most relevant features from the reduced feature set. The aim is to reduce computational runtime while achieving high classification accuracy. The proposed method was then applied to labeling instances of the input data (Quranic verses) using standard classifiers: na{\"i}ve Bayes (NB), support vector machine (SVM), and decision trees (J48). The results show that the proposed method achieved an accuracy of 93.6{\%} in 4.17 seconds.",
keywords = "Classifier, Feature selection, Holy Quran, Text classification",
author = "Adeleke, A. and Samsudin, {N. A.} and Othman, {Z. A.} and {Ahmad Khalid}, {S. K.}",
year = "2019",
month = "1",
day = "1",
doi = "10.11591/ijeecs.v16.i2.pp730-736",
language = "English",
volume = "16",
pages = "730--736",
journal = "Indonesian Journal of Electrical Engineering and Computer Science",
issn = "2502-4752",
publisher = "Institute of Advanced Engineering and Science (IAES)",
number = "2",

}

TY - JOUR

T1 - A two-step feature selection method for Quranic text classification

AU - Adeleke, A.

AU - Samsudin, N. A.

AU - Othman, Z. A.

AU - Ahmad Khalid, S. K.

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Feature selection (FS) is an integral phase in text classification problems. It is primarily applied when preprocessing text data prior to labeling. However, existing FS techniques have limitations: filter-based techniques tend to yield lower accuracy, while wrapper-based techniques are computationally expensive. In this paper, a two-step FS method is presented. In the first step, a chi-square (CH) filter-based technique is used to reduce the dimensionality of the feature set; in the second step, a wrapper correlation-based (CFS) technique is employed to select the most relevant features from the reduced feature set. The aim is to reduce computational runtime while achieving high classification accuracy. The proposed method was then applied to labeling instances of the input data (Quranic verses) using standard classifiers: naïve Bayes (NB), support vector machine (SVM), and decision trees (J48). The results show that the proposed method achieved an accuracy of 93.6% in 4.17 seconds.

AB - Feature selection (FS) is an integral phase in text classification problems. It is primarily applied when preprocessing text data prior to labeling. However, existing FS techniques have limitations: filter-based techniques tend to yield lower accuracy, while wrapper-based techniques are computationally expensive. In this paper, a two-step FS method is presented. In the first step, a chi-square (CH) filter-based technique is used to reduce the dimensionality of the feature set; in the second step, a wrapper correlation-based (CFS) technique is employed to select the most relevant features from the reduced feature set. The aim is to reduce computational runtime while achieving high classification accuracy. The proposed method was then applied to labeling instances of the input data (Quranic verses) using standard classifiers: naïve Bayes (NB), support vector machine (SVM), and decision trees (J48). The results show that the proposed method achieved an accuracy of 93.6% in 4.17 seconds.

KW - Classifier

KW - Feature selection

KW - Holy Quran

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=85073539474&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073539474&partnerID=8YFLogxK

U2 - 10.11591/ijeecs.v16.i2.pp730-736

DO - 10.11591/ijeecs.v16.i2.pp730-736

M3 - Article

AN - SCOPUS:85073539474

VL - 16

SP - 730

EP - 736

JO - Indonesian Journal of Electrical Engineering and Computer Science

JF - Indonesian Journal of Electrical Engineering and Computer Science

SN - 2502-4752

IS - 2

ER -