Genetic algorithm rule based categorization method for textual data mining

Mohammed H. Afif, Abdullah Saeed Ghareb, Abdulgbar Saif, Azuraliza Abu Bakar, Omer Bazighifan

Research output: Contribution to journal › Article

Abstract

Rule-based categorization approaches such as associative classification can produce classifiers that rival those learned by traditional categorization approaches such as Naïve Bayes and K-nearest Neighbor. However, discovering useful categorization rules and applying them effectively remain the major challenges of rule-based approaches, and their performance declines when the rule set is large. Genetic Algorithm (GA) is effective at reducing high dimensionality and improving categorization performance. However, in most studies the use of GA is limited to the categorization preprocessing stage, and its results are used to simplify the categorization process performed by other categorization algorithms. This paper proposes a hybrid GA rule-based categorization method, named genetic algorithm rule based categorization (GARC), to enhance the accuracy of categorization rules and to produce an accurate classifier for text mining. GARC consists of three main stages: search space determination, rule discovery with validation (rule generation), and categorization. Experiments were carried out on three Arabic text datasets with multiple categories to evaluate the efficiency of GARC. The results show that GARC achieves promising performance for Arabic text categorization, and that it achieves its best performance with a small feature space in most situations.
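
The three stages named in the abstract can be illustrated with a small sketch. The abstract does not state GARC's rule encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration: the toy corpus, the bit-mask encoding, the support-times-confidence fitness, the truncation selection, and all function names (`candidate_terms`, `rule_stats`, `evolve_rule`, `build_rules`, `classify`) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

# Toy corpus of (term set, category) pairs, invented for illustration only.
DOCS = [
    ({"goal", "match", "team"}, "sport"),
    ({"match", "team", "coach"}, "sport"),
    ({"goal", "coach", "league"}, "sport"),
    ({"bank", "market", "stock"}, "economy"),
    ({"market", "stock", "price"}, "economy"),
    ({"bank", "price", "team"}, "economy"),
]

def candidate_terms(docs, category, k):
    """Stage 1 (search space): keep the k terms most frequent in the category."""
    counts = Counter(t for terms, c in docs if c == category for t in sorted(terms))
    return [t for t, _ in counts.most_common(k)]

def rule_stats(terms, category, docs):
    """Return (support, confidence) of the rule 'contains all terms -> category'."""
    covered = [c for d, c in docs if terms and terms <= d]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for c in covered if c == category)
    in_cat = sum(1 for _, c in docs if c == category)
    return hits / in_cat, hits / len(covered)

def evolve_rule(cands, category, docs, rng, pop_size=12, generations=15, p_mut=0.1):
    """Stage 2 (rule discovery): evolve a bit mask over the candidate terms.

    Assumes at least two candidate terms so one-point crossover is possible.
    """
    def fitness(mask):
        terms = {t for t, bit in zip(cands, mask) if bit}
        support, confidence = rule_stats(terms, category, docs)
        return support * confidence  # assumed fitness, not the paper's

    pop = [[rng.randint(0, 1) for _ in cands] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(cands))   # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return {t for t, bit in zip(cands, best) if bit}, fitness(best)

def build_rules(docs, k=4, min_conf=0.6, seed=0):
    """Run stages 1 and 2 per category; validation drops low-confidence rules."""
    rng = random.Random(seed)
    rules = []
    for cat in sorted({c for _, c in docs}):
        cands = candidate_terms(docs, cat, k)
        terms, _ = evolve_rule(cands, cat, docs, rng)
        _, confidence = rule_stats(terms, cat, docs)
        if confidence >= min_conf:
            rules.append((terms, cat, confidence))
    return rules

def classify(doc, rules, default):
    """Stage 3 (categorization): fire the matching rule with highest confidence."""
    matches = [(conf, cat) for terms, cat, conf in rules if terms and terms <= doc]
    return max(matches)[1] if matches else default

if __name__ == "__main__":
    learned = build_rules(DOCS)
    print(learned)
    print(classify({"match", "team", "referee"}, learned, default="unknown"))
```

In this sketch the rule set stays small because each category contributes at most one validated rule; a fuller implementation would presumably evolve several rules per category and rank them before categorization.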

Original language: English
Pages (from-to): 37-50
Number of pages: 14
Journal: Decision Science Letters
Volume: 9
Issue number: 1
DOIs: 10.5267/j.dsl.2019.8.003
Publication status: Published - Jan 2020

Fingerprint

Text categorization
Rule-based
Genetic algorithm
Data mining
Classifier
Text mining
Canada
K-nearest neighbor
Dimensionality
Hybrid genetic algorithm

Keywords

  • Associative classification
  • Categorization rule
  • Genetic Algorithm
  • Rule based categorization
  • Rule discovery
  • Text categorization

ASJC Scopus subject areas

  • Decision Sciences(all)

Cite this

Genetic algorithm rule based categorization method for textual data mining. / Afif, Mohammed H.; Ghareb, Abdullah Saeed; Saif, Abdulgbar; Bakar, Azuraliza Abu; Bazighifan, Omer.

In: Decision Science Letters, Vol. 9, No. 1, 01.2020, p. 37-50.

Afif, Mohammed H. ; Ghareb, Abdullah Saeed ; Saif, Abdulgbar ; Bakar, Azuraliza Abu ; Bazighifan, Omer. / Genetic algorithm rule based categorization method for textual data mining. In: Decision Science Letters. 2020 ; Vol. 9, No. 1. pp. 37-50.
@article{f25b543e8e6b4107a443eea5d4b21d74,
title = "Genetic algorithm rule based categorization method for textual data mining",
abstract = "The rule based categorization approaches such as associative classification have the capability to produce classifiers rival to those learned by traditional categorization approaches such as Na{\"i}ve Bayes and K-nearest Neighbor. However, the lack of useful discovery and usage of categorization rules are the major challenges of rule based approaches and their performance is declined with large set of rules. Genetic Algorithm (GA) is effective to reduce the high dimensionality and improve categorization performance. However, the usage of GA in most researches is limited in the categorization preprocessing stage and its results is used to simplify the categorization process performed by other categorization algorithms. This paper proposed a hybrid GA rule based categorization method, named genetic algorithm rule based categorization (GARC), to enhance the accuracy of categorization rules and to produce accurate classifier for text mining. The GARC consists of three main stages; namely, search space determination, rule discovery with validation (rule generation), and categorization. The experimental results are carried out on three Arabic text datasets with multiple categories to evaluate the efficiency of GARC. The results show that a promising performance was achieved by using GARC for Arabic text categorization. The GARC achieves the best performance with small feature space in most situations.",
keywords = "Associative classification, Categorization rule, Genetic Algorithm, Rule based categorization, Rule discovery, Text categorization",
author = "Afif, {Mohammed H.} and Ghareb, {Abdullah Saeed} and Abdulgbar Saif and Bakar, {Azuraliza Abu} and Omer Bazighifan",
year = "2020",
month = "1",
doi = "10.5267/j.dsl.2019.8.003",
language = "English",
volume = "9",
pages = "37--50",
journal = "Decision Science Letters",
issn = "1929-5804",
publisher = "Growing Science",
number = "1",

}

TY - JOUR

T1 - Genetic algorithm rule based categorization method for textual data mining

AU - Afif, Mohammed H.

AU - Ghareb, Abdullah Saeed

AU - Saif, Abdulgbar

AU - Bakar, Azuraliza Abu

AU - Bazighifan, Omer

PY - 2020/1

Y1 - 2020/1

AB - The rule based categorization approaches such as associative classification have the capability to produce classifiers rival to those learned by traditional categorization approaches such as Naïve Bayes and K-nearest Neighbor. However, the lack of useful discovery and usage of categorization rules are the major challenges of rule based approaches and their performance is declined with large set of rules. Genetic Algorithm (GA) is effective to reduce the high dimensionality and improve categorization performance. However, the usage of GA in most researches is limited in the categorization preprocessing stage and its results is used to simplify the categorization process performed by other categorization algorithms. This paper proposed a hybrid GA rule based categorization method, named genetic algorithm rule based categorization (GARC), to enhance the accuracy of categorization rules and to produce accurate classifier for text mining. The GARC consists of three main stages; namely, search space determination, rule discovery with validation (rule generation), and categorization. The experimental results are carried out on three Arabic text datasets with multiple categories to evaluate the efficiency of GARC. The results show that a promising performance was achieved by using GARC for Arabic text categorization. The GARC achieves the best performance with small feature space in most situations.

KW - Associative classification

KW - Categorization rule

KW - Genetic Algorithm

KW - Rule based categorization

KW - Rule discovery

KW - Text categorization

UR - http://www.scopus.com/inward/record.url?scp=85073453289&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073453289&partnerID=8YFLogxK

U2 - 10.5267/j.dsl.2019.8.003

DO - 10.5267/j.dsl.2019.8.003

M3 - Article

AN - SCOPUS:85073453289

VL - 9

SP - 37

EP - 50

JO - Decision Science Letters

JF - Decision Science Letters

SN - 1929-5804

IS - 1

ER -