Hybrid feature selection based on enhanced genetic algorithm for text categorization

Abdullah Saeed Ghareb, Azuraliza Abu Bakar, Abdul Razak Hamdan

Research output: Contribution to journalArticle

69 Citations (Scopus)

Abstract

This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). This approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that had the higher ranks. The effectiveness of the proposed approach is evaluated by using naïve Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of EGA over GA, comparisons of GA with EGA showed that the latter achieved better results in terms of dimensionality reduction, time and categorization performance. Furthermore, six proposed hybrid FS approaches consisting of a filter method and the EGA are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.

Original languageEnglish
Pages (from-to)31-47
Number of pages17
JournalExpert Systems with Applications
Volume49
DOIs
Publication statusPublished - 1 May 2016

Fingerprint

Feature extraction
Genetic algorithms
Chromosomes
Classifiers

Keywords

  • Enhanced genetic algorithm
  • Filter feature selection
  • Hybrid feature selection
  • Text categorization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering(all)

Cite this

Hybrid feature selection based on enhanced genetic algorithm for text categorization. / Ghareb, Abdullah Saeed; Abu Bakar, Azuraliza; Hamdan, Abdul Razak.

In: Expert Systems with Applications, Vol. 49, 01.05.2016, p. 31-47.

Research output: Contribution to journalArticle

@article{ee0023f8fafd4f43ad81ee8a08998ea6,
title = "Hybrid feature selection based on enhanced genetic algorithm for text categorization",
abstract = "This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). This approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that had the higher ranks. The effectiveness of the proposed approach is evaluated by using na{\"i}ve Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of EGA over GA, comparisons of GA with EGA showed that the latter achieved better results in terms of dimensionality reduction, time and categorization performance. Furthermore, six proposed hybrid FS approaches consisting of a filter method and the EGA are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.",
keywords = "Enhanced genetic algorithm, Filter feature selection, Hybrid feature selection, Text categorization",
author = "Ghareb, {Abdullah Saeed} and {Abu Bakar}, Azuraliza and Hamdan, {Abdul Razak}",
year = "2016",
month = "5",
day = "1",
doi = "10.1016/j.eswa.2015.12.004",
language = "English",
volume = "49",
pages = "31--47",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Hybrid feature selection based on enhanced genetic algorithm for text categorization

AU - Ghareb, Abdullah Saeed

AU - Abu Bakar, Azuraliza

AU - Hamdan, Abdul Razak

PY - 2016/5/1

Y1 - 2016/5/1

N2 - This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). This approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that had the higher ranks. The effectiveness of the proposed approach is evaluated by using naïve Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of EGA over GA, comparisons of GA with EGA showed that the latter achieved better results in terms of dimensionality reduction, time and categorization performance. Furthermore, six proposed hybrid FS approaches consisting of a filter method and the EGA are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.

AB - This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). This approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that had the higher ranks. The effectiveness of the proposed approach is evaluated by using naïve Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of EGA over GA, comparisons of GA with EGA showed that the latter achieved better results in terms of dimensionality reduction, time and categorization performance. Furthermore, six proposed hybrid FS approaches consisting of a filter method and the EGA are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.

KW - Enhanced genetic algorithm

KW - Filter feature selection

KW - Hybrid feature selection

KW - Text categorization

UR - http://www.scopus.com/inward/record.url?scp=84952886629&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952886629&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2015.12.004

DO - 10.1016/j.eswa.2015.12.004

M3 - Article

AN - SCOPUS:84952886629

VL - 49

SP - 31

EP - 47

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -