Feature clustering for PSO-based feature construction on high-dimensional data

Idheba Mohamad Ali Omer Swesi, Azuraliza Abu Bakar

Research output: Contribution to journal › Article

Abstract

Feature construction (FC) is a process that uses the original features to construct new features with better discrimination ability. Particle Swarm Optimisation (PSO) is an effective search technique that has been successfully utilised in FC. However, applying PSO to feature construction on high-dimensional data has been a challenge because of the large search space and high computational cost. Moreover, applying PSO to the whole feature set led to the construction of unnecessary features that were irrelevant, redundant, or noisy. Therefore, the main purpose of this paper is to select the most informative features and construct new features from the selected features for better classification performance. Feature clustering methods were used to aggregate similar features into clusters, and the dimensionality of the data was lowered by choosing representative features from every cluster to form the final feature subset. Feature clustering has proven effective in feature selection (FS); however, only one study has investigated its application in FC for classification. That study had some limitations, such as its restriction to binary (two-class) problems and a decrease in accuracy. This paper proposes a cluster-based PSO feature construction approach called ClusPSOFC. The Redundancy-Based Feature Clustering (RFC) algorithm was applied to choose the most informative features from the original data, while PSO was used to construct new features from those selected by RFC. Experimental results were obtained using six UCI data sets and six high-dimensional data sets to demonstrate the efficiency of the proposed method compared with the original full feature set, other PSO-based FC methods, and standard genetic programming-based feature construction (GPFC). The results indicate that the ClusPSOFC method is effective for feature construction in the classification of high-dimensional data.
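
The clustering-then-representative step described in the abstract can be illustrated with a minimal sketch. This is not the paper's RFC algorithm, whose details are not given here; it assumes a simple greedy scheme in which redundancy is measured by absolute Pearson correlation, relevance by absolute correlation with the class label, and the first member of each cluster serves as its representative. The function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_features(
    columns: Sequence[Sequence[float]],
    labels: Sequence[float],
    threshold: float = 0.9,
) -> List[List[int]]:
    """Greedy redundancy clustering (illustrative assumption, not the paper's RFC).

    Features are visited in order of decreasing |correlation with the label|.
    A feature joins the first cluster whose representative it correlates with
    at or above `threshold`; otherwise it starts a new cluster.
    Each cluster is a list of feature indices; index 0 holds the representative.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: -relevance[i])
    clusters: List[List[int]] = []
    for i in order:
        for cluster in clusters:
            rep = cluster[0]
            if abs(pearson(columns[i], columns[rep])) >= threshold:
                cluster.append(i)  # redundant with this cluster's representative
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


def representatives(clusters: Sequence[Sequence[int]]) -> List[int]:
    """Return one representative feature index per cluster (the reduced subset)."""
    return [cluster[0] for cluster in clusters]
```

In a pipeline of the kind the abstract describes, the indices returned by `representatives` would form the reduced feature subset passed on to the PSO-based construction stage; that stage is not sketched here.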

Original language: English
Pages (from-to): 439-472
Number of pages: 34
Journal: Journal of Information and Communication Technology
Volume: 18
Issue number: 4
Publication status: Published - 1 Oct 2019

Keywords

  • Classification
  • Feature construction
  • Genetic programming
  • High-dimensional data
  • Particle swarm optimisation

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Feature clustering for PSO-based feature construction on high-dimensional data. / Swesi, Idheba Mohamad Ali Omer; Bakar, Azuraliza Abu.

In: Journal of Information and Communication Technology, Vol. 18, No. 4, 01.10.2019, p. 439-472.

@article{67d2a11e1e1d4f74bbb8fb0e94487ca2,
title = "Feature clustering for {PSO}-based feature construction on high-dimensional data",
abstract = "Feature construction (FC) is a process that uses the original features to construct new features with better discrimination ability. Particle Swarm Optimisation (PSO) is an effective search technique that has been successfully utilised in FC. However, applying PSO to feature construction on high-dimensional data has been a challenge because of the large search space and high computational cost. Moreover, applying PSO to the whole feature set led to the construction of unnecessary features that were irrelevant, redundant, or noisy. Therefore, the main purpose of this paper is to select the most informative features and construct new features from the selected features for better classification performance. Feature clustering methods were used to aggregate similar features into clusters, and the dimensionality of the data was lowered by choosing representative features from every cluster to form the final feature subset. Feature clustering has proven effective in feature selection (FS); however, only one study has investigated its application in FC for classification. That study had some limitations, such as its restriction to binary (two-class) problems and a decrease in accuracy. This paper proposes a cluster-based PSO feature construction approach called ClusPSOFC. The Redundancy-Based Feature Clustering (RFC) algorithm was applied to choose the most informative features from the original data, while PSO was used to construct new features from those selected by RFC. Experimental results were obtained using six UCI data sets and six high-dimensional data sets to demonstrate the efficiency of the proposed method compared with the original full feature set, other PSO-based FC methods, and standard genetic programming-based feature construction (GPFC). The results indicate that the ClusPSOFC method is effective for feature construction in the classification of high-dimensional data.",
keywords = "Classification, Feature construction, Genetic programming, High-dimensional data, Particle swarm optimisation",
author = "Swesi, {Idheba Mohamad Ali Omer} and Bakar, {Azuraliza Abu}",
year = "2019",
month = "10",
day = "1",
language = "English",
volume = "18",
pages = "439--472",
journal = "Journal of Information and Communication Technology",
issn = "1675-414X",
publisher = "Universiti Utara Malaysia Press",
number = "4",

}

TY - JOUR

T1 - Feature clustering for PSO-based feature construction on high-dimensional data

AU - Swesi, Idheba Mohamad Ali Omer

AU - Bakar, Azuraliza Abu

PY - 2019/10/1

Y1 - 2019/10/1

AB - Feature construction (FC) is a process that uses the original features to construct new features with better discrimination ability. Particle Swarm Optimisation (PSO) is an effective search technique that has been successfully utilised in FC. However, applying PSO to feature construction on high-dimensional data has been a challenge because of the large search space and high computational cost. Moreover, applying PSO to the whole feature set led to the construction of unnecessary features that were irrelevant, redundant, or noisy. Therefore, the main purpose of this paper is to select the most informative features and construct new features from the selected features for better classification performance. Feature clustering methods were used to aggregate similar features into clusters, and the dimensionality of the data was lowered by choosing representative features from every cluster to form the final feature subset. Feature clustering has proven effective in feature selection (FS); however, only one study has investigated its application in FC for classification. That study had some limitations, such as its restriction to binary (two-class) problems and a decrease in accuracy. This paper proposes a cluster-based PSO feature construction approach called ClusPSOFC. The Redundancy-Based Feature Clustering (RFC) algorithm was applied to choose the most informative features from the original data, while PSO was used to construct new features from those selected by RFC. Experimental results were obtained using six UCI data sets and six high-dimensional data sets to demonstrate the efficiency of the proposed method compared with the original full feature set, other PSO-based FC methods, and standard genetic programming-based feature construction (GPFC). The results indicate that the ClusPSOFC method is effective for feature construction in the classification of high-dimensional data.

KW - Classification

KW - Feature construction

KW - Genetic programming

KW - High-dimensional data

KW - Particle swarm optimisation

UR - http://www.scopus.com/inward/record.url?scp=85072801016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072801016&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85072801016

VL - 18

SP - 439

EP - 472

JO - Journal of Information and Communication Technology

JF - Journal of Information and Communication Technology

SN - 1675-414X

IS - 4

ER -