The fitness-rough: A new attribute reduction method based on statistical and rough set theory

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Attribute reduction has become an important pre-processing step for reducing the complexity of data mining tasks. Rough reducts, statistical methods and correlation-based methods have each contributed to improving attribute reduction techniques. Statistical methods generally have lower computational complexity than rough reducts and correlation-based methods, but many studies have shown that the rough reducts method is effective at reducing attributes without causing too much information loss. Correlation-based methods, on the other hand, evaluate features as a subset rather than as individual attributes. In this paper, we propose a combination of statistical and rough set methods that reduces attributes in a simpler way while keeping the information loss from the raw data low. The fitness-rough method (FsR) identifies important attributes in the raw data, which is then simplified into a more compact information table. We also examine the problem of information loss in this method. Ten UCI machine learning datasets were used to test the proposed method against the classical rough reducts (RR) method, the statistical entropy (ENT) method and the correlation-based feature selection (CFS) method. Experimental results show that our method performs comparatively well, with higher reduction strength and smaller rule sets than the benchmark methods, especially on medium-sized datasets. However, the FsR method is less efficient on mixed-mode and nominal datasets, because the non-quantitative attributes in these datasets are normally pre-categorised.
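For readers unfamiliar with rough reducts, the idea can be illustrated with a minimal sketch. This is not the FsR method described in the abstract, whose details are not given here; it is the textbook rough-set dependency degree combined with a greedy, QuickReduct-style forward selection. The function names `dependency` and `greedy_reduct` and the toy table `rows` are assumptions made up for this illustration.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Rough-set dependency degree: the share of rows whose values on
    `attrs` determine the decision value unambiguously (positive region)."""
    classes = defaultdict(list)
    for row in rows:
        # Rows with identical values on `attrs` fall into one equivalence class.
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, attrs, decision):
    """Greedy forward selection (illustrative, not the paper's algorithm):
    add the attribute that raises the dependency degree most, until the
    subset is as informative as the full attribute set."""
    target = dependency(rows, attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max((a for a in attrs if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen

# Hypothetical toy information table: decision "d" depends only on "b".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
```

On this toy table the selection keeps only attribute `b`, since it alone separates the two decision values. A greedy search of this kind does not guarantee a minimal reduct in general; it is shown only to make the notion of "reducing attributes without information loss" concrete.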

Original language: English
Pages (from-to): 73-87
Number of pages: 15
Journal: Intelligent Data Analysis
Volume: 12
Issue number: 1
Publication status: Published - 2008

Fingerprint

Attribute Reduction
Rough set theory
Reduction Method
Fitness
Rough
Statistical methods
Benchmarking
Reduct
Data mining
Learning systems
Feature extraction
Computational complexity
Entropy
Attribute
Information Loss
Statistical method
Testing
Processing
Entropy Method

Keywords

  • Attribute reduction
  • Fitness degree
  • Heuristic rule
  • Information loss
  • Rough reducts

ASJC Scopus subject areas

  • Artificial Intelligence
  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition

Cite this

@article{c5f22962d3e34e18ad1e7678c376804a,
title = "The fitness-rough: A new attribute reduction method based on statistical and rough set theory",
keywords = "Attribute reduction, Fitness degree, Heuristic rule, Information loss, Rough reducts",
author = "Choo, {Yun Huoy} and {Abu Bakar}, Azuraliza and Hamdan, {Abdul Razak}",
year = "2008",
language = "English",
volume = "12",
pages = "73--87",
journal = "Intelligent Data Analysis",
issn = "1088-467X",
publisher = "IOS Press",
number = "1",

}

TY - JOUR

T1 - The fitness-rough

T2 - A new attribute reduction method based on statistical and rough set theory

AU - Choo, Yun Huoy

AU - Abu Bakar, Azuraliza

AU - Hamdan, Abdul Razak

PY - 2008

Y1 - 2008

AB - Attribute reduction has become an important pre-processing task to reduce the complexity of the data mining task. Rough reducts, statistical methods and correlation-based methods have gradually contributed towards improving attribute reduction techniques to a certain extent. Statistical methods are generally lower in computational complexity compared to the rough reducts and the correlation-based methods, but many have proven that the rough reducts method is significant in reducing important attributes without causing too much information loss. Correlation-based methods on the other hand evaluate features as a subset instead of individual attribute. In this paper, we propose a combination of statistical and rough set methods to reduce important attributes in a simpler way while maintaining a lesser degree of information loss from the raw data. The fitness-rough method (FsR) indicates important attributes from raw data and it is further simplified to a more compact information table. Besides that, we have also looked into the problem of information loss in this method. Ten UCI machine learning datasets were used as testing sets on the proposed method as compared to the classical rough reducts (RR) method, the statistical entropy (ENT) method and the correlation-based feature selection (CFS) method. Experimental results show that our method has performed comparatively well with higher reduction strength and smaller rules set against the benchmarking methods, especially in medium size datasets. However, the FsR method is basically less efficient when used on mix-mode and nominal datasets as the non-quantitative attributes involved in these datasets are normally pre-categorised.

KW - Attribute reduction

KW - Fitness degree

KW - Heuristic rule

KW - Information loss

KW - Rough reducts

UR - http://www.scopus.com/inward/record.url?scp=51849084418&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51849084418&partnerID=8YFLogxK

M3 - Article

VL - 12

SP - 73

EP - 87

JO - Intelligent Data Analysis

JF - Intelligent Data Analysis

SN - 1088-467X

IS - 1

ER -