Review of the effect of feature selection for microarray data on the classification accuracy for cancer data sets

Naeimeh Elkhani, Ravie Chandren Muniyandi

Research output: Contribution to journal (article)

4 Citations (Scopus)

Abstract

DNA microarrays can monitor the expression levels of thousands of genes simultaneously, and gene microarray data can be used in cancer diagnosis and classification. Many machine learning techniques have been developed for the computational analysis of microarray data. A difficulty common to all of these techniques is the large number of genes relative to the small sample size, which degrades both their speed and their accuracy. To overcome this limitation, feature selection techniques are applied to separate significant genes from redundant or irrelevant ones. Feature selection serves two main goals. The first is to identify the relationship between specific diseases and genes. The second is to find a compact set of discriminative genes from which to build a pattern classifier with good generalizability and limited complexity. Here, we review feature selection methods for cancer microarray data sets and analyze their accuracy. We describe the methods commonly used to select significant features, including filters, wrappers, and embedded methods, categorized according to their experimental methodology. We then compare the classification accuracy and time complexity of these methods on various cancer data sets and offer suggestions on choosing suitable methods for such data.
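
As an illustration of the filter category mentioned in the abstract, the sketch below ranks genes by a signal-to-noise score and keeps the top k, independently of any classifier. It is a minimal example under stated assumptions, not a method taken from the paper: the scoring function, the names `signal_to_noise` and `filter_select`, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
import statistics


def signal_to_noise(values_a, values_b):
    """Score one gene as |mean_a - mean_b| / (std_a + std_b)."""
    spread = statistics.pstdev(values_a) + statistics.pstdev(values_b)
    if spread == 0:
        return 0.0  # a constant gene carries no class information
    return abs(statistics.mean(values_a) - statistics.mean(values_b)) / spread


def filter_select(samples, labels, k):
    """Return the indices of the k highest-scoring genes.

    samples: one list of expression values per sample (all the same length).
    labels:  one 0/1 class label per sample.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        class0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(signal_to_noise(class0, class1))
    # Rank genes from most to least discriminative and keep the first k.
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Synthetic data (an assumption, not from the paper): 20 samples, 200 genes,
# where only genes 0 and 1 differ between the two classes.
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
samples = []
for y in labels:
    profile = [rng.gauss(0.0, 1.0) for _ in range(200)]
    profile[0] += 10.0 * y  # gene 0 is up-regulated in class 1
    profile[1] -= 10.0 * y  # gene 1 is down-regulated in class 1
    samples.append(profile)

selected = filter_select(samples, labels, k=2)
print(sorted(selected))
```

The example mirrors the many-genes, few-samples setting the abstract describes. A wrapper method would instead score candidate gene subsets by training a classifier on each, which is typically more expensive in time.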

Original language: English
Pages (from-to): 334-342
Number of pages: 9
Journal: International Journal of Soft Computing
Volume: 11
Issue number: 5
DOI: 10.3923/ijscomp.2016.334.342
Publication status: Published - 2016

Keywords

  • Classification accuracy
  • Experimental
  • Feature selection methods
  • Microarray cancer data sets
  • Wrappers

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Modelling and Simulation

Cite this

@article{72e1011c66ab4cefbf90c139d315a4b0,
title = "Review of the effect of feature selection for microarray data on the classification accuracy for cancer data sets",
abstract = "DNA microarrays can be used to monitor the expression level of thousands of genes simultaneously and gene microarray data can be used in cancer diagnosis and classification. Many machine learning techniques have been developed for computational analyses of microarray data. A common difficulty for all techniques is the large number of genes compared to the small sample size which has a negative impact on their speed and accuracy. To overcome these limitations, feature selection techniques are applied to distinguish between significant and redundant or irrelevant genes. Feature selection methods are used for two main goals. The first is to identify the relationship between specific diseases and genes. The second is to examine a compact set of discriminative genes to develop a pattern classifier with good generalizability and limited complexity. Here, we review different feature selection methods for cancer microarray data sets and analyze their accuracy. We describe methods commonly used for selecting significant features including filters, wrappers and embedded methods, categorized according to their experimental methodology. We then compare the classification accuracy of the methods for various cancer data sets and their time complexity to make some suggestions regarding the use of suitable methods for cancer data sets.",
keywords = "Classification accuracy, Experimental, Feature selection methods, Microarray cancer data sets, Wrappers",
author = "Naeimeh Elkhani and Muniyandi, {Ravie Chandren}",
year = "2016",
doi = "10.3923/ijscomp.2016.334.342",
language = "English",
volume = "11",
pages = "334--342",
journal = "International Journal of Soft Computing",
issn = "1816-9503",
publisher = "Medwell Publishing",
number = "5",

}

TY - JOUR

T1 - Review of the effect of feature selection for microarray data on the classification accuracy for cancer data sets

AU - Elkhani, Naeimeh

AU - Muniyandi, Ravie Chandren

PY - 2016

Y1 - 2016

N2 - DNA microarrays can be used to monitor the expression level of thousands of genes simultaneously and gene microarray data can be used in cancer diagnosis and classification. Many machine learning techniques have been developed for computational analyses of microarray data. A common difficulty for all techniques is the large number of genes compared to the small sample size which has a negative impact on their speed and accuracy. To overcome these limitations, feature selection techniques are applied to distinguish between significant and redundant or irrelevant genes. Feature selection methods are used for two main goals. The first is to identify the relationship between specific diseases and genes. The second is to examine a compact set of discriminative genes to develop a pattern classifier with good generalizability and limited complexity. Here, we review different feature selection methods for cancer microarray data sets and analyze their accuracy. We describe methods commonly used for selecting significant features including filters, wrappers and embedded methods, categorized according to their experimental methodology. We then compare the classification accuracy of the methods for various cancer data sets and their time complexity to make some suggestions regarding the use of suitable methods for cancer data sets.

KW - Classification accuracy

KW - Experimental

KW - Feature selection methods

KW - Microarray cancer data sets

KW - Wrappers

UR - http://www.scopus.com/inward/record.url?scp=85011419036&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85011419036&partnerID=8YFLogxK

U2 - 10.3923/ijscomp.2016.334.342

DO - 10.3923/ijscomp.2016.334.342

M3 - Article

AN - SCOPUS:85011419036

VL - 11

SP - 334

EP - 342

JO - International Journal of Soft Computing

JF - International Journal of Soft Computing

SN - 1816-9503

IS - 5

ER -