Feature selection on pectin lyase-like enzyme using computational methods

Research output: Contribution to journal (Article)

Abstract

Applying feature selection algorithms (FSAs) before classification has become a necessity, owing to the enormous growth of public sequence databases and the high dimensionality of protein sequences. This paper provides a comparative framework for four multivariate FSAs that find minimal feature subsets prior to classifying protein function in the pectin lyase-like superfamily. The FSAs are compared on four criteria: accuracy, area under the ROC curve (AUC), the selected features, and the modelling time taken. Classification was performed with three state-of-the-art machine learning classifiers (Support Vector Machines, Naïve Bayes and Decision Tree) on the dataset both with and without FSAs. Our results suggest that FSAs are important for improving classification accuracy and reducing modelling time.
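
The comparison described in the abstract (train a classifier with and without feature selection, then record accuracy, AUC, number of selected features, and modelling time) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the data are synthetic, the selector is a simple univariate filter standing in for the paper's four multivariate FSAs (which the abstract does not name), and the classifier is a small hand-written Gaussian Naïve Bayes. All function names are hypothetical.

```python
import math
import random
import time
from statistics import mean, pstdev


def make_toy_data(n=200, n_informative=3, n_noise=17, seed=0):
    """Synthetic two-class data (assumption): a few shifted features plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(1.5 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Stand-in univariate filter: rank features by |mean1 - mean0| / (std0 + std1)."""
    scores = []
    for j in range(len(X[0])):
        a = [r[j] for r, t in zip(X, y) if t == 0]
        b = [r[j] for r, t in zip(X, y) if t == 1]
        scores.append(abs(mean(b) - mean(a)) / (pstdev(a) + pstdev(b) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_gnb(X, y, cols):
    """Gaussian Naive Bayes: log prior plus per-feature mean and variance per class."""
    model = {}
    for c in (0, 1):
        rows = [r for r, t in zip(X, y) if t == c]
        stats = []
        for j in cols:
            v = [r[j] for r in rows]
            stats.append((mean(v), pstdev(v) ** 2 + 1e-9))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model


def score_gnb(model, row, cols):
    """Log-odds of class 1 versus class 0 for one row."""
    ll = {}
    for c, (prior, stats) in model.items():
        s = prior
        for j, (m, var) in zip(cols, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (row[j] - m) ** 2 / (2 * var)
        ll[c] = s
    return ll[1] - ll[0]


def auc(scores, labels):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(X_tr, y_tr, X_te, y_te, cols):
    """Report accuracy, AUC, feature count and model-fitting time."""
    start = time.perf_counter()
    model = fit_gnb(X_tr, y_tr, cols)
    elapsed = time.perf_counter() - start
    s = [score_gnb(model, r, cols) for r in X_te]
    acc = mean([int((v > 0) == bool(t)) for v, t in zip(s, y_te)])
    return {"accuracy": acc, "auc": auc(s, y_te),
            "n_features": len(cols), "fit_seconds": elapsed}


X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = X[:140], y[:140], X[140:], y[140:]
all_cols = list(range(len(X[0])))
top_cols = select_top_k(X_tr, y_tr, k=3)  # selection uses training data only
report = {
    "without_fs": evaluate(X_tr, y_tr, X_te, y_te, all_cols),
    "with_fs": evaluate(X_tr, y_tr, X_te, y_te, top_cols),
}
for name, row in report.items():
    print(name, row)
```

Selecting features on the training split only keeps the held-out scores from being biased by the selection step. The numbers printed for this toy data say nothing about the results reported in the paper.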

Original language: English
Pages (from-to): 3374-3380
Number of pages: 7
Journal: Advanced Science Letters
Volume: 19
Issue number: 11
DOI: 10.1166/asl.2013.5154
Publication status: Published - Nov 2013

Fingerprint

Computational methods
Feature selection
Feature extraction
Enzymes
Proteins
Protein sequence
Bayes
Decision trees
Set theory
Modeling
Area under curve
Comparative study
Dimensionality
Support vector machines

Keywords

  • Classification
  • Feature selection algorithms
  • Protein function
  • Protein sequences

ASJC Scopus subject areas

  • Education
  • Health (social science)
  • Mathematics (all)
  • Energy (all)
  • Computer Science (all)
  • Environmental Science (all)
  • Engineering (all)

Cite this

Feature selection on pectin lyase-like enzyme using computational methods. / Rahman, Shuzlina Abdul; Abu Bakar, Azuraliza; Mohamed Hussein, Zeti Azura.

In: Advanced Science Letters, Vol. 19, No. 11, 11.2013, p. 3374-3380.

@article{1ffc267fb3754846a839b87da5cf6c54,
title = "Feature selection on pectin lyase-like enzyme using computational methods",
abstract = "Applying feature selection algorithms (FSAs) before classification has become a necessity, owing to the enormous growth of public sequence databases and the high dimensionality of protein sequences. This paper provides a comparative framework for four multivariate FSAs that find minimal feature subsets prior to classifying protein function in the pectin lyase-like superfamily. The FSAs are compared on four criteria: accuracy, area under the ROC curve (AUC), the selected features, and the modelling time taken. Classification was performed with three state-of-the-art machine learning classifiers (Support Vector Machines, Na{\"i}ve Bayes and Decision Tree) on the dataset both with and without FSAs. Our results suggest that FSAs are important for improving classification accuracy and reducing modelling time.",
keywords = "Classification, Feature selection algorithms, Protein function, Protein sequences",
author = "Rahman, {Shuzlina Abdul} and {Abu Bakar}, Azuraliza and {Mohamed Hussein}, {Zeti Azura}",
year = "2013",
month = "11",
doi = "10.1166/asl.2013.5154",
language = "English",
volume = "19",
pages = "3374--3380",
journal = "Advanced Science Letters",
issn = "1936-6612",
publisher = "American Scientific Publishers",
number = "11",
}

TY - JOUR

T1 - Feature selection on pectin lyase-like enzyme using computational methods

AU - Rahman, Shuzlina Abdul

AU - Abu Bakar, Azuraliza

AU - Mohamed Hussein, Zeti Azura

PY - 2013/11

Y1 - 2013/11

N2 - Applying feature selection algorithms (FSAs) before classification has become a necessity, owing to the enormous growth of public sequence databases and the high dimensionality of protein sequences. This paper provides a comparative framework for four multivariate FSAs that find minimal feature subsets prior to classifying protein function in the pectin lyase-like superfamily. The FSAs are compared on four criteria: accuracy, area under the ROC curve (AUC), the selected features, and the modelling time taken. Classification was performed with three state-of-the-art machine learning classifiers (Support Vector Machines, Naïve Bayes and Decision Tree) on the dataset both with and without FSAs. Our results suggest that FSAs are important for improving classification accuracy and reducing modelling time.

KW - Classification

KW - Feature selection algorithms

KW - Protein function

KW - Protein sequences

UR - http://www.scopus.com/inward/record.url?scp=84876387561&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876387561&partnerID=8YFLogxK

U2 - 10.1166/asl.2013.5154

DO - 10.1166/asl.2013.5154

M3 - Article

AN - SCOPUS:84876387561

VL - 19

SP - 3374

EP - 3380

JO - Advanced Science Letters

JF - Advanced Science Letters

SN - 1936-6612

IS - 11

ER -