Feature selection and classification of protein subfamilies using rough sets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. Many methods have been proposed to cope with this issue. Among them, feature selection chooses a subset of discriminative features (attributes) for model building; it is favored because it helps avoid overfitting, improves model performance, and produces faster and more reliable models. This paper proposes a new method based on Rough Set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so selecting a subset of them is unnecessary. We do confirm the need for balanced classes when classifying protein function, demonstrating an increase in accuracy of more than 15%.
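The abstract combines two ingredients: amino acid composition as conditional features, and a rough-set criterion for judging which features matter. The sketch below illustrates the standard rough-set dependency degree (gamma, the fraction of objects in the positive region) and the per-attribute significance derived from it, applied to composition features. It is not the authors' algorithm: the discretization cut points, the toy sequences, the subfamily labels, and the helper names (`aa_composition`, `discretize`, `dependency`, `significance`) are all hypothetical, and the paper's actual reduct computation may differ.

```python
from collections import Counter, defaultdict

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Fraction of each standard amino acid in a protein sequence (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return {a: 0.0 for a in AMINO_ACIDS}
    return {a: counts[a] / total for a in AMINO_ACIDS}


def discretize(value, cuts):
    """Map a continuous value to a bin index given sorted cut points (assumed cuts)."""
    return sum(value > c for c in cuts)


def partition(rows, attrs):
    """Indiscernibility classes: group rows that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma(attrs -> decision): the fraction of
    objects whose indiscernibility class carries a single decision label."""
    if not rows:
        return 0.0
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)


def significance(rows, labels, attrs):
    """Drop in gamma when each attribute is removed; 0 means that attribute
    is dispensable on its own given the others."""
    full = dependency(rows, labels, attrs)
    return {a: full - dependency(rows, labels, [b for b in attrs if b != a])
            for a in attrs}


# Hypothetical toy data: four sequences from two made-up subfamilies.
seqs = ["MKKLLA", "MKALLA", "GGSSGG", "GSSGGA"]
labels = ["X", "X", "Y", "Y"]
cuts = [0.1, 0.3]
rows = [{a: discretize(v, cuts) for a, v in aa_composition(s).items()} for s in seqs]
attrs = list(AMINO_ACIDS)

print(dependency(rows, labels, attrs))  # → 1.0
print(significance(rows, labels, attrs))
```

In this toy table every single-attribute significance is zero, because the remaining composition features still separate the two made-up classes. That only illustrates the quantity a rough-set selector inspects; it is not a reproduction of the paper's data or result.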

Original language: English
Title of host publication: Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009
Pages: 32-35
Number of pages: 4
Volume: 1
DOIs: https://doi.org/10.1109/ICEEI.2009.5254822
Publication status: Published - 2009
Event: 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009 - Selangor
Duration: 5 Aug 2009 to 7 Aug 2009

Other

Other: 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009
City: Selangor
Period: 5/8/09 to 7/8/09

Fingerprint

Feature extraction
Proteins
Amino acids
Bioinformatics
Chemical analysis
Data mining
Learning systems

Keywords

  • Feature selection
  • Protein function classification
  • Rough sets

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Energy Engineering and Power Technology
  • Electrical and Electronic Engineering

Cite this

Rahman, H. A., Abu Bakar, A., & Mohamed Hussein, Z. A. (2009). Feature selection and classification of protein subfamilies using rough sets. In Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009 (Vol. 1, pp. 32-35). [5254822] https://doi.org/10.1109/ICEEI.2009.5254822

Feature selection and classification of protein subfamilies using rough sets. / Rahman, Huzlina Abdul; Abu Bakar, Azuraliza; Mohamed Hussein, Zeti Azura.

Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009. Vol. 1 2009. p. 32-35 5254822.


Rahman, HA, Abu Bakar, A & Mohamed Hussein, ZA 2009, Feature selection and classification of protein subfamilies using rough sets. in Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009. vol. 1, 5254822, pp. 32-35, 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009, Selangor, 5/8/09. https://doi.org/10.1109/ICEEI.2009.5254822

Rahman HA, Abu Bakar A, Mohamed Hussein ZA. Feature selection and classification of protein subfamilies using rough sets. In Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009. Vol. 1. 2009. p. 32-35. 5254822 https://doi.org/10.1109/ICEEI.2009.5254822

Rahman, Huzlina Abdul ; Abu Bakar, Azuraliza ; Mohamed Hussein, Zeti Azura. / Feature selection and classification of protein subfamilies using rough sets. Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009. Vol. 1 2009. pp. 32-35

@inproceedings{4048b573379749fda6204ee6b4bf3ebb,
title = "Feature selection and classification of protein subfamilies using rough sets",
abstract = "Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. Many methods have been proposed to cope with this issue. Among them, feature selection chooses a subset of discriminative features (attributes) for model building; it is favored because it helps avoid overfitting, improves model performance, and produces faster and more reliable models. This paper proposes a new method based on Rough Set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so selecting a subset of them is unnecessary. We do confirm the need for balanced classes when classifying protein function, demonstrating an increase in accuracy of more than 15{\%}.",
keywords = "Feature selection, Protein function classification, Rough sets",
author = "Rahman, {Huzlina Abdul} and {Abu Bakar}, Azuraliza and {Mohamed Hussein}, {Zeti Azura}",
year = "2009",
doi = "10.1109/ICEEI.2009.5254822",
language = "English",
isbn = "9781424449132",
volume = "1",
pages = "32--35",
booktitle = "Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009",

}

TY - GEN

T1 - Feature selection and classification of protein subfamilies using rough sets

AU - Rahman, Huzlina Abdul

AU - Abu Bakar, Azuraliza

AU - Mohamed Hussein, Zeti Azura

PY - 2009

Y1 - 2009

AB - Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. Many methods have been proposed to cope with this issue. Among them, feature selection chooses a subset of discriminative features (attributes) for model building; it is favored because it helps avoid overfitting, improves model performance, and produces faster and more reliable models. This paper proposes a new method based on Rough Set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so selecting a subset of them is unnecessary. We do confirm the need for balanced classes when classifying protein function, demonstrating an increase in accuracy of more than 15%.

KW - Feature selection

KW - Protein function classification

KW - Rough sets

UR - http://www.scopus.com/inward/record.url?scp=70449670961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449670961&partnerID=8YFLogxK

U2 - 10.1109/ICEEI.2009.5254822

DO - 10.1109/ICEEI.2009.5254822

M3 - Conference contribution

SN - 9781424449132

VL - 1

SP - 32

EP - 35

BT - Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009

ER -