A classification of "Gracilaria changii" protein sequences using back-propagation classifier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper addresses protein family classification for sequences from the seaweed species Gracilaria changii using a back-propagation classifier. Protein family classification infers the function of an unknown protein by analysing its structural similarity to a given family of proteins. Classifying protein sequences with a sequence alignment technique is less efficient because the entire sequence is used for classification. Data mining offers well-established artificial intelligence techniques that are well suited to classification. The purpose of this research is therefore to develop a protein sequence classifier for Gracilaria changii using a data mining approach with feature extraction, which identifies the best features in the overall sequence. Data preparation for feature extraction uses bioinformatics tools to translate DNA to protein (Batch Translator) and to analyse the protein family (InterProScan). Features are then extracted from the prepared data using the 2-gram method and used to build a classification model with the Back-Propagation Neural Network technique (RNPB). Experimental results from RNPB are compared with those of the sequence alignment technique (HMMER). The comparison shows that the classification model produced by RNPB outperforms the sequence alignment technique, with an average accuracy across all families of 99.01% compared with 96.51%. In terms of the specificity and sensitivity of the prediction, HMMER and the ANN were equally efficient.
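
As an illustration of the 2-gram feature extraction step mentioned in the abstract, the sketch below turns a protein sequence into a fixed-length vector of relative frequencies over all ordered pairs of the 20 standard amino acids (400 features). The abstract does not give the exact encoding used in the paper, so the alphabet, the normalisation, the handling of non-standard residues, and the names `two_gram_features`, `TWO_GRAMS` and `INDEX` are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, in a fixed order so that every
# sequence is mapped onto the same feature layout.
TWO_GRAMS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {gram: i for i, gram in enumerate(TWO_GRAMS)}


def two_gram_features(sequence):
    """Return a 400-element list of relative 2-gram frequencies."""
    seq = sequence.upper()
    counts = [0] * len(TWO_GRAMS)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs with non-standard residues (e.g. X, *)
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TWO_GRAMS)
    return [c / total for c in counts]


# Hypothetical toy sequence, not taken from the Gracilaria changii data.
features = two_gram_features("MKVLAAGMKV")
```

A vector of this kind could then serve as the input layer of a back-propagation network, with one output unit per protein family; the network architecture and training settings are not stated in the abstract and are left out here.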

Original language: English
Title of host publication: 2009 2nd Conference on Data Mining and Optimization, DMO 2009
Pages: 94-99
Number of pages: 6
DOIs: https://doi.org/10.1109/DMO.2009.5341902
Publication status: Published - 2009
Event: 2009 2nd Conference on Data Mining and Optimization, DMO 2009 - Bangi, Selangor
Duration: 27 Oct 2009 to 28 Oct 2009

Fingerprint

Backpropagation
Classifiers
Proteins
Feature extraction
Data mining
Seaweed
Bioinformatics
Artificial intelligence
DNA
Neural networks

Keywords

  • Back-propagation neural network
  • Classification
  • Data mining
  • Gracilaria changii
  • Protein
  • Sequence alignment

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Mohamed, N. S., Ali Othman, Z., & Abu Bakar, A. (2009). A classification of "Gracilaria changii" protein sequences using back-propagation classifier. In 2009 2nd Conference on Data Mining and Optimization, DMO 2009 (pp. 94-99). [5341902] https://doi.org/10.1109/DMO.2009.5341902

@inproceedings{ef6db4a05c294d16b8c3bff6dc94c18b,
title = "A classification of {"}Gracilaria changii{"} protein sequences using back-propagation classifier",
keywords = "Back-propagation neural network, Classification, Data mining, Gracilaria changii, Protein, Sequence alignment",
author = "Mohamed, {Nur Shazila} and {Ali Othman}, Zulaiha and {Abu Bakar}, Azuraliza",
year = "2009",
doi = "10.1109/DMO.2009.5341902",
language = "English",
isbn = "9781424449446",
pages = "94--99",
booktitle = "2009 2nd Conference on Data Mining and Optimization, DMO 2009",

}

TY - GEN

T1 - A classification of "Gracilaria changii" protein sequences using back-propagation classifier

AU - Mohamed, Nur Shazila

AU - Ali Othman, Zulaiha

AU - Abu Bakar, Azuraliza

PY - 2009

Y1 - 2009

KW - Back-propagation neural network

KW - Classification

KW - Data mining

KW - Gracilaria changii

KW - Protein

KW - Sequence alignment

UR - http://www.scopus.com/inward/record.url?scp=72449143111&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72449143111&partnerID=8YFLogxK

U2 - 10.1109/DMO.2009.5341902

DO - 10.1109/DMO.2009.5341902

M3 - Conference contribution

SN - 9781424449446

SP - 94

EP - 99

BT - 2009 2nd Conference on Data Mining and Optimization, DMO 2009

ER -