An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm

Mohammed Hasan Aldulaimi, Suhaila Zainudin, Azuraliza Abu Bakar

Research output: Contribution to journal › Article

Abstract

Many approaches have been proposed to improve the accuracy of protein structural class prediction. However, these approaches did not address low-similarity sequences, which have proved quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence, using newly devised strategies, to categorise proteins into their major structural classes: all-α, all-β, α/β and α+β. A new feature selection method combining two machine learning algorithms is proposed: a support vector machine (SVM) and a genetic algorithm (GA) are combined in a wrapper approach to select the top N features ranked by importance. The proposed method is evaluated with the jackknife test on two low-similarity datasets, ASTRAL and D640. Overall accuracies of 83.93% and 92.2% are reported for the ASTRALtesting and D640 benchmarks respectively, exceeding most current approaches.
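The wrapper idea in the abstract (a genetic algorithm proposing feature subsets, each scored by a classifier under the jackknife test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the GA evolves a binary feature mask rather than a ranked top-N list, and the toy data, function names and GA settings are all hypothetical.

```python
import random

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the paper's SVM): pick the class with the closest mean."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((a / counts[label] - v) ** 2 for a, v in zip(acc, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(X, y, mask):
    """Leave-one-out (jackknife) accuracy using only features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
    return hits / len(Xs)

def ga_select(X, y, pop_size=12, generations=15, mutation_rate=0.1, seed=0):
    """Wrapper feature selection: the GA searches masks, the classifier scores them."""
    rng = random.Random(seed)
    n = len(X[0])  # assumes at least two features

    def fitness(mask):
        return jackknife_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
TOY_X = [[0.0, 5, 1], [0.1, -5, 2], [0.2, 5, 3],
         [1.0, -5, 1], [1.1, 5, 2], [1.2, -5, 3]]
TOY_Y = [0, 0, 0, 1, 1, 1]

best_mask, best_score = ga_select(TOY_X, TOY_Y)
print(best_mask, best_score)
```

In a closer reproduction, the stand-in classifier would be replaced by an SVM and the toy rows by the 71-dimensional feature vectors; the structure of the loop (propose a subset, score it by jackknife accuracy, breed the best subsets) would stay the same.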

Original language: English
Pages (from-to): 376-400
Number of pages: 25
Journal: International Journal of Bioinformatics Research and Applications
Volume: 14
Issue number: 4
DOI: 10.1504/IJBRA.2018.094965
Publication status: Published - 1 Jan 2018

Fingerprint

Genetic Structures
Genetic algorithms
Proteins
Learning algorithms
Support vector machines
Learning systems
Feature extraction
Benchmarking

Keywords

  • Feature selection
  • Genetic algorithm
  • Hydropathical information
  • Low-similarity
  • Secondary structure sequence
  • Support vector machine

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Clinical Biochemistry
  • Health Information Management

Cite this

@article{8d5a1d2d0669438db188ecd4ba54f399,
title = "An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm",
abstract = "Many approaches have been proposed to enhance the accuracy of protein structural class. However, such approaches did not cover the low-similarity sequences which are proved to be quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence using newly devised strategies for the purpose of categorising proteins into their major structural classes: All-α, all-β, α/β and α+β. A new combined method containing two machine learning algorithms has been proposed for feature selections in this study. Support vector machine (SVM) and genetic algorithm (GA) are combined using the wrapper method for the purpose of selecting top N features based on the level of their importance. The proposed method is evaluated using the jackknife upon two low-similarity sequences datasets, i.e. ASTRAL and D640. The overall accuracies of 83.93 and 92.2{\%} are reported for the predictions pertaining to ASTRALtesting and D640 benchmarks, exceeding most of the current approaches.",
keywords = "Feature selection, Genetic algorithm, Hydropathical information, Low-similarity, Secondary structure sequence, Support vector machine",
author = "Aldulaimi, {Mohammed Hasan} and Suhaila Zainudin and {Abu Bakar}, Azuraliza",
year = "2018",
month = "1",
day = "1",
doi = "10.1504/IJBRA.2018.094965",
language = "English",
volume = "14",
pages = "376--400",
journal = "International Journal of Bioinformatics Research and Applications",
issn = "1744-5485",
publisher = "Inderscience Enterprises Ltd",
number = "4",

}

TY - JOUR

T1 - An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm

AU - Aldulaimi, Mohammed Hasan

AU - Zainudin, Suhaila

AU - Abu Bakar, Azuraliza

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Many approaches have been proposed to enhance the accuracy of protein structural class. However, such approaches did not cover the low-similarity sequences which are proved to be quite challenging. In this study, a 71-dimensional integrated feature vector is extracted from the predicted secondary structure and hydropathy sequence using newly devised strategies for the purpose of categorising proteins into their major structural classes: All-α, all-β, α/β and α+β. A new combined method containing two machine learning algorithms has been proposed for feature selections in this study. Support vector machine (SVM) and genetic algorithm (GA) are combined using the wrapper method for the purpose of selecting top N features based on the level of their importance. The proposed method is evaluated using the jackknife upon two low-similarity sequences datasets, i.e. ASTRAL and D640. The overall accuracies of 83.93 and 92.2% are reported for the predictions pertaining to ASTRALtesting and D640 benchmarks, exceeding most of the current approaches.

KW - Feature selection

KW - Genetic algorithm

KW - Hydropathical information

KW - Low-similarity

KW - Secondary structure sequence

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=85055098167&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055098167&partnerID=8YFLogxK

U2 - 10.1504/IJBRA.2018.094965

DO - 10.1504/IJBRA.2018.094965

M3 - Article

VL - 14

SP - 376

EP - 400

JO - International Journal of Bioinformatics Research and Applications

JF - International Journal of Bioinformatics Research and Applications

SN - 1744-5485

IS - 4

ER -