A review on protein sequence clustering research

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The enormous growth of public sequence databases and the continuing addition of fully sequenced genomes have created a fertile area for data mining. Clustering research lies at the crossroads of several research communities, such as document retrieval, image segmentation, and artificial intelligence, especially machine learning and data mining, in which the data size is very large. In this paper, we surveyed the clustering of protein sequence data sets by pointing out the problems that were encountered during the procedures. Challenges include identifying multidomain proteins, identifying remote homologues, identifying protein families, and dealing with large-scale data sets. We then analyzed the clustering techniques that have been developed by exploring how they addressed these issues. In this survey, we focused on the alignment methods and clustering algorithms that were employed. We limited our study to the heuristic-based categories, namely the hierarchical and partitional approaches. We concluded the paper with some research issues.

Original language: English
Title of host publication: IFMBE Proceedings
Pages: 275-278
Number of pages: 4
Volume: 21 IFMBE
Edition: 1
DOIs: https://doi.org/10.1007/978-3-540-69139-6-71
Publication status: Published - 2008
Event: 4th Kuala Lumpur International Conference on Biomedical Engineering 2008, Biomed 2008 - Kuala Lumpur
Duration: 25 Jun 2008 → 28 Jun 2008

Other

Other: 4th Kuala Lumpur International Conference on Biomedical Engineering 2008, Biomed 2008
City: Kuala Lumpur
Period: 25/6/08 → 28/6/08

Fingerprint

Proteins
Data mining
Image segmentation
Clustering algorithms
Artificial intelligence
Learning systems
Genes

Keywords

  • Clustering
  • Data Mining
  • Hierarchical
  • Partitional
  • Protein Sequences

ASJC Scopus subject areas

  • Biomedical Engineering
  • Bioengineering

Cite this

A review on protein sequence clustering research. / Rahman, Shuzlina Abdul; Abu Bakar, Azuraliza; Mohamed Hussein, Zeti Azura.

IFMBE Proceedings. Vol. 21 IFMBE 1. ed. 2008. p. 275-278.

Rahman, SA, Abu Bakar, A & Mohamed Hussein, ZA 2008, A review on protein sequence clustering research. in IFMBE Proceedings. 1 edn, vol. 21 IFMBE, pp. 275-278, 4th Kuala Lumpur International Conference on Biomedical Engineering 2008, Biomed 2008, Kuala Lumpur, 25/6/08. https://doi.org/10.1007/978-3-540-69139-6-71
Rahman, Shuzlina Abdul ; Abu Bakar, Azuraliza ; Mohamed Hussein, Zeti Azura. / A review on protein sequence clustering research. IFMBE Proceedings. Vol. 21 IFMBE 1. ed. 2008. pp. 275-278
@inproceedings{728147a8c15942ddb9770ad00090999d,
title = "A review on protein sequence clustering research",
abstract = "The enormous growth of public sequence databases and the continuing addition of fully sequenced genomes have created a fertile area for data mining. Clustering research lies at the crossroads of several research communities, such as document retrieval, image segmentation, and artificial intelligence, especially machine learning and data mining, in which the data size is very large. In this paper, we surveyed the clustering of protein sequence data sets by pointing out the problems that were encountered during the procedures. Challenges include identifying multidomain proteins, identifying remote homologues, identifying protein families, and dealing with large-scale data sets. We then analyzed the clustering techniques that have been developed by exploring how they addressed these issues. In this survey, we focused on the alignment methods and clustering algorithms that were employed. We limited our study to the heuristic-based categories, namely the hierarchical and partitional approaches. We concluded the paper with some research issues.",
keywords = "Clustering, Data Mining, Hierarchical, Partitional, Protein Sequences",
author = "Rahman, {Shuzlina Abdul} and {Abu Bakar}, Azuraliza and {Mohamed Hussein}, {Zeti Azura}",
year = "2008",
doi = "10.1007/978-3-540-69139-6-71",
language = "English",
isbn = "9783540691389",
volume = "21 IFMBE",
pages = "275--278",
booktitle = "IFMBE Proceedings",
edition = "1",

}

TY - GEN

T1 - A review on protein sequence clustering research

AU - Rahman, Shuzlina Abdul

AU - Abu Bakar, Azuraliza

AU - Mohamed Hussein, Zeti Azura

PY - 2008

Y1 - 2008

AB - The enormous growth of public sequence databases and the continuing addition of fully sequenced genomes have created a fertile area for data mining. Clustering research lies at the crossroads of several research communities, such as document retrieval, image segmentation, and artificial intelligence, especially machine learning and data mining, in which the data size is very large. In this paper, we surveyed the clustering of protein sequence data sets by pointing out the problems that were encountered during the procedures. Challenges include identifying multidomain proteins, identifying remote homologues, identifying protein families, and dealing with large-scale data sets. We then analyzed the clustering techniques that have been developed by exploring how they addressed these issues. In this survey, we focused on the alignment methods and clustering algorithms that were employed. We limited our study to the heuristic-based categories, namely the hierarchical and partitional approaches. We concluded the paper with some research issues.

KW - Clustering

KW - Data Mining

KW - Hierarchical

KW - Partitional

KW - Protein Sequences

UR - http://www.scopus.com/inward/record.url?scp=78349285077&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78349285077&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-69139-6-71

DO - 10.1007/978-3-540-69139-6-71

M3 - Conference contribution

SN - 9783540691389

VL - 21 IFMBE

SP - 275

EP - 278

BT - IFMBE Proceedings

ER -