Computational discovery and annotation of conserved small open reading frames in fungal genomes

Shuhaila Mat-Sharani, Mohd Firdaus Mohd Raih

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Background: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.

Results: A first set of sORFs was identified from existing annotations that met the criterion of a maximum of 80 residues. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.

Conclusions: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation to identify sORFs. Given the lack of, and limitations of, experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
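The length criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not specify; it is a naive six-frame scan that assumes an ATG start codon, the standard stop codons, and no splicing. The function name `find_sorfs` and the `min_residues` lower bound are hypothetical and introduced only for illustration; the 80-residue upper bound is the one stated in the abstract.

```python
# Naive six-frame sORF scan (illustrative sketch, not the published method).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq, max_residues=80, min_residues=10):
    """Find ATG..stop ORFs encoding min_residues..max_residues amino acids.

    Returns (strand, start, end, orf_nt) tuples. Coordinates are 0-based,
    end-exclusive, and relative to the scanned strand: for "-" hits they
    refer to the reverse-complemented sequence. min_residues is an assumed
    lower bound to suppress trivially short hits.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first in-frame ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_res = (i - start) // 3  # residues, stop codon excluded
                    if min_residues <= n_res <= max_residues:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return hits


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG"
    for hit in find_sorfs(demo):
        print(hit)
```

Candidates found this way would still need the conservation comparison across genomes that the abstract proposes as a filter against prediction artefacts; that step is not sketched here.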

Original language: English
Article number: 551
Journal: BMC Bioinformatics
Volume: 19
DOI: 10.1186/s12859-018-2550-2
Publication status: Published - 4 Feb 2019

Fingerprint

Fungal Genome
Open Reading Frames
Annotation
Genome
Genes
Gene
Prediction
Amino Acids
Molecular Sequence Annotation
Complement
Intergenic DNA
Codon
Artifacts
Experimental Validation
Protein Sequence
Conservation
Encoding
Peptides
Polypeptides

Keywords

  • Conserved
  • Fungal
  • Small open reading frames
  • smORF
  • sORFs

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Computational discovery and annotation of conserved small open reading frames in fungal genomes. / Mat-Sharani, Shuhaila; Mohd Raih, Mohd Firdaus.

In: BMC Bioinformatics, Vol. 19, 551, 04.02.2019.

@article{87b4f3567f344093b746a54a38e7f9cd,
title = "Computational discovery and annotation of conserved small open reading frames in fungal genomes",
abstract = "Background: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. Results: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. Conclusions: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.",
keywords = "Conserved, Fungal, Small open Reading frames, smORF, sORFs",
author = "Shuhaila Mat-Sharani and {Mohd Raih}, {Mohd Firdaus}",
year = "2019",
month = "2",
day = "4",
doi = "10.1186/s12859-018-2550-2",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Computational discovery and annotation of conserved small open reading frames in fungal genomes

AU - Mat-Sharani, Shuhaila

AU - Mohd Raih, Mohd Firdaus

PY - 2019/2/4

Y1 - 2019/2/4

AB - Background: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. Results: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. Conclusions: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.

KW - Conserved

KW - Fungal

KW - Small open Reading frames

KW - smORF

KW - sORFs

UR - http://www.scopus.com/inward/record.url?scp=85061095672&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061095672&partnerID=8YFLogxK

U2 - 10.1186/s12859-018-2550-2

DO - 10.1186/s12859-018-2550-2

M3 - Article

C2 - 30717662

AN - SCOPUS:85061095672

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 551

ER -