Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine

Mia Yang Ang, Low Teck Yew, Pey Yee Lee, Wan Fahmi Wan Mohamad Nazarie, Victor Guryev, A. Rahman A. Jamal

Research output: Contribution to journal › Review article

Abstract

Proteogenomics is one of the best-established areas within multi-omics; its underpinning technologies are next-generation sequencing (NGS) and mass spectrometry (MS). Proteogenomics has contributed significantly to genome (re-)annotation, in which novel coding sequences (CDS) are identified and confirmed. By incorporating in silico translated genome variants into protein databases, single amino acid variants (SAAVs) and splice proteoforms can be identified and quantified at the peptide level. In cancer research, proteogenomics potentially enables the identification of patient-specific proteoforms and the association of therapeutic efficacy or resistance with specific mutations. Here, we discuss how NGS/TGS data are analyzed and incorporated into the proteogenomic framework. These sequence data mainly originate from whole genome sequencing (WGS), whole exome sequencing (WES) and RNA-Seq. We explain two major strategies for sequence analysis, i.e., de novo assembly and read mapping, followed by the construction of customized protein databases from such data. We also elaborate on the procedures for matching spectra to peptide sequences in proteogenomics, and on the relationship between database size and the false discovery rate (FDR). Finally, we discuss the latest developments in proteogenomics-assisted precision oncology, as well as challenges and opportunities in proteogenomics research.
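As an illustration of how in silico translated variants can enter a customized protein database, the following minimal sketch applies a single-nucleotide variant to a coding sequence, translates both versions, and keeps the tryptic peptides that occur only in the variant protein. It is not taken from the review: the toy sequence, the variant position, the function names and the simplified cleavage rule (cut after K or R unless followed by P, no missed cleavages) are all assumptions chosen for illustration.

```python
"""Toy sketch: derive variant-specific tryptic peptides from a coding SNV."""

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence in frame 0 until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds, pos, ref, alt):
    """Substitute one base at 0-based position `pos`, checking the reference base."""
    if cds[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {cds[pos]}")
    return cds[:pos] + alt + cds[pos + 1:]


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if last or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def variant_peptides(ref_cds, pos, ref, alt, min_len=6):
    """Return peptides present in the variant protein but not in the reference."""
    var_cds = apply_snv(ref_cds, pos, ref, alt)
    ref_peps = set(tryptic_digest(translate(ref_cds), min_len))
    var_peps = set(tryptic_digest(translate(var_cds), min_len))
    return var_peps - ref_peps


# Hypothetical toy CDS (not a real gene); spaces mark codon boundaries.
REF_CDS = (
    "ATG TCT CTG GAA GCT GTT AAA ACC GGT GAT "
    "CCG GCT GAA CTG CGT GCT TGG TAA"
).replace(" ", "")

# A>T at position 28 turns codon GAT (Asp) into GTT (Val).
print(variant_peptides(REF_CDS, 28, "A", "T"))  # prints {'TGVPAELR'}
```

In a real workflow, the variant-specific peptides (or the full variant proteins) would be appended to the reference protein database before the spectrum search; each such addition enlarges the search space, which is why database size matters for FDR control.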

Original language: English
Pages (from-to): 38-46
Number of pages: 9
Journal: Clinica Chimica Acta
Volume: 498
DOI: 10.1016/j.cca.2019.08.010
Publication status: Published - 1 Nov 2019

Fingerprint

Precision Medicine
Proteomics
Medicine
Mass spectrometry
Protein Databases
Genes
Genome
Exome
RNA Sequence Analysis
Peptides
Oncology
Proteogenomics
Computer Simulation
Sequence Analysis
Neoplasms
Proteins
Databases
RNA
Technology

Keywords

  • Genomic variant
  • Genomics
  • Mass spectrometry (MS)
  • Next-generation sequencing (NGS)
  • Proteogenomics
  • Proteomics

ASJC Scopus subject areas

  • Biochemistry
  • Clinical Biochemistry
  • Biochemistry, medical

Cite this

Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. / Ang, Mia Yang; Teck Yew, Low; Lee, Pey Yee; Wan Mohamad Nazarie, Wan Fahmi; Guryev, Victor; A. Jamal, A. Rahman.

In: Clinica Chimica Acta, Vol. 498, 01.11.2019, p. 38-46.

Ang, Mia Yang; Teck Yew, Low; Lee, Pey Yee; Wan Mohamad Nazarie, Wan Fahmi; Guryev, Victor; A. Jamal, A. Rahman. / Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. In: Clinica Chimica Acta. 2019; Vol. 498. pp. 38-46.
@article{e97dd547ba8944ffa261e9014e591c9b,
title = "Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine",
keywords = "Genomic variant, Genomics, Mass spectrometry (MS), Next-generation sequencing (NGS), Proteogenomics, Proteomics",
author = "Ang, {Mia Yang} and {Teck Yew}, Low and Lee, {Pey Yee} and {Wan Mohamad Nazarie}, {Wan Fahmi} and Victor Guryev and {A. Jamal}, {A. Rahman}",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.cca.2019.08.010",
language = "English",
volume = "498",
pages = "38--46",
journal = "Clinica Chimica Acta",
issn = "0009-8981",
publisher = "Elsevier",

}

TY - JOUR
T1 - Proteogenomics
T2 - From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine
AU - Ang, Mia Yang
AU - Teck Yew, Low
AU - Lee, Pey Yee
AU - Wan Mohamad Nazarie, Wan Fahmi
AU - Guryev, Victor
AU - A. Jamal, A. Rahman
PY - 2019/11/1
Y1 - 2019/11/1
KW - Genomic variant
KW - Genomics
KW - Mass spectrometry (MS)
KW - Next-generation sequencing (NGS)
KW - Proteogenomics
KW - Proteomics
UR - http://www.scopus.com/inward/record.url?scp=85070867650&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070867650&partnerID=8YFLogxK
U2 - 10.1016/j.cca.2019.08.010
DO - 10.1016/j.cca.2019.08.010
M3 - Review article
AN - SCOPUS:85070867650
VL - 498
SP - 38
EP - 46
JO - Clinica Chimica Acta
JF - Clinica Chimica Acta
SN - 0009-8981
ER -