Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: A review of contemporary practice strategies and knowledge gaps

Research output: Contribution to journal › Review article

22 Citations (Scopus)

Abstract

Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse, and the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis, and evidence analysis in forensic science. Despite that, many users in practice have yet to grasp the essence of constructing a valid and reliable PLS-DA model. As technology progresses, datasets across every discipline are evolving into more complex forms, i.e. multi-class, imbalanced and colossal. Indeed, the community is welcoming a new era called big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe the contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA, to provide a timely and user-friendly guide to researchers, especially those working in applied research.

Original language: English
Pages (from-to): 3526-3539
Number of pages: 14
Journal: Analyst
Volume: 143
Issue number: 15
DOIs: 10.1039/c8an00599k
Publication status: Published - 7 Aug 2018

Fingerprint

Discriminant Analysis
Least-Squares Analysis
forensic science
modeling
Food Analysis
Forensic Sciences
Authentication
Research Personnel
fold
Technology
food
Research
Datasets
analysis
Big data

ASJC Scopus subject areas

  • Analytical Chemistry
  • Biochemistry
  • Environmental Chemistry
  • Spectroscopy
  • Electrochemistry

Cite this

@article{f67175fa1c05439b9e51510879b91463,
title = "Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: A review of contemporary practice strategies and knowledge gaps",
abstract = "Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse and the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, diseases classification in medical diagnosis, and evidence analysis in forensic science. Despite that, in practice, many users have yet to grasp the essence of constructing a valid and reliable PLS-DA model. As the technology progresses, across every discipline, datasets are evolving into a more complex form, i.e. multi-class, imbalanced and colossal. Indeed, the community is welcoming a new era called big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe the contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA, to provide a timely and user-friendly guide to researchers, especially those working in applied research.",
author = "Lee, {Loong Chuen} and Liong, {Choong Yeun} and Jemain, {Abdul Aziz}",
year = "2018",
month = "8",
day = "7",
doi = "10.1039/c8an00599k",
language = "English",
volume = "143",
pages = "3526--3539",
journal = "The Analyst",
issn = "0003-2654",
publisher = "Royal Society of Chemistry",
number = "15",

}

TY - JOUR

T1 - Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data

T2 - A review of contemporary practice strategies and knowledge gaps

AU - Lee, Loong Chuen

AU - Liong, Choong Yeun

AU - Jemain, Abdul Aziz

PY - 2018/8/7

Y1 - 2018/8/7

N2 - Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse and the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, diseases classification in medical diagnosis, and evidence analysis in forensic science. Despite that, in practice, many users have yet to grasp the essence of constructing a valid and reliable PLS-DA model. As the technology progresses, across every discipline, datasets are evolving into a more complex form, i.e. multi-class, imbalanced and colossal. Indeed, the community is welcoming a new era called big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe the contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA, to provide a timely and user-friendly guide to researchers, especially those working in applied research.

UR - http://www.scopus.com/inward/record.url?scp=85050748635&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050748635&partnerID=8YFLogxK

U2 - 10.1039/c8an00599k

DO - 10.1039/c8an00599k

M3 - Review article

AN - SCOPUS:85050748635

VL - 143

SP - 3526

EP - 3539

JO - The Analyst

JF - The Analyst

SN - 0003-2654

IS - 15

ER -