Statistical comparison of decision rules in PLS2-DA prediction model for classification of blue gel pen inks according to pen brand and pen model

Research output: Contribution to journal › Article

Abstract

Partial least squares-discriminant analysis (PLS-DA) is a favored modeling tool for high-dimensional data, e.g. infrared and Raman spectra. Before prediction, a decision rule (DR) must be selected for the PLS-DA model. The purpose of this work is to evaluate the statistical differences between four novel DRs and the naïve DR in PLS2-DA prediction models for classification of blue gel pen inks based on attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectra. The performances of the DRs were estimated using forty sub-datasets prepared from the principal ATR-FTIR spectra. The global spectral region and three mutually exclusive spectral windows were each preprocessed independently using nine different data preprocessing methods (i.e. mean centering, autoscaling, Pareto scaling, robust scaling, multiplicative scatter correction, normalization to sum, normalization to constant vector length, standard normal variate and asymmetric least squares). A series of 50 models (obtained by incrementally including up to the first 50 PLS components) was then constructed independently from each of the forty sub-datasets. Each model was evaluated using six different variants of v-fold cross-validation and external testing, and selected models were additionally assessed using an iterative random sampling approach. Each DR was thereby characterized by a set of model estimates describing model accuracy and stability. The performances of the DRs were assessed using summary statistics and ANOVA tests derived from these estimates. The results show that the novel DRs are more accurate but less stable than the naïve DR, and the differences are statistically significant at the 0.05 significance level. In conclusion, the novel and naïve DRs are competitive with each other, and choosing between them requires a trade-off between model stability and model accuracy.
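The abstract names several preprocessing steps and a naïve decision rule without spelling them out. The following is a minimal, dependency-free Python sketch of two of the listed preprocessing methods (standard normal variate and Pareto scaling) and of a decision rule applied to PLS2-DA predictions. It assumes that the naïve DR is the common PLS-DA rule that assigns a sample to the class with the largest predicted dummy response; the paper's own definition and its four novel DRs are not described here and are not reproduced. The PLS2 fitting step is omitted, `Y_hat` stands in for the predicted responses of any fitted PLS2 model, and all function names, class labels and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own standard deviation (sample SD, n - 1, assumed here)."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def pareto_scale(X):
    """Pareto scaling: mean-centre each variable (column of X) and divide
    it by the square root of that column's standard deviation."""
    cols = list(zip(*X))
    centres = [mean(c) for c in cols]
    scales = [sqrt(stdev(c)) for c in cols]
    return [[(x - c) / s for x, c, s in zip(row, centres, scales)] for row in X]


def dummy_code(labels, classes):
    """Build the PLS2-DA response matrix: one 0/1 indicator column per class."""
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]


def naive_rule(Y_hat, classes):
    """Assumed naive DR: assign each sample to the class whose predicted
    response is largest (ties go to the first such class)."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in Y_hat]


if __name__ == "__main__":
    # Hypothetical class labels and hypothetical PLS2 predictions for two samples.
    classes = ["brand_A", "brand_B", "brand_C"]
    Y_hat = [[0.82, 0.10, 0.05], [0.30, 0.41, 0.35]]
    print(naive_rule(Y_hat, classes))  # ['brand_A', 'brand_B']
```

Pure Python is used only to keep the sketch self-contained; for real spectra, vectorized array code and an established PLS implementation would be the practical choice.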

Original language: English
Pages (from-to): 94-101
Number of pages: 8
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 184
DOI: 10.1016/j.chemolab.2018.11.014
Publication status: Published - 15 Jan 2019

Keywords

  • Accuracy
  • Decision rules
  • Forensic science
  • IR spectrum
  • PLS-DA
  • Stability

ASJC Scopus subject areas

  • Analytical Chemistry
  • Software
  • Process Chemistry and Technology
  • Spectroscopy
  • Computer Science Applications

Cite this

@article{7f24a1b53288485ab3d26e73a6996d3e,
title = "Statistical comparison of decision rules in PLS2-DA prediction model for classification of blue gel pen inks according to pen brand and pen model",
keywords = "Accuracy, Decision rules, Forensic science, IR spectrum, PLS-DA, Stability",
author = "Lee, {Loong Chuen} and Liong, {Choong Yeun} and Jemain, {Abdul Aziz}",
year = "2019",
month = jan,
day = "15",
doi = "10.1016/j.chemolab.2018.11.014",
language = "English",
volume = "184",
pages = "94--101",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",

}

TY  - JOUR

T1  - Statistical comparison of decision rules in PLS2-DA prediction model for classification of blue gel pen inks according to pen brand and pen model

AU  - Lee, Loong Chuen

AU  - Liong, Choong Yeun

AU  - Jemain, Abdul Aziz

PY  - 2019/1/15

Y1  - 2019/1/15

KW  - Accuracy

KW  - Decision rules

KW  - Forensic science

KW  - IR spectrum

KW  - PLS-DA

KW  - Stability

UR  - http://www.scopus.com/inward/record.url?scp=85059339036&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85059339036&partnerID=8YFLogxK

U2  - 10.1016/j.chemolab.2018.11.014

DO  - 10.1016/j.chemolab.2018.11.014

M3  - Article

VL  - 184

SP  - 94

EP  - 101

JO  - Chemometrics and Intelligent Laboratory Systems

JF  - Chemometrics and Intelligent Laboratory Systems

SN  - 0169-7439

ER  -