The effects of column-wise manipulations on accuracy of classical classifiers with high-dimensional spectral data

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

1 Citation (Scopus)

Abstract

Column-wise manipulations (CWM), a group of data pre-processing (DP) techniques comprising mean-centering, Pareto scaling (PS), variance scaling and auto-scaling, are often applied individually or in combination. This has become routine practice, often without careful thought, partly because these techniques are simple and easy to use. In theory, all variables in an infrared (IR) spectrum are measured on the same scale and seldom have different means, so IR data should rarely require CWM, in contrast to normalization. This preliminary paper investigates whether each of these CWM is actually needed for an IR spectroscopic dataset derived from white copy paper. The untreated and pre-processed IR data are then modelled with Principal Component Analysis followed by Linear Discriminant Analysis (PCA-DA), and the effect of CWM on the test accuracy of the resulting PCA-DA models is compared across different IR wavenumber intervals. The prediction error of each model is estimated by nonparametric bootstrap. The results show that an informative (i.e. highly discriminatory) spectrum can achieve high classification accuracy even in its raw form, provided that the optimal number of principal components is included. It is concluded that the choice of CWM for an IR spectrum depends on the spectrum's inherent quality, such that a discriminatory IR spectrum might not need any CWM at all.
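The four CWM and the PCA-DA/bootstrap workflow described in the abstract can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: "variance scaling" is taken to mean division by the column standard deviation without centering, Pareto scaling is assumed to include mean-centering, PCA is computed by SVD without implicit centering (so the chosen CWM alone decides what PCA sees), and the bootstrap error is assumed to be measured on out-of-bag samples. The function names (`fit_cwm`, `fit_pca_da`, `predict_pca_da`, `bootstrap_error`) and the synthetic two-class spectra are hypothetical.

```python
import numpy as np

# Column-wise manipulations (CWM). Statistics are learned from a training
# matrix only; the returned function applies them unchanged to any matrix.
# ASSUMPTIONS: "variance" = divide by column SD without centering;
# "pareto" includes mean-centering (a common convention).
def fit_cwm(X_train, method):
    mean = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # guard against constant columns
    transforms = {
        "none": lambda X: X,
        "mean_center": lambda X: X - mean,
        "pareto": lambda X: (X - mean) / np.sqrt(sd),
        "variance": lambda X: X / sd,
        "auto": lambda X: (X - mean) / sd,
    }
    if method not in transforms:
        raise ValueError(f"unknown CWM method: {method}")
    return transforms[method]

# PCA-DA: PCA scores (via SVD, no implicit centering) fed into a linear
# discriminant that uses the pooled within-class covariance of the scores.
def fit_pca_da(X, y, n_pc):
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_pc].T                               # loadings, shape (p, n_pc)
    T = X @ P                                     # training scores
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([T[y == c] - means[i] for i, c in enumerate(classes)])
    cov_inv = np.linalg.pinv(resid.T @ resid / (len(y) - len(classes)))
    log_priors = np.log([np.mean(y == c) for c in classes])
    return P, classes, means, cov_inv, log_priors

def predict_pca_da(model, X):
    P, classes, means, cov_inv, log_priors = model
    T = X @ P
    # linear discriminant score: t' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(prior_c)
    scores = (T @ cov_inv @ means.T
              - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]

# ASSUMED scheme: train on a bootstrap resample, test on the out-of-bag samples.
def bootstrap_error(X, y, method, n_pc, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # draw n samples with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # samples never drawn
        if oob.size == 0:
            continue
        cwm = fit_cwm(X[idx], method)              # CWM fitted on training part only
        model = fit_pca_da(cwm(X[idx]), y[idx], n_pc)
        errors.append(np.mean(predict_pca_da(model, cwm(X[oob])) != y[oob]))
    return float(np.mean(errors)), float(np.std(errors, ddof=1))

# Hypothetical synthetic two-class "spectra" (not the study's paper IR data).
rng = np.random.default_rng(1)
n_per_class, p = 30, 200
w = np.linspace(0.0, 1.0, p)
base = np.exp(-((w - 0.4) / 0.05) ** 2)           # band shared by both classes
band = np.exp(-((w - 0.7) / 0.03) ** 2)           # band that differs by class
X = np.vstack([base + 0.02 * rng.standard_normal((n_per_class, p)),
               base + 0.3 * band + 0.02 * rng.standard_normal((n_per_class, p))])
y = np.repeat([0, 1], n_per_class)

for method in ["none", "mean_center", "pareto", "variance", "auto"]:
    err, err_sd = bootstrap_error(X, y, method, n_pc=3, n_boot=50)
    print(f"{method:12s} bootstrap error = {err:.3f} (sd {err_sd:.3f})")
```

Fitting the column statistics inside each bootstrap replicate, rather than once on the whole dataset, keeps the held-out samples from influencing the scaling; the abstract does not state which approach the study used.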

Original language: English
Title of host publication: 4th International Conference on Mathematical Sciences - Mathematical Sciences
Subtitle of host publication: Championing the Way in a Problem Based and Data Driven Society, ICMS 2016
Publisher: American Institute of Physics Inc.
Volume: 1830
ISBN (Electronic): 9780735414983
DOI: https://doi.org/10.1063/1.4980992
Publication status: Published, 27 Apr 2017
Event: 4th International Conference on Mathematical Sciences - Mathematical Sciences: Championing the Way in a Problem Based and Data Driven Society, ICMS 2016 - Putrajaya, Malaysia
Duration: 15 Nov 2016 to 17 Nov 2016

Other

Conference: 4th International Conference on Mathematical Sciences - Mathematical Sciences: Championing the Way in a Problem Based and Data Driven Society, ICMS 2016
Country: Malaysia
City: Putrajaya
Period: 15/11/16 to 17/11/16

Fingerprint

classifiers
manipulators
infrared spectra
principal components analysis
scaling
preprocessing
norms
intervals

ASJC Scopus subject areas

  • Physics and Astronomy (all)

Cite this

Lee, L. C., Liong, C. Y., & Jemain, A. A. (2017). The effects of column-wise manipulations on accuracy of classical classifiers with high-dimensional spectral data. In 4th International Conference on Mathematical Sciences - Mathematical Sciences: Championing the Way in a Problem Based and Data Driven Society, ICMS 2016 (Vol. 1830). [080008] American Institute of Physics Inc. https://doi.org/10.1063/1.4980992

@inproceedings{82d32817f82b46cca10b8db7358c93e3,
title = "The effects of column-wise manipulations on accuracy of classical classifiers with high-dimensional spectral data",
author = "Lee, {Loong Chuen} and Liong, {Choong Yeun} and Jemain, {Abdul Aziz}",
year = "2017",
month = "4",
day = "27",
doi = "10.1063/1.4980992",
language = "English",
volume = "1830",
booktitle = "4th International Conference on Mathematical Sciences - Mathematical Sciences",
publisher = "American Institute of Physics Inc.",

}

TY - GEN

T1 - The effects of column-wise manipulations on accuracy of classical classifiers with high-dimensional spectral data

AU - Lee, Loong Chuen

AU - Liong, Choong Yeun

AU - Jemain, Abdul Aziz

PY - 2017/4/27

Y1 - 2017/4/27

UR - http://www.scopus.com/inward/record.url?scp=85019458489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019458489&partnerID=8YFLogxK

U2 - 10.1063/1.4980992

DO - 10.1063/1.4980992

M3 - Conference contribution

AN - SCOPUS:85019458489

VL - 1830

BT - 4th International Conference on Mathematical Sciences - Mathematical Sciences

PB - American Institute of Physics Inc.

ER -