Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many studies apply Principal Component Analysis (PCA) for preliminary visualization, for variable construction, or for both. PCA can focus on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach for reducing high-dimensional data before Linear Discriminant Analysis (LDA) is applied to solve classification problems. The output of PCA consists of two new matrices, the loadings matrix and the scores matrix. Each matrix can be used to produce a plot: the loadings plot aids identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes known as Principal Components (PCs). Fundamentally, the scores matrix always serves as the input variables for building the classification model. A recent paper used Q-mode PCA, but the focus of its analysis was on the samples rather than the variables. As a result, its authors exchanged the roles of the loadings and scores plots: clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. The aim of this study is therefore to statistically validate the proposed practice. Evaluation is based on the external error obtained from LDA models as a function of the number of PCs. In addition, bootstrapping was conducted to evaluate the external error of each LDA model. Results show that LDA models built on PCs from R-mode PCA give logical performance and that the matched external errors are unbiased, whereas the models built with Q-mode PCA show the opposite. We therefore conclude that PCs produced from Q-mode PCA are not statistically stable and should be applied to problems of classifying variables, not samples. We hope this paper provides some insight into this disputed issue.
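The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the synthetic data, the train/test split, the number of bootstrap replicates, and all function names (`pca_fit`, `pca_scores`, `lda_fit`, `lda_predict`, `external_error`, `bootstrap_errors`) are assumptions made for illustration, and the LDA is a bare pooled-covariance implementation in NumPy. The sketch follows the abstract's terminology, in which R-mode PCA is run on the samples-by-variables matrix and its scores feed LDA, while Q-mode PCA is run on the transposed matrix so that the loadings carry one row per sample.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, loadings) for the first k PCs of X (rows are observations)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T                      # loadings: (n_columns, k)

def pca_scores(X, mean, loadings):
    """Project rows of X onto the PCs; scores have shape (n_rows, k)."""
    return (X - mean) @ loadings

def lda_fit(T, y):
    """Minimal LDA: class means, pooled within-class covariance, priors."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    pooled = sum((T[y == c] - m).T @ (T[y == c] - m) for c, m in zip(classes, means))
    pooled = np.atleast_2d(pooled / (len(y) - len(classes)))
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, np.linalg.pinv(pooled), priors

def lda_predict(model, T):
    classes, means, prec, priors = model
    # Linear discriminant: x' S^-1 m - 0.5 m' S^-1 m + log(prior), per class.
    d = T @ prec @ means.T - 0.5 * np.sum(means @ prec * means, axis=1) + np.log(priors)
    return classes[np.argmax(d, axis=1)]

def external_error(X_tr, y_tr, X_te, y_te, k):
    """Misclassification rate on held-out samples using k R-mode PCs."""
    mean, P = pca_fit(X_tr, k)                 # PCA is fitted on training data only
    model = lda_fit(pca_scores(X_tr, mean, P), y_tr)
    pred = lda_predict(model, pca_scores(X_te, mean, P))
    return float(np.mean(pred != y_te))

# Synthetic two-class data (an assumption; the paper analyses IR spectra of paper).
rng = np.random.default_rng(0)
n, p = 120, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0                           # class 1 is shifted on five variables

idx = rng.permutation(n)
train, test = idx[:80], idx[80:]

# External error as a function of the number of PCs (R-mode).
errors = {k: external_error(X[train], y[train], X[test], y[test], k) for k in range(1, 6)}
print(errors)

def bootstrap_errors(k, n_boot=50):
    """Refit on bootstrap resamples of the training set; test set stays fixed."""
    out = []
    for _ in range(n_boot):
        rows = rng.choice(train, size=train.size, replace=True)
        out.append(external_error(X[rows], y[rows], X[test], y[test], k))
    return np.array(out)

boot = bootstrap_errors(k=3)
print(boot.mean(), boot.std())

# Q-mode in the abstract's sense: PCA on the transposed matrix.
# The loadings then have one row per training sample, not per variable.
_, loadings_q = pca_fit(X[train].T, 2)
print(loadings_q.shape)
```

In the R-mode branch, held-out samples are projected with the loadings learned from the training set, which is what makes an external error well defined. In the Q-mode branch the per-sample coordinates are loadings tied to the particular samples used in the fit, so there is no equally direct projection for new samples; this is one plausible reading of why the abstract reports unstable results for that practice.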

Original language: English
Title of host publication: 3rd ISM International Statistical Conference 2016, ISM 2016
Subtitle of host publication: Bringing Professionalism and Prestige in Statistics
Publisher: American Institute of Physics Inc.
Volume: 1842
ISBN (Electronic): 9780735415126
DOIs: https://doi.org/10.1063/1.4982862
Publication status: Published - 12 May 2017
Event: 3rd ISM International Statistical Conference 2016: Bringing Professionalism and Prestige in Statistics, ISM 2016 - Kuala Lumpur, Malaysia
Duration: 9 Aug 2016 to 11 Aug 2016

Other

Other: 3rd ISM International Statistical Conference 2016: Bringing Professionalism and Prestige in Statistics, ISM 2016
Country: Malaysia
City: Kuala Lumpur
Period: 9/8/16 to 11/8/16

Keywords

  • Forensic paper analysis
  • IR spectrum
  • linear discriminant analysis (LDA)
  • principal component analysis (PCA)

ASJC Scopus subject areas

  • Physics and Astronomy (all)

Cite this

Lee, L. C., Liong, C. Y., & Jemain, A. A. (2017). Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA). In 3rd ISM International Statistical Conference 2016, ISM 2016: Bringing Professionalism and Prestige in Statistics (Vol. 1842). [030024] American Institute of Physics Inc. https://doi.org/10.1063/1.4982862

@inproceedings{89e8225a903843b98682cce00fbccab9,
title = "Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)",
keywords = "Forensic paper analysis, IR spectrum, linear discriminant analysis (LDA), principal component analysis (PCA)",
author = "Lee, {Loong Chuen} and Liong, {Choong Yeun} and Jemain, {Abdul Aziz}",
year = "2017",
month = "5",
day = "12",
doi = "10.1063/1.4982862",
language = "English",
volume = "1842",
booktitle = "3rd ISM International Statistical Conference 2016, ISM 2016",
publisher = "American Institute of Physics Inc.",

}

TY - GEN

T1 - Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)

AU - Lee, Loong Chuen

AU - Liong, Choong Yeun

AU - Jemain, Abdul Aziz

PY - 2017/5/12

Y1 - 2017/5/12

KW - Forensic paper analysis

KW - IR spectrum

KW - linear discriminant analysis (LDA)

KW - principal component analysis (PCA)

UR - http://www.scopus.com/inward/record.url?scp=85019710868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019710868&partnerID=8YFLogxK

U2 - 10.1063/1.4982862

DO - 10.1063/1.4982862

M3 - Conference contribution

VL - 1842

BT - 3rd ISM International Statistical Conference 2016, ISM 2016

PB - American Institute of Physics Inc.

ER -