### Abstract

Many studies apply Principal Component Analysis (PCA) as a preliminary visualization method, a variable construction method, or both. The focus of PCA can be on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach to reducing high-dimensional data before applying Linear Discriminant Analysis (LDA) to classification problems. The output of PCA consists of two new matrices, known as the loadings and scores matrices. Each matrix can be used to produce a plot: the loadings plot aids the identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes, also known as Principal Components (PCs). Fundamentally, the scores matrix always serves as the input variables for building a classification model. A recent paper used Q-mode PCA, but its analysis focused on the samples rather than the variables. As a result, its authors swapped the roles of the loadings and scores plots: the clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. The aim of this study is therefore to statistically validate this proposed practice. The evaluation is based on the external error of LDA models built with different numbers of PCs. In addition, bootstrapping was conducted to evaluate the external error of each LDA model. The results show that LDA models built on PCs from R-mode PCA show logical performance and their external errors are unbiased, whereas models built on PCs from Q-mode PCA show the opposite. We therefore conclude that PCs produced by Q-mode PCA are not statistically stable and should be applied to problems of classifying variables, not samples. We hope this paper provides some insight into this disputed issue.
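The abstract does not include the authors' code, so what follows is only a minimal NumPy sketch of the kind of pipeline it describes, under our own assumptions. The helpers `pca`, `lda_fit`, `lda_predict`, `external_error` and `bootstrap_error` are hypothetical names; "Q-mode" is taken here to mean running the same SVD-based decomposition on the transposed data matrix; and the bootstrap is assumed to resample the training set while the external test set stays fixed. The sketch shows how scores and loadings swap roles between the two modes, and how the external error of an LDA model built on the first k R-mode PCs can be tracked against k.

```python
import numpy as np


def pca(X, k, mode="R"):
    """Minimal PCA via SVD of a column-centred matrix (illustrative helper).

    X has shape (n_samples, n_variables).
    mode="R": decompose X; scores are per sample (n x k), loadings per variable (p x k).
    mode="Q": decompose X.T instead (assumption); the roles swap, so scores are
              per variable (p x k) and loadings are per sample (n x k).
    """
    A = X if mode == "R" else X.T
    A = A - A.mean(axis=0)                 # centre the columns of the analysed matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    scores = U[:, :k] * s[:k]              # coordinates of the rows of A on the PCs
    loadings = Vt[:k].T                    # weights of the columns of A
    return scores, loadings


def lda_fit(Z, y):
    """Linear discriminant analysis with a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    inv_cov = np.linalg.pinv(cov)          # pseudo-inverse tolerates a singular covariance
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, inv_cov, log_priors


def lda_predict(model, Z):
    """Assign each row of Z to the class with the largest linear discriminant score."""
    classes, means, inv_cov, log_priors = model
    # g_c(z) = z' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(pi_c)
    g = Z @ inv_cov @ means.T
    g = g - 0.5 * np.sum(means @ inv_cov * means, axis=1) + log_priors
    return classes[np.argmax(g, axis=1)]


def external_error(X_tr, y_tr, X_te, y_te, k):
    """Error on held-out samples of an LDA model built on the first k R-mode PCs."""
    mu = X_tr.mean(axis=0)
    _, loadings = pca(X_tr, k, mode="R")
    model = lda_fit((X_tr - mu) @ loadings, y_tr)      # equals the training scores
    pred = lda_predict(model, (X_te - mu) @ loadings)  # project test samples onto the PCs
    return float(np.mean(pred != y_te))


def bootstrap_error(X_tr, y_tr, X_te, y_te, k, n_boot=200, seed=0):
    """Mean and SD of the external error over bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_tr), size=len(y_tr))  # sample rows with replacement
        errs.append(external_error(X_tr[idx], y_tr[idx], X_te, y_te, k))
    return float(np.mean(errs)), float(np.std(errs, ddof=1))


if __name__ == "__main__":
    # Synthetic two-class data for illustration only (not the paper's IR spectra).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (40, 10)), rng.normal(1.0, 1.0, (40, 10))])
    y = np.repeat([0, 1], 40)
    perm = rng.permutation(len(y))
    X, y = X[perm], y[perm]
    X_tr, y_tr, X_te, y_te = X[:60], y[:60], X[60:], y[60:]

    # In Q-mode the per-sample matrix is the loadings, not the scores.
    q_scores, q_loadings = pca(X_tr, 2, mode="Q")
    print("Q-mode scores", q_scores.shape, "loadings", q_loadings.shape)

    for k in (1, 2, 5):
        mean_err, sd_err = bootstrap_error(X_tr, y_tr, X_te, y_te, k, n_boot=50)
        err = external_error(X_tr, y_tr, X_te, y_te, k)
        print(f"k={k}: external error={err:.3f}, bootstrap mean={mean_err:.3f} (sd {sd_err:.3f})")
```

Projecting the test samples with the training mean and training loadings keeps the test set genuinely external to the PCA step; the pooled covariance and pseudo-inverse keep LDA defined even when a bootstrap resample contains duplicated rows. The demo data are synthetic and say nothing about the results reported in the paper.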

Original language | English |
---|---|
Title of host publication | 3rd ISM International Statistical Conference 2016, ISM 2016 |
Subtitle of host publication | Bringing Professionalism and Prestige in Statistics |
Publisher | American Institute of Physics Inc. |
Volume | 1842 |
ISBN (Electronic) | 9780735415126 |
DOI | https://doi.org/10.1063/1.4982862 |
Publication status | Published - 12 May 2017 |
Event | 3rd ISM International Statistical Conference 2016: Bringing Professionalism and Prestige in Statistics, ISM 2016 - Kuala Lumpur, Malaysia. Duration: 9 Aug 2016 → 11 Aug 2016 |

### Other

Other | 3rd ISM International Statistical Conference 2016: Bringing Professionalism and Prestige in Statistics, ISM 2016 |
---|---|
Country | Malaysia |
City | Kuala Lumpur |
Period | 9 Aug 2016 → 11 Aug 2016 |

### Keywords

- Forensic paper analysis
- IR spectrum
- linear discriminant analysis (LDA)
- principal component analysis (PCA)

### ASJC Scopus subject areas

- Physics and Astronomy (all)

### Cite this

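Lee, L. C., Liong, C. Y., & Jemain, A. A. (2017). Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA). In *3rd ISM International Statistical Conference 2016, ISM 2016: Bringing Professionalism and Prestige in Statistics* (Vol. 1842). [030024]. American Institute of Physics Inc. https://doi.org/10.1063/1.4982862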

**Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA).** / Lee, Loong Chuen; Liong, Choong Yeun; Jemain, Abdul Aziz.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Lee, LC, Liong, CY & Jemain, AA 2017, Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA). in *3rd ISM International Statistical Conference 2016, ISM 2016: Bringing Professionalism and Prestige in Statistics*, vol. 1842, 030024, American Institute of Physics Inc., 3rd ISM International Statistical Conference 2016: Bringing Professionalism and Prestige in Statistics, ISM 2016, Kuala Lumpur, Malaysia, 9/8/16. https://doi.org/10.1063/1.4982862


TY - GEN

T1 - Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)

AU - Lee, Loong Chuen

AU - Liong, Choong Yeun

AU - Jemain, Abdul Aziz

PY - 2017/5/12

Y1 - 2017/5/12

KW - Forensic paper analysis

KW - IR spectrum

KW - linear discriminant analysis (LDA)

KW - principal component analysis (PCA)

UR - http://www.scopus.com/inward/record.url?scp=85019710868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019710868&partnerID=8YFLogxK

U2 - 10.1063/1.4982862

DO - 10.1063/1.4982862

M3 - Conference contribution

VL - 1842

BT - 3rd ISM International Statistical Conference 2016, ISM 2016

PB - American Institute of Physics Inc.

ER -