Comparison of linear discriminant analysis and logistic regression for data classification

Choong Yeun Liong, Sin Fan Foo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Linear discriminant analysis (LDA) and logistic regression (LR) are often used to classify populations or groups using a set of predictor variables. LDA assumes multivariate normality and equal variance-covariance matrices across groups; LR requires neither assumption and is therefore considered much more robust than LDA. In this paper, several real datasets that differ in normality, number of independent variables and sample size are used to study the performance of both methods. The methods are compared on the percentage of correct classification and the B index. The results show that, overall, LR performs better regardless of whether the data are normally distributed. However, LR needs longer computing time than LDA as the sample size increases. The performance of LDA was also tested using various prior probabilities. The results show that the average percentage of correct classification and the B index are higher when the prior probabilities are set according to group size rather than set equal for all groups.
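To make the role of the prior probability concrete, the following is a minimal sketch of a one-predictor, two-group LDA in plain Python. It is not the authors' code: the function names (`fit_lda`, `predict_lda`, `percent_correct`) and the toy data are invented for illustration, and the outcome on eight made-up points says nothing about the datasets used in the paper. The B index is left out because the abstract does not define it.

```python
import math

def fit_lda(xs, ys, priors=None):
    """Fit a univariate LDA: group means, pooled variance, prior probabilities."""
    groups = sorted(set(ys))
    n = len(xs)
    means, counts = {}, {}
    for g in groups:
        vals = [x for x, y in zip(xs, ys) if y == g]
        counts[g] = len(vals)
        means[g] = sum(vals) / len(vals)
    # Pooled within-group variance: LDA's equal-variance assumption.
    ss = sum((x - means[y]) ** 2 for x, y in zip(xs, ys))
    var = ss / (n - len(groups))
    if priors is None:
        # Default: priors proportional to group size.
        priors = {g: counts[g] / n for g in groups}
    return means, var, priors

def predict_lda(model, x):
    """Assign x to the group with the largest linear discriminant score."""
    means, var, priors = model
    def score(g):
        return x * means[g] / var - means[g] ** 2 / (2 * var) + math.log(priors[g])
    return max(means, key=score)

def percent_correct(preds, ys):
    """Percentage of observations assigned to their true group."""
    return 100.0 * sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Hypothetical, unbalanced toy data: six observations in group 0, two in group 1.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 3.8, 4.2]
ys = [0, 0, 0, 0, 0, 0, 1, 1]

equal_model = fit_lda(xs, ys, priors={0: 0.5, 1: 0.5})
sized_model = fit_lda(xs, ys)  # priors 0.75 and 0.25, from the group sizes

for name, model in [("equal priors", equal_model), ("group-size priors", sized_model)]:
    preds = [predict_lda(model, x) for x in xs]
    print(name, percent_correct(preds, ys))
```

On this toy data the cut-off between the groups moves from about 3.13 (equal priors) to about 3.59 (group-size priors), so the observation at 3.5 is assigned to the larger group only in the second case. The only difference between the two fits is the `log(priors[g])` term in the discriminant score.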

Original language: English
Title of host publication: AIP Conference Proceedings
Pages: 1159-1165
Number of pages: 7
Volume: 1522
DOI: https://doi.org/10.1063/1.4801262
Publication status: Published - 2013
Event: 20th National Symposium on Mathematical Sciences - Research in Mathematical Sciences: A Catalyst for Creativity and Innovation, SKSM 2012 - Putrajaya
Duration: 18 Dec 2012 to 20 Dec 2012

Keywords

  • Linear discriminant analysis
  • Logistic regression
  • Multivariate normality
  • Prior probability
  • Sample size

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Liong, CY & Foo, SF 2013, Comparison of linear discriminant analysis and logistic regression for data classification. in AIP Conference Proceedings. vol. 1522, pp. 1159-1165, 20th National Symposium on Mathematical Sciences - Research in Mathematical Sciences: A Catalyst for Creativity and Innovation, SKSM 2012, Putrajaya, 18/12/12. https://doi.org/10.1063/1.4801262
@inproceedings{ba0c209a9e0b4b6dbada5a60cc6b2a2a,
title = "Comparison of linear discriminant analysis and logistic regression for data classification",
keywords = "Linear discriminant analysis, Logistic regression, Multivariate normality, Prior probability, Sample size",
author = "Liong, {Choong Yeun} and Foo, {Sin Fan}",
year = "2013",
doi = "10.1063/1.4801262",
language = "English",
isbn = "9780735411500",
volume = "1522",
pages = "1159--1165",
booktitle = "AIP Conference Proceedings",

}

TY - GEN
T1 - Comparison of linear discriminant analysis and logistic regression for data classification
AU - Liong, Choong Yeun
AU - Foo, Sin Fan
PY - 2013
Y1 - 2013
KW - Linear discriminant analysis
KW - Logistic regression
KW - Multivariate normality
KW - Prior probability
KW - Sample size
UR - http://www.scopus.com/inward/record.url?scp=84876916563&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876916563&partnerID=8YFLogxK
U2 - 10.1063/1.4801262
DO - 10.1063/1.4801262
M3 - Conference contribution
SN - 9780735411500
VL - 1522
SP - 1159
EP - 1165
BT - AIP Conference Proceedings
ER -