Comparing LDA and LR


A few points:

  • LR models the class probability with the logistic function
  • LDA models each class with a multivariate Gaussian density
  • LR finds the maximum likelihood solution
  • LDA finds the maximum a posteriori class using Bayes’ formula
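
In symbols, the points above can be sketched as follows. The notation is standard but not from the original post: β_0 and β are the LR coefficients, π_k is the prior of class k, μ_k its mean, Σ the shared covariance matrix, and p the number of predictors.

```latex
% Logistic regression: the class probability is modeled directly
P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^{T} x)}}

% LDA: each class has a Gaussian density with a shared covariance matrix
f_k(x) = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}}
         \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{T} \Sigma^{-1} (x - \mu_k) \right)

% Bayes' formula turns the class densities into posterior probabilities
P(Y = k \mid x) = \frac{\pi_k \, f_k(x)}{\sum_{l} \pi_l \, f_l(x)}
```

LDA then assigns x to the class with the largest posterior.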

When classes are well separated

When the classes are well-separated, the parameter estimates for logistic regression are surprisingly unstable. Coefficients may go to infinity. LDA doesn’t suffer from this problem.

LR gets unstable in the case of perfect separation

If there are covariate values that predict the binary outcome perfectly, then the fitting algorithm for logistic regression, i.e. Fisher scoring, does not even converge.

If you are using R or SAS you will get a warning that fitted probabilities of zero or one were computed and that the algorithm did not converge.

Perfect separation is the extreme case, but even if the data are separated only to a great degree and not perfectly, the maximum likelihood estimator might not exist, and even if it does exist, the estimates are not reliable.

The resulting fit is not good at all.
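
The non-convergence described above can be seen in a small experiment. This is a minimal sketch, not what R or SAS do internally: it uses plain gradient ascent instead of Fisher scoring, and the two toy datasets, the learning rate, and the function names are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, steps, lr=0.5):
    """Gradient ascent on the log-likelihood of p = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical toy data: the first set is perfectly separated at x = 0,
# the second adds two points that make the classes overlap.
separated_x, separated_y = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
overlap_x = [-2.0, -1.0, 1.0, 2.0, -0.5, 0.5]
overlap_y = [0, 0, 1, 1, 1, 0]

step_counts = (100, 1000, 10000)
sep_slopes = [fit_logistic(separated_x, separated_y, s)[1] for s in step_counts]
ovl_slopes = [fit_logistic(overlap_x, overlap_y, s)[1] for s in step_counts]

for s, sep, ovl in zip(step_counts, sep_slopes, ovl_slopes):
    print(f"steps={s:>6}  separated slope={sep:.4f}  overlapping slope={ovl:.4f}")
```

On the separated data the slope keeps growing the longer the optimizer runs, while on the overlapping data it settles to a finite value.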

Math behind LR


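For example, suppose y = 0 for x = 0 and y = 1 for x = 1. To maximize the likelihood of the observed data, the “S”-shaped logistic regression curve has to output exactly 0 at x = 0 and exactly 1 at x = 1. The logistic function reaches 0 and 1 only in the limit, so the likelihood keeps improving as β grows and β is driven toward infinity, which causes the instability.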
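
The two-point example can be checked numerically. The sketch below assumes the intercept is fixed at minus half the slope, so that the curve crosses 0.5 at x = 0.5; this parameterization and the function names are illustrative choices, not part of the original argument.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(slope, xs, ys):
    """Log-likelihood of p(x) = sigmoid(slope * (x - 0.5))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = slope * (x - 0.5)
        # log p for y = 1, log(1 - p) for y = 0
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

xs, ys = [0.0, 1.0], [0, 1]
slopes = [1, 10, 100, 1000]
values = [log_likelihood(s, xs, ys) for s in slopes]

for s, v in zip(slopes, values):
    print(f"slope={s:>5}  log-likelihood={v:.6g}")
```

Every increase in the slope raises the log-likelihood, which approaches 0 from below without ever reaching it, so no finite slope is the maximizer.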


A few terms:

Complete separation – when x perfectly predicts both outcomes, 0 and 1

Quasi-complete separation – when x perfectly predicts the outcome for only part of the data, for example every 0 or every 1 but not both
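
For a single predictor, the two terms can be told apart with a simple check. This is a sketch under an assumption: separation is taken to mean that a threshold on x splits the classes, and quasi-complete separation means the split holds except for ties exactly at the threshold. The function name is hypothetical.

```python
def separation_type(xs, ys):
    """Classify a one-predictor binary dataset by its degree of separation."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    # Try both orientations: class 0 below class 1, then class 1 below class 0.
    for low, high in ((zeros, ones), (ones, zeros)):
        if max(low) < min(high):
            return "complete"
        if max(low) == min(high):
            return "quasi-complete"  # classes touch only at the boundary value
    return "overlap"

complete = separation_type([1, 2, 3, 4], [0, 0, 1, 1])
quasi = separation_type([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
overlap = separation_type([1, 2, 3, 4], [0, 1, 0, 1])
print(complete, quasi, overlap)
```

In the second call x = 1 always gives y = 1, but x = 0 gives both outcomes, so only one outcome is predicted perfectly.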

When can LDA fail

It can fail if either the between-class or the within-class covariance matrix (Sigma) is singular, but that is a rather rare instance.
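
One way the within-class covariance becomes singular is when one feature is an exact linear function of another. The sketch below covers only that case, for two features, with made-up data and hypothetical helper names; it checks the determinant of the pooled within-class covariance.

```python
def pooled_covariance(classes):
    """Pooled within-class covariance (2x2) for a list of classes of 2-D points."""
    n_total = sum(len(points) for points in classes)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        mu = [sum(p[k] for p in points) / len(points) for k in range(2)]
        for p in points:
            d = [p[0] - mu[0], p[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = n_total - len(classes)  # n - K degrees of freedom
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Second feature is exactly twice the first: perfectly collinear.
collinear = [[(1, 2), (2, 4), (3, 6)], [(6, 12), (7, 14), (8, 16)]]
# Same layout, but the second feature varies independently of the first.
regular = [[(1, 1), (2, 3), (3, 2)], [(6, 7), (7, 6), (8, 8)]]

det_collinear = det2(pooled_covariance(collinear))
det_regular = det2(pooled_covariance(regular))
print("collinear determinant:", det_collinear)
print("regular determinant:", det_regular)
```

A zero determinant means Sigma cannot be inverted, so the discriminant function, which needs the inverse of Sigma, cannot be computed as is.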


In fact, if there is complete or quasi-complete separation, so much the better, because the discriminant is more likely to be successful.


LDA is popular when we have more than two response classes, because it also provides low-dimensional views of the data.

In the post on LDA and QDA we said that LDA is a generalization of Fisher’s discriminant analysis (which involves projecting the data onto a lower dimension that achieves maximum separation).
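
The projection idea can be sketched for two classes in two dimensions. The code assumes the usual two-class Fisher direction, w = Sw^{-1} (mu1 - mu0), where Sw is the within-class scatter matrix; the data and function names are made up for illustration.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    # 2x2 scatter matrix: sum of outer products of the centered points.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # w = Sw^{-1} (mu1 - mu0), using the closed-form inverse of a 2x2 matrix.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]

def project(points, w):
    # Reduce each 2-D point to a single coordinate along w.
    return [p[0] * w[0] + p[1] * w[1] for p in points]

class0 = [(1, 1), (2, 3), (3, 2)]
class1 = [(6, 7), (7, 6), (8, 8)]
w = fisher_direction(class0, class1)
z0, z1 = project(class0, w), project(class1, w)
print("direction:", w)
print("class 0 projections:", z0)
print("class 1 projections:", z1)
```

Each point is reduced from two coordinates to one, and for this toy data the two classes remain apart on the projected axis.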


LDA may result in information loss

The low-dimensional representation can lose information. This is less of a problem when the data are linearly separable, but if they are not, the loss of information might be substantial and the classifier will perform poorly.

Another limitation is that LDA assumes an equal covariance matrix for all classes; when that assumption does not hold, we might go for QDA. The blog post on LDA and QDA lists more considerations on this.