Linear Discriminant Analysis (LDA)
In logistic regression (LR), we estimate the posterior probability directly. In LDA, we estimate the class-conditional likelihood and then apply Bayes' theorem to obtain the posterior. Computing the posterior with Bayes' theorem is easy in classification because the hypothesis space is limited to a finite set of classes.
Equation 4 is derived directly from equation 3. The posterior probability of class k is highest for the class whose discriminant Delta(k) is highest.
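The equations referred to here are not reproduced in this text. As a sketch only, assuming the standard one-predictor setup with prior $\pi_k$, class mean $\mu_k$ and shared variance $\sigma^2$ (symbols introduced here for illustration, not taken from the notes), the posterior and the discriminant usually take the following form:

```latex
% Posterior from Bayes' theorem with a Gaussian class-conditional density
p_k(x) = \frac{\pi_k \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right)}
              {\sum_{l=1}^{K} \pi_l \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu_l)^2}{2\sigma^2}\right)}

% Taking the log and dropping every term that does not depend on k
\delta_k(x) = x \, \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k
```

The denominator and the $x^2$ term are the same for every class, so ranking classes by $p_k(x)$ is the same as ranking them by $\delta_k(x)$, which is linear in $x$.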
LDA estimates the class means and the variance from the data and uses equation 4 for classification. It assumes:
- The class-conditional density f(x) is normal (Gaussian)
- The variance (sigma^2) is the same for all classes
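The estimation and classification steps can be sketched for a single predictor as follows. This is a minimal illustration, assuming the standard linear discriminant x*mu_k/sigma^2 - mu_k^2/(2*sigma^2) + log(pi_k); the function names and the toy data are hypothetical and not part of the notes.

```python
import math

def fit_lda_1d(xs, ys):
    """Estimate priors, class means and the pooled (shared) variance."""
    classes = sorted(set(ys))
    n = len(xs)
    priors, means = {}, {}
    squared_dev = 0.0
    for k in classes:
        xk = [x for x, y in zip(xs, ys) if y == k]
        priors[k] = len(xk) / n
        means[k] = sum(xk) / len(xk)
        squared_dev += sum((x - means[k]) ** 2 for x in xk)
    # Pooled variance: within-class squared deviations over n - K
    var = squared_dev / (n - len(classes))
    return priors, means, var

def discriminant(x, prior, mean, var):
    """Linear discriminant score of one class at point x."""
    return x * mean / var - mean ** 2 / (2 * var) + math.log(prior)

def predict(x, priors, means, var):
    """Assign x to the class with the highest discriminant score."""
    return max(priors, key=lambda k: discriminant(x, priors[k], means[k], var))

# Toy data (made up for illustration): two well-separated classes
xs = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 1, 1]
priors, means, var = fit_lda_1d(xs, ys)
print(predict(2.9, priors, means, var), predict(3.1, priors, means, var))
```

With equal priors the decision boundary sits halfway between the two class means, which is why points just below and just above 3.0 fall into different classes here.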
With more than one predictor, we use a multivariate Gaussian with a class-specific mean vector and a common covariance matrix.
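As a sketch of the multivariate case, assuming $p$ predictors, class mean vectors $\mu_k$, a common covariance matrix $\Sigma$ and priors $\pi_k$ (notation assumed here, not given in the notes), the density and the resulting discriminant are usually written as:

```latex
% Multivariate Gaussian density for class k
f_k(x) = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}}
         \exp\!\left(-\tfrac{1}{2} (x-\mu_k)^{T} \Sigma^{-1} (x-\mu_k)\right)

% Discriminant: still linear in x because \Sigma is shared
\delta_k(x) = x^{T} \Sigma^{-1} \mu_k - \tfrac{1}{2} \mu_k^{T} \Sigma^{-1} \mu_k + \log \pi_k
```

The quadratic term $x^{T} \Sigma^{-1} x$ is identical for all classes and cancels, which is what keeps the decision boundaries linear.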
Quadratic Discriminant Analysis (QDA)
Unlike LDA, QDA assumes that each class has its own covariance matrix. It is called quadratic because its discriminant function is quadratic in x.
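The function in question is not reproduced in this text. A sketch of its usual form, assuming class-specific covariance matrices $\Sigma_k$, mean vectors $\mu_k$ and priors $\pi_k$ (notation assumed here), is:

```latex
\delta_k(x) = -\tfrac{1}{2} (x-\mu_k)^{T} \Sigma_k^{-1} (x-\mu_k)
              - \tfrac{1}{2} \log |\Sigma_k| + \log \pi_k
```

Because $\Sigma_k$ differs between classes, the term $x^{T} \Sigma_k^{-1} x$ no longer cancels when two classes are compared, so the decision boundary is quadratic in $x$.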
When to use LDA, QDA
- The choice comes down to the bias-variance trade-off
- For p predictors and K classes:
  - LDA estimates K*p linear coefficients
  - QDA additionally estimates a separate covariance matrix for each class, K*p*(p+1)/2 parameters in total
- So LDA is the less flexible classifier: it has much lower variance, but it can suffer from high bias
- LDA should be preferred when there are few training samples, because reducing variance matters most there
- QDA has higher variance, so it should be used when the training set is large
- QDA is also the better choice when the assumption of a common covariance matrix among the K classes is untenable
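To get a feel for how quickly the gap grows with p, the counts can be tabulated. This sketch uses one possible convention, counting the K mean vectors plus the free entries of each symmetric covariance matrix (priors ignored); the K*p figure quoted for LDA counts only the coefficients of the linear decision functions. The function names are hypothetical.

```python
def covariance_params(p):
    """Free parameters in one symmetric p x p covariance matrix."""
    return p * (p + 1) // 2

def lda_params(p, K):
    """K mean vectors plus one shared covariance matrix."""
    return K * p + covariance_params(p)

def qda_params(p, K):
    """K mean vectors plus K class-specific covariance matrices."""
    return K * p + K * covariance_params(p)

# Compare the two models for a few predictor counts with K = 3 classes
for p in (2, 10, 50):
    print(p, lda_params(p, K=3), qda_params(p, K=3))
```

Under this convention the difference between the two models is (K - 1) * p * (p + 1) / 2, which grows quadratically in p and is the source of QDA's higher variance on small samples.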
A note on Fisher’s Linear Discriminant Analysis
- In the two-class case, it coincides with LDA.
- This equivalence can be derived mathematically.
- In the literature, it is usually presented as projecting the data onto the line that achieves maximum class separation
- Seen this way, LDA also provides a low-dimensional view of the data
- For example, we want to project 2-D data onto a line that:
  - maximizes the distance between the projected class means
  - minimizes the within-class variance of the projections
- Such a direction can be found by maximizing the Fisher criterion (J)
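The notes do not spell out J. A common form for two classes, assumed here, is J(w) = (w . (m1 - m2))^2 / (w^T S_W w), where m1 and m2 are the class means and S_W is the within-class scatter matrix; it is maximized by any w proportional to S_W^{-1} (m1 - m2). The sketch below implements that closed form for 2-D data in plain Python. The function names and the toy points are made up for illustration.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Entries (sxx, sxy, syy) of the symmetric scatter matrix around m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def within_scatter(class1, class2):
    """Within-class scatter S_W: sum of the two per-class scatter matrices."""
    s1 = scatter2(class1, mean2(class1))
    s2 = scatter2(class2, mean2(class2))
    return tuple(a + b for a, b in zip(s1, s2))

def fisher_direction(class1, class2):
    """Direction w = S_W^{-1} (m1 - m2), using the 2x2 inverse formula."""
    m1, m2 = mean2(class1), mean2(class2)
    a, b, d = within_scatter(class1, class2)
    det = a * d - b * b  # assumed non-zero (S_W invertible)
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)

def fisher_criterion(w, class1, class2):
    """J(w): squared projected mean difference over projected within-class scatter."""
    m1, m2 = mean2(class1), mean2(class2)
    a, b, d = within_scatter(class1, class2)
    between = (w[0] * (m1[0] - m2[0]) + w[1] * (m1[1] - m2[1])) ** 2
    within = a * w[0] ** 2 + 2 * b * w[0] * w[1] + d * w[1] ** 2
    return between / within

# Toy data (made up for illustration)
class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]
w = fisher_direction(class1, class2)
print(w, fisher_criterion(w, class1, class2))
```

Only the direction of w matters, since J is unchanged when w is rescaled. Comparing `fisher_criterion(w, ...)` against the value for an axis-aligned direction such as `(1.0, 0.0)` is a quick sanity check that the projection found is at least as good.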