Support Vector Machines

Maximum margin classifiers: also known as the optimal separating hyperplane. The margin is the distance between the hyperplane and the closest training data point. We want to select the hyperplane for which this distance is maximum. Once we identify the optimal separating hyperplane, there can be several training points that are equidistant from it at the shortest distance. Such points are…
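As a minimal sketch of the margin definition above: the distance from a point x to the hyperplane w·x + b = 0 is |w·x + b| / ||w||, and the margin is the smallest such distance over the training set. The weights `w`, the bias `b`, and the toy points are made-up values for illustration, not taken from the post.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin(w, b, points):
    """Margin of a hyperplane: its distance to the closest training point."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

# Hypothetical 2-D training points and a candidate hyperplane x1 + x2 - 3 = 0.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
w, b = (1.0, 1.0), -3.0

m = margin(w, b, points)
# Several points can sit at exactly the shortest distance from the hyperplane.
closest = [x for x in points if math.isclose(distance_to_hyperplane(w, b, x), m)]
print(m, closest)
```

A maximum margin classifier would search over `w` and `b` for the hyperplane that makes `m` as large as possible; this sketch only evaluates one fixed candidate.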


Comparing LDA and LR

  A few points: LR models the class probability with the logistic function; LDA models the class-conditional densities with multivariate Gaussians. LR finds the maximum likelihood solution; LDA finds the maximum a posteriori class using Bayes' formula. When the classes are well separated: the parameter estimates for logistic regression are surprisingly unstable, and the coefficients may go to infinity. LDA doesn't…



Linear Discriminant Analysis (LDA)

In LR, we estimate the posterior probability directly. In LDA, we estimate the likelihood and then apply Bayes' theorem. Calculating the posterior using Bayes' theorem is easy in classification because the hypothesis space is limited. Equation 4 follows directly from equation 3. Probability(k) would be highest for the class for which…


Classification – One vs Rest and One vs One

  In the blog post on Cost Function And Hypothesis for LR, we noted that LR (Logistic Regression) inherently models binary classification. Here we describe two approaches used to extend it to multiclass classification. The One vs Rest approach takes one class as positive and all the rest as negative, and trains a classifier. So for the data having…
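The relabeling behind the two approaches can be sketched as follows. This is an illustration only, assuming the usual definitions: One vs Rest builds one binary problem per class, and One vs One builds one binary problem per unordered pair of classes. The helper names and the toy labels are hypothetical, and no classifier is actually trained here.

```python
from itertools import combinations

def one_vs_rest_labels(labels):
    """For each class, relabel the data as 1 (that class) or 0 (all the rest)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def one_vs_one_pairs(labels):
    """List every unordered pair of classes; each pair gets its own classifier."""
    return list(combinations(sorted(set(labels)), 2))

# Hypothetical labels for a three-class problem.
y = ["cat", "dog", "bird", "dog", "cat"]

ovr = one_vs_rest_labels(y)   # one binary label vector per class
ovo = one_vs_one_pairs(y)     # one class pair per binary classifier
print(ovr["dog"])
print(len(ovr), len(ovo))
```

Each binary label vector in `ovr` could then be fed to an ordinary binary LR model, and the class pairs in `ovo` would each be trained only on the samples belonging to those two classes.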


Probability Distribution

We learned various probability distributions during high school and engineering courses. However, at times we forget them, so here I am providing simple practical scenarios for each distribution, with no theory involved. Bernoulli Distribution: used when the random variable has just two outcomes. Example: the probability that a drug/medicine will be approved by the government is p = 0.65…
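The drug-approval scenario can be tried out in a few lines. This is a rough sketch, using p = 0.65 from the example above; the function name, the number of trials, and the random seed are arbitrary choices made for illustration.

```python
import random

def bernoulli_pmf(k, p):
    """P(X = k) for a Bernoulli variable: p if k == 1, 1 - p if k == 0."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p if k == 1 else 1.0 - p

p = 0.65                     # probability of approval (outcome 1)
mean = p                     # expected value of a Bernoulli variable
variance = p * (1.0 - p)     # variance of a Bernoulli variable

# Simulate many independent approval decisions; the seed is arbitrary.
rng = random.Random(0)
n_trials = 100_000
approvals = sum(1 for _ in range(n_trials) if rng.random() < p)
empirical_rate = approvals / n_trials

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), mean, variance, empirical_rate)
```

With this many trials, the simulated approval rate should land close to 0.65, which is a quick way to check intuition about the distribution.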