On Classification Accuracy

Some Scenarios

  • In finance, default is the failure to meet the legal obligation of a loan. Given some data, we want to classify whether a person will default or not.
    • Suppose our training data set is imbalanced: out of 10k samples only about 300 (3%) are defaulters.
    • A classifier can be good at classifying non-defaulters but poor at classifying defaulters, which is the more important class for the credit card company.
    • Imagine that out of the 300 defaulters, 250 are classified as non-defaulters and given a credit card.
  • Doctors want to conduct a test for whether a patient has cancer or not.
    • The popular terms in the medical field are sensitivity and specificity.
    • Instead of classifying a person as a defaulter, here we classify whether the patient has cancer.
    • Sensitivity = 81/333 ≈ 24%
    • Specificity = 9644/9667 ≈ 99.8%
    • Every medical test strives to achieve 100% in both sensitivity and specificity.
  • In information retrieval we want to know what percentage of the relevant pages we were able to retrieve.
    • TP = 81
    • FP = 23
    • TN = 9644
    • FN = 252
    • Precision = 81/104 ≈ 78%
    • Recall = 81/333 ≈ 24% = sensitivity



  • Formulas:
    • Precision = TP/(TP+FP)
    • Recall = TP/(TP+FN)
    • Sensitivity = TP/(TP+FN)
    • Specificity = TN/(TN+FP)
    • Recall and sensitivity are the same metric
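The formulas above can be checked directly against the confusion-matrix counts from the retrieval example (TP = 81, FP = 23, TN = 9644, FN = 252); a minimal sketch in Python:

```python
# Metrics from the confusion-matrix counts used above; the functions
# mirror the formulas listed in the post.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):          # identical to sensitivity
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

TP, FP, TN, FN = 81, 23, 9644, 252

print(f"precision   = {precision(TP, FP):.2%}")   # ~77.9%
print(f"recall      = {recall(TP, FN):.2%}")      # ~24.3%
print(f"specificity = {specificity(TN, FP):.2%}") # ~99.8%
```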

One solution is to change the threshold

  • Earlier we assigned a person to the default class if the predicted probability was more than 50%
  • Now we want to flag more people as defaulters
  • So we assign them to the defaulter class when the probability is more than 20%
  • This will incorrectly classify some non-defaulters as defaulters, but that is less of a concern than classifying a defaulter as a non-defaulter
    • This will also increase the overall error rate, which is still okay
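A small sketch of the threshold change; the `samples` list of (predicted default probability, actual label) pairs is made up for illustration. Lowering the cutoff from 0.5 to 0.2 catches more defaulters at the cost of more false alarms and a higher overall error rate:

```python
# Hypothetical (probability, label) pairs; label 1 = defaulter.
samples = [(0.9, 1), (0.4, 1), (0.3, 1), (0.15, 1),
           (0.45, 0), (0.35, 0), (0.25, 0), (0.22, 0),
           (0.10, 0), (0.05, 0)]

def confusion(samples, threshold):
    """Confusion-matrix counts at a given probability cutoff."""
    tp = sum(1 for p, y in samples if p >= threshold and y == 1)
    fn = sum(1 for p, y in samples if p < threshold and y == 1)
    fp = sum(1 for p, y in samples if p >= threshold and y == 0)
    tn = sum(1 for p, y in samples if p < threshold and y == 0)
    return tp, fn, fp, tn

for t in (0.5, 0.2):
    tp, fn, fp, tn = confusion(samples, t)
    print(f"threshold={t}: caught {tp}/{tp + fn} defaulters, "
          f"{fp} false alarms, error rate={(fp + fn) / len(samples):.0%}")
```

On these toy numbers the 0.2 cutoff catches 3 of 4 defaulters instead of 1, while the overall error rate rises from 30% to 50% — exactly the trade-off described above.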


  • We can always achieve 100% sensitivity by classifying all samples as positive
  • We can likewise achieve 100% specificity by classifying all samples as negative
  • The ROC curve plots (sensitivity) vs (1 − specificity)
    • That is, the true positive rate vs the false positive rate
  • ROC = Receiver operating characteristic
  • It is good to have the ROC curve hug the top-left corner
    • Better classifier
    • More accurate test
  • A ROC curve close to the 45-degree line represents a less accurate test
  • AUC = Area Under Curve
    • Area under ROC curve
  • Ideal value for AUC is 1
  • AUC of 0.5 represents a random classifier
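One way to see why an AUC of 0.5 means random guessing: AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A library-free sketch, with illustrative scores:

```python
# AUC via the rank interpretation: the fraction of (positive, negative)
# pairs where the positive sample gets the higher score (ties count half).

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]

print(auc(scores, labels))                   # 0.8125 for these toy scores
print(auc(scores, [1 - y for y in labels]))  # flipping labels gives 1 - AUC
```

A classifier that scores at random wins roughly half of such pairs, which is why AUC ≈ 0.5 in that case.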


Threshold selection

  • Unless there is a special business requirement (as with credit card defaulters) we want to select a threshold which maximizes TP while minimizing FP
  • There are two common methods to do that:
    • The point on the ROC curve closest to (0, 1)
    • The Youden index
      • The point which maximizes the vertical distance from the line of equality (the 45-degree line)
      • One can show this is the point which maximizes (sensitivity + specificity)
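The two selection rules can be sketched over a few illustrative ROC points; the (threshold, sensitivity, specificity) triples below are made up:

```python
# Hypothetical points on a ROC curve: (threshold, sensitivity, specificity).
points = [(0.8, 0.30, 0.99), (0.5, 0.55, 0.95),
          (0.3, 0.75, 0.85), (0.1, 0.95, 0.60)]

# Rule 1: the point closest to the ideal corner (0, 1) in (FPR, TPR) space.
closest = min(points,
              key=lambda t: ((1 - t[2]) ** 2 + (1 - t[1]) ** 2) ** 0.5)

# Rule 2: Youden's J = sensitivity + specificity - 1, the vertical distance
# from the 45-degree line; maximizing J maximizes sensitivity + specificity.
youden = max(points, key=lambda t: t[1] + t[2] - 1)

print("closest to (0, 1):", closest)
print("max Youden J:     ", youden)
```

On this toy curve both rules pick the same threshold, but in general the two criteria can disagree.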


AUC vs overall accuracy as comparison metric

  • AUC tells us how far our classifier is from a random guess, which accuracy alone cannot tell
  • Accuracy is measured at a particular threshold, while AUC sweeps the threshold from 0 to 1


F score

  • We know that recall and sensitivity are the same, but precision and specificity are not
  • While the medical field is more concerned with specificity, information retrieval is more concerned with precision
  • So the F score was introduced: the harmonic mean of precision and recall
  • AUC helps us maximize sensitivity and specificity simultaneously, while the F score helps us maximize precision and recall simultaneously
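A quick sketch of the F1 score on the same counts used earlier (TP = 81, FP = 23, FN = 252):

```python
# F1 is the harmonic mean of precision and recall; equivalently
# F1 = 2*TP / (2*TP + FP + FN).

def f1(tp, fp, fn):
    p = tp / (tp + fp)   # precision
    r = tp / (tp + fn)   # recall
    return 2 * p * r / (p + r)

print(f"F1 = {f1(81, 23, 252):.3f}")  # 0.371 — low, since recall is only ~24%
```

The harmonic mean punishes imbalance: despite precision near 78%, the poor recall drags F1 down, which is exactly why it is used when both must be high at once.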







An Introduction to Statistical Learning – http://www-bcf.usc.edu/~gareth/ISL/


