## Some Scenarios

• In finance, a default is a failure to meet the legal obligations of a loan. Given some data about a person, we want to classify whether they will default or not.
• Suppose our training data set is imbalanced: out of 10k samples only 300 (3%) are defaulters.
• The classifier in the following table is good at classifying non-defaulters but poor at classifying defaulters, which is the more important class for a credit card company.
• Consider what happens if, out of 300 defaulters, 250 are classified as non-defaulters and issued a credit card.
• Doctors want to conduct a test for whether a patient has cancer or not.
• Popular terms in the medical field are sensitivity and specificity.
• Instead of classifying a person as a defaulter, here we classify whether a patient has cancer.
• Sensitivity = 81/333 = 24 %
• Specificity = 9644/9667 = 99 %
• Every medical test strives to achieve 100% in both sensitivity and specificity.
• In information retrieval we want to know what percentage of the relevant pages we were able to retrieve.
• TP = 81
• FP = 23
• TN = 9644
• FN = 252
• Precision = 81/104= 77%
• Recall = 81/333= sensitivity = 24%
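The numbers above can be verified directly from the confusion-matrix counts; a minimal sketch:

```python
# Worked example: the cancer-screening counts from the notes above.
TP, FP, TN, FN = 81, 23, 9644, 252

sensitivity = TP / (TP + FN)  # recall: fraction of actual positives caught
specificity = TN / (TN + FP)  # fraction of actual negatives rejected
precision = TP / (TP + FP)    # fraction of positive calls that are correct
accuracy = (TP + TN) / (TP + FP + TN + FN)

print(f"sensitivity = {sensitivity:.1%}")  # ~24%
print(f"specificity = {specificity:.1%}")  # ~99%+
print(f"precision   = {precision:.1%}")    # ~77%
print(f"accuracy    = {accuracy:.1%}")
```

Note that accuracy comes out very high even though sensitivity is poor, which is exactly the imbalanced-data problem described above.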

## Example

• Formulas:
• Precision = TP/(TP+FP)
• Recall = TP/(TP+FN)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Recall and sensitivity are the same metric; only the name differs by field

## Solution is to change the threshold

• Earlier we assigned a person to the default class if the predicted probability was more than 50%
• Now we want to flag more people as defaulters
• So we assign them to the default class when the probability is more than 20%
• This will incorrectly classify some non-defaulters as defaulters, but that is less of a concern than classifying a defaulter as a non-defaulter
• This will also increase the overall error rate, which is still acceptable
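The threshold change above can be sketched as follows; the probabilities here are hypothetical model outputs for P(default):

```python
# Hypothetical predicted probabilities of default for six applicants.
probs = [0.05, 0.15, 0.25, 0.35, 0.55, 0.80]

def classify(probabilities, threshold):
    """Label 1 (defaulter) when the predicted probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

print(classify(probs, 0.5))  # threshold 50%: only 2 flagged as defaulters
print(classify(probs, 0.2))  # threshold 20%: 4 flagged as defaulters
```

Lowering the threshold trades false positives (non-defaulters flagged) for fewer false negatives (defaulters missed), matching the business preference stated above.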

## ROC and AUC

• We can always increase sensitivity by classifying all samples as positive
• We can increase specificity by classifying all samples as negative
• ROC plots sensitivity vs (1 − specificity)
• That is, the true positive rate vs the false positive rate
• ROC = Receiver operating characteristic
• It is good to have the ROC curve hug the top-left corner
• Better classifier
• More accurate test
• An ROC curve close to the 45-degree line represents a less accurate test
• AUC = Area Under Curve
• Area under ROC curve
• Ideal value for AUC is 1
• AUC of 0.5 (45 degree line) represents a random classifier

##### How to plot ROC?
• Change the probability threshold from 0 to 1 and measure sensitivity and specificity at each value. If specificity falls as fast as sensitivity rises (i.e. (100 − specificity) grows in lock-step, keeping the curve near the diagonal), it is a bad classifier.
• For a random classifier the ROC is the 45-degree line
• Draw a random number in (0, 1) as the predicted probability
• Classify it based on the threshold
• So a threshold still exists
• But while building a classifier we want to do better than drawing a random probability in (0, 1): we want the predicted probability to depend on the features
• Can AUC be less than 0.5? Yes: a classifier that does worse than random guessing has AUC below 0.5.
• Complementing the output (swapping the predicted labels) reflects the curve to the other side of the line, turning an AUC of a into 1 − a.
• What if I classified all of them as positive?
• Then every prediction is 1 regardless of threshold, and the ROC degenerates to the single point (1, 1); there is no curve to plot.
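The threshold sweep described above can be sketched with a small hand-rolled ROC, using hypothetical scores and labels (1 = positive class):

```python
# Hypothetical classifier scores and true labels.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,   1,   1]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

def auc(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

pts = roc_points(scores, labels)
print(pts)
print(f"AUC = {auc(pts):.3f}")
```

Each threshold contributes one (FPR, TPR) point; connecting the points and integrating gives the AUC.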

## Threshold selection

• Unless there is a special business requirement (as with credit card defaulters), we want to select a threshold which maximizes the true positive rate while minimizing the false positive rate
• There are two methods to do that:
• The point closest to (0, 1) on the ROC curve
• Youden index
• The point which maximizes the vertical distance from the line of equality (the 45-degree line)
• One can show this is the point which maximizes (sensitivity + specificity)
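The two rules above can be sketched on hypothetical (sensitivity, specificity) pairs at candidate thresholds:

```python
# Hypothetical candidates: (threshold, sensitivity, specificity).
candidates = [
    (0.2, 0.95, 0.60),
    (0.4, 0.85, 0.80),
    (0.6, 0.60, 0.95),
]

# Rule 1: point closest to the ideal corner (FPR, TPR) = (0, 1).
def dist_to_ideal(sens, spec):
    fpr = 1 - spec
    return (fpr ** 2 + (1 - sens) ** 2) ** 0.5

closest = min(candidates, key=lambda c: dist_to_ideal(c[1], c[2]))

# Rule 2: Youden index J = sensitivity + specificity - 1,
# the vertical distance above the 45-degree line.
youden = max(candidates, key=lambda c: c[1] + c[2] - 1)

print("closest to (0,1):", closest[0])
print("max Youden index:", youden[0])
```

On this toy data both rules pick the same threshold, but on real ROC curves the two criteria can disagree.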

## AUC vs overall accuracy as comparison metric

• AUC tells us how far our classifier is from a random guess, which accuracy alone cannot tell us
• Accuracy is measured at one particular threshold, while AUC sweeps the threshold from 0 to 1

## F score

• We know that recall and sensitivity are the same, but precision and specificity are not
• While the medical field is more concerned with specificity, information retrieval is more concerned with precision
• So the F score was introduced: the harmonic mean of precision and recall
• AUC helps us maximize sensitivity and specificity simultaneously, while the F score helps us maximize precision and recall simultaneously
• The beta in the F-beta score weights recall relative to precision
• The harmonic mean cannot be made arbitrarily large by increasing some values while leaving at least one small; it is large only when all values are large
• x = 0, y = 1 gives 0.5 as the arithmetic mean but 0 as the harmonic mean
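A minimal sketch of the F-beta score, using the standard formula F_beta = (1 + beta²)·P·R / (beta²·P + R); the 0/1 case below illustrates the harmonic-mean point above:

```python
# F-beta: weighted harmonic mean of precision and recall.
# beta > 1 weights recall more heavily; beta < 1 weights precision more.
def f_beta(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# One value at 0 drags the harmonic mean to 0, even though
# the arithmetic mean of 0 and 1 would be 0.5.
print(f_beta(0.0, 1.0))

# F1 for the cancer-screening numbers earlier (precision ~77%, recall ~24%):
print(round(f_beta(81 / 104, 81 / 333), 3))
```

The low recall dominates here, so F1 stays well below the precision, which is exactly why the F score cannot be gamed by inflating one metric alone.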

## References

http://machinelearningmastery.com/assessing-comparing-classifier-performance-roc-curves-2/

http://www.medicalbiostatistics.com/roccurve.pdf

https://www.medcalc.org/manual/roc-curves.php

https://en.wikipedia.org/wiki/F1_score

An Introduction to Statistical Learning – http://www-bcf.usc.edu/~gareth/ISL/

https://stats.stackexchange.com/questions/221997/why-f-beta-score-define-beta-like-that

https://en.wikipedia.org/wiki/Harmonic_mean