ROC AUC

Created	@May 9, 2022
Tags	Metrics

ROC curve

a graph of the true positive rate plotted against the false positive rate at various classification thresholds.

a way to visualize the trade-off between the sensitivity of the model (true positives) and its fall-out, the probability that it will trigger a false alarm (false positives).

The two axes are the true positive rate and the false positive rate:
- True positive rate = recall = TP/(TP+FN)
- False positive rate = FP/(FP+TN)

Lowering the threshold allows more items to be classified as positive, thus increasing both true positive rate and false positive rate.
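The effect of lowering the threshold can be checked directly. Below is a minimal pure-Python sketch; `rates_at_threshold` is a hypothetical helper and the labels and scores are made-up illustration data. It treats every score at or above the threshold as a positive prediction and sweeps from a strict threshold to a lenient one.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn)  # recall / sensitivity
    fpr = fp / (fp + tn)  # fall-out
    return tpr, fpr


# Hypothetical labels (1 = positive) and model scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Sweep from a strict threshold (nothing positive) to a lenient one (everything positive).
for threshold in [0.95, 0.75, 0.5, 0.25, 0.0]:
    tpr, fpr = rates_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Neither rate can decrease as the threshold drops, because a lower threshold only ever adds items to the positive side.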

AUC

"Area under the ROC curve".
- the probability that the model ranks a random positive example more highly than a random negative example.
- the larger the AUC, the better the model separates positive examples from negative ones.
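The probability interpretation can be computed literally by comparing every (positive, negative) pair. This is a brute-force sketch that costs O(positives × negatives); `pairwise_auc` is a hypothetical helper, the data is made up, and counting ties as half a correct ranking is the usual convention.

```python
from itertools import product


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs in which the positive is scored higher.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    credit = 0.0
    for p, n in product(pos, neg):
        if p > n:
            credit += 1.0
        elif p == n:
            credit += 0.5
    return credit / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
print(pairwise_auc(y_true, scores))  # 12 of 16 pairs ranked correctly -> 0.75
```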

The ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) are evaluation metrics commonly used for binary classification models. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. AUC summarizes that curve in a single number measuring how well the classifier separates the two classes.

ROC Curve:

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
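In scikit-learn the points of the curve come from `sklearn.metrics.roc_curve`, which returns matching arrays of FPR values, TPR values, and the thresholds that produced them. A minimal sketch on made-up data: scikit-learn adds an extra threshold above every score so the curve starts at (0, 0), and by default (`drop_intermediate=True`) it drops points that lie on a straight line between their neighbours, which leaves the shape of the curve unchanged.

```python
from sklearn.metrics import roc_curve

# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# One (FPR, TPR) point per threshold kept, ordered from strictest to most lenient.
fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

Plotting `tpr` against `fpr` gives the ROC curve.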

TPR is also known as sensitivity or recall and is calculated as \(\frac{TP}{TP + FN}\), where TP is the number of true positives and FN is the number of false negatives.

FPR is calculated as \(\frac{FP}{FP + TN}\), where FP is the number of false positives and TN is the number of true negatives.

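The ROC curve visualizes the trade-off between sensitivity and specificity of the classifier across different threshold settings. Because specificity is \(\frac{TN}{TN + FP} = 1 - \text{FPR}\), the x-axis of the curve is 1 minus the specificity.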
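At a single threshold, all of these quantities come from the four confusion-matrix counts. A minimal sketch using `sklearn.metrics.confusion_matrix`; the hard predictions are hypothetical (they correspond to thresholding the made-up scores 0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9 at 0.5).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels and hard predictions at one threshold, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1]

# For binary labels {0, 1}, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)          # sensitivity / recall
fpr = fp / (fp + tn)          # fall-out
specificity = tn / (tn + fp)  # equals 1 - FPR
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  specificity={specificity:.2f}")
```

Each threshold yields one such (FPR, TPR) pair, that is, one point on the ROC curve.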

AUC (Area Under the ROC Curve):

AUC measures the entire two-dimensional area underneath the ROC curve from (0,0) to (1,1).

AUC provides an aggregate measure of performance across all possible classification thresholds.
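The area can be computed from the curve's points with the trapezoidal rule, summing one trapezoid per pair of consecutive points. This sketch on made-up data does that by hand and compares the result with `roc_auc_score`; `sklearn.metrics.auc(fpr, tpr)` performs the same trapezoidal integration.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

fpr, tpr, _ = roc_curve(y_true, scores)

# Trapezoidal rule: each strip has width = change in FPR and height = mean of its two TPRs.
area = 0.0
for i in range(1, len(fpr)):
    width = fpr[i] - fpr[i - 1]
    mean_height = (tpr[i] + tpr[i - 1]) / 2
    area += width * mean_height

print(area, roc_auc_score(y_true, scores))  # both 0.75 for this data
```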

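AUC ranges from 0 to 1, where a higher value indicates that the classifier ranks positives above negatives more reliably. A perfect classifier has an AUC of 1, and a classifier that assigns scores at random has an expected AUC of 0.5; a value below 0.5 means the classifier ranks negatives above positives more often than not.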
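The reference values can be reproduced with synthetic scores. In this sketch the labels are random (seeded for repeatability): scores unrelated to the labels land near 0.5, scores equal to the labels give 1.0, and the inverted version of those scores gives 0.0.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)  # random binary labels

random_scores = rng.random(size=y_true.size)  # scores unrelated to the labels
perfect_scores = y_true.astype(float)         # every positive outscores every negative
inverted_scores = 1.0 - perfect_scores        # every negative outscores every positive

print(roc_auc_score(y_true, random_scores))    # close to 0.5
print(roc_auc_score(y_true, perfect_scores))   # 1.0
print(roc_auc_score(y_true, inverted_scores))  # 0.0
```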

Interpretation:

An ROC curve that lies closer to the top-left corner indicates better classifier performance, as it corresponds to higher TPR and lower FPR across different threshold settings.

AUC can be interpreted as the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
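The same probability can be computed without enumerating pairs by ranking all scores: the sum of the positives' ranks, minus its smallest possible value, counts the correctly ordered (positive, negative) pairs. This is the normalized Mann-Whitney U statistic. A sketch using `scipy.stats.rankdata`, which assigns average ranks to ties so tied pairs count as half; `rank_auc` is a hypothetical helper and the data is made up.

```python
from scipy.stats import rankdata


def rank_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; tied scores get average ranks."""
    ranks = rankdata(scores)  # 1-based ranks, ties averaged
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, label in zip(ranks, y_true) if label == 1)
    # U = number of (positive, negative) pairs with the positive ranked higher, ties as 0.5.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
print(rank_auc(y_true, scores))
```

Sorting makes this O(n log n), compared with checking every pair directly.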

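Because TPR and FPR are each computed within a single class, ROC AUC does not change when the ratio of positives to negatives changes, so it can be used on imbalanced datasets, where one class is much more prevalent than the other. When positives are very rare, however, a high ROC AUC can still hide a large absolute number of false positives, so it is often read alongside a precision-recall curve.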
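The insensitivity to class ratio can be seen by repeating every negative example. In this sketch on made-up data, the dataset goes from 4:4 to 4:40 positives to negatives, FPR is unchanged at every threshold, and the score stays the same.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Make the data 10x more imbalanced by repeating every negative example 10 times.
pos_scores = [s for label, s in zip(y_true, scores) if label == 1]
neg_scores = [s for label, s in zip(y_true, scores) if label == 0]
y_imbalanced = [1] * len(pos_scores) + [0] * (10 * len(neg_scores))
s_imbalanced = pos_scores + neg_scores * 10

print(roc_auc_score(y_true, scores))              # 4 positives, 4 negatives
print(roc_auc_score(y_imbalanced, s_imbalanced))  # 4 positives, 40 negatives
```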

Python Implementation (using scikit-learn):

```python
from sklearn.metrics import roc_auc_score

# Example ground truth and predicted probabilities
y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.9, 0.8, 0.2, 0.7]  # Predicted probabilities of positive class

# Calculate ROC AUC score
roc_auc = roc_auc_score(y_true, y_prob)

print("ROC AUC Score:", roc_auc)
```

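In this example, y_true contains the true labels of the samples (0 for the negative class, 1 for the positive class), and y_prob contains the predicted probabilities of the positive class. The roc_auc_score function from scikit-learn's metrics module computes the ROC AUC score from them. Every positive sample (scores 0.9, 0.8, 0.7) is scored higher than every negative sample (0.1, 0.2), so the ranking is perfect and the printed score is 1.0.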
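To see how a single ranking mistake affects the score, the sketch below changes one value in the example (a hypothetical edit for illustration): the second negative now scores 0.75, above the positive scored 0.7. Five of the six (positive, negative) pairs are still ordered correctly, so the score drops to 5/6.

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 1]
# Same as the example above, except the second negative now scores 0.75,
# which is higher than the positive example scored 0.7.
y_prob = [0.1, 0.9, 0.8, 0.75, 0.7]

print("ROC AUC Score:", roc_auc_score(y_true, y_prob))  # 5/6, about 0.833
```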