Confusion matrix

A confusion matrix is a table layout that allows visualization of the performance of a classification algorithm: each row represents the instances of an actual class and each column represents the instances of a predicted class, or vice versa.

A confusion matrix is often used to describe the performance of a classification model on a set of test data for which the true values are known. It is most commonly presented for binary classification, where predictions fall into two categories (e.g., positive and negative), though it generalizes to any number of classes.

Components of a Confusion Matrix:

A confusion matrix is composed of four different combinations of predicted and actual classes:

  1. True Positive (TP): The number of samples that were correctly predicted as positive.
  2. True Negative (TN): The number of samples that were correctly predicted as negative.
  3. False Positive (FP): Also known as a Type I error, the number of samples that were incorrectly predicted as positive (a false alarm).
  4. False Negative (FN): Also known as a Type II error, the number of samples that were incorrectly predicted as negative (a miss).
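
The four counts above can be tallied directly from the label lists. A minimal sketch, assuming binary labels where 1 is the positive class (the helper name `confusion_counts` is illustrative, not a library function):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II error
    return tp, tn, fp, fn

print(confusion_counts([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # (2, 2, 0, 1)
```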

Interpretation:

From these four counts, the standard evaluation metrics can be derived: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1 score, the harmonic mean of precision and recall. A model that raises many false alarms has low precision; a model that misses many positives has low recall.

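These derived metrics can be sketched as plain arithmetic on the four cells (the counts below are illustrative, not from any particular dataset):

```python
# Illustrative cell counts of a binary confusion matrix.
tp, tn, fp, fn = 2, 2, 0, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of all predictions that are correct
precision = tp / (tp + fp)                  # of predicted positives, how many are truly positive
recall = tp / (tp + fn)                     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```
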
Python Implementation (using scikit-learn):

from sklearn.metrics import confusion_matrix

# Example ground truth and predicted labels
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Calculate confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)

print("Confusion Matrix:")
print(conf_matrix)

In this example, y_true contains the true labels of the samples and y_pred contains the predicted labels. We calculate the confusion matrix using the confusion_matrix function from scikit-learn's metrics module. For binary labels the result is a 2x2 array ordered as [[TN, FP], [FN, TP]], so here it is [[2, 0], [1, 2]]: two true negatives, zero false positives, one false negative, and two true positives.
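
When the individual counts are needed, the 2x2 result can be unpacked in one step with NumPy's ravel, which flattens the array in the [[TN, FP], [FN, TP]] order scikit-learn uses:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Flatten [[TN, FP], [FN, TP]] into the four individual counts.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 2
```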