Log loss
Created | |
---|---|
Tags | Loss |
Log loss, also known as logarithmic loss or cross-entropy loss, is a widely used loss function in binary and multiclass classification tasks. It measures the performance of a classification model where the predicted output is a probability value between 0 and 1.
Definition:
For binary classification, the log loss function is defined as:

\[
\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
\]

where:
- \(N\) is the number of samples,
- \(y_i\) is the true label of the \(i\)-th sample (either 0 or 1),
- \(p_i\) is the predicted probability that the \(i\)-th sample belongs to the positive class.
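As a small sketch (the function name and sample values are illustrative, not from the original), the definition above can be computed directly with NumPy; clipping the probabilities away from 0 and 1 avoids taking the log of zero:

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss computed directly from the definition above."""
    y = np.asarray(y_true, dtype=float)
    # Clip predicted probabilities into (0, 1) to avoid log(0)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# True labels and predicted probabilities of the positive class
y_true = [0, 1, 1, 0]
p_pred = [0.1, 0.9, 0.8, 0.3]
print(binary_log_loss(y_true, p_pred))  # about 0.1976
```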
Interpretation:
- Log loss measures the quality of the model's predicted probabilities by penalizing predictions that assign low probability to the true class.
- Lower log loss values indicate better model performance, with 0 representing perfect predictions.
- Log loss is sensitive to the uncertainty of predictions and penalizes confidently wrong predictions more heavily than uncertain ones.
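To illustrate the last point with a quick sketch (the specific probability values are illustrative): for a sample whose true label is 1, the per-sample penalty is \(-\log(p)\), which grows steeply as the predicted probability of the positive class shrinks:

```python
import math

# Per-sample log loss when the true label is 1 is -log(p),
# where p is the predicted probability of the positive class.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:>4}: penalty = {-math.log(p):.2f}")
```

A confidently wrong prediction (p = 0.01 for a true positive) incurs a penalty of about 4.61, roughly forty times the penalty of a confidently correct one (p = 0.9, about 0.11).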
Applications:
- Classification Models Evaluation:
- Log loss is commonly used as a performance metric for evaluating the quality of probabilistic predictions generated by classification models, especially in scenarios where class probabilities are important, such as in medical diagnosis or fraud detection.
- Kaggle Competitions:
- Log loss is a frequently used evaluation metric in data science competitions on platforms like Kaggle. Competitors aim to minimize log loss to improve the predictive performance of their models.
- Multi-class Classification:
- Log loss can be extended to multi-class classification tasks, where it measures the accuracy of predicted probabilities across multiple classes.
Python Implementation (using scikit-learn):
```python
from sklearn.metrics import log_loss

# Example ground truth and predicted probabilities
y_true = [0, 1, 1, 0, 1]
y_prob = [[0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.2, 0.8], [0.6, 0.4]]  # Each row: [P(class 0), P(class 1)]

# Calculate log loss
logloss = log_loss(y_true, y_prob)
print("Log Loss:", logloss)
```
In this example, `y_true` contains the true labels of the samples (0 for the negative class and 1 for the positive class), and `y_prob` contains the predicted probability of each class for each sample. We calculate the log loss using the `log_loss` function from scikit-learn's metrics module. Lower log loss values indicate better predictive performance.