Log loss

Log loss, also known as logarithmic loss or cross-entropy loss, is a widely used loss function for binary and multiclass classification tasks. It measures the quality of a classification model's probabilistic predictions, where each predicted output is a probability value between 0 and 1.

Definition:

For binary classification, the log loss function is defined as:

\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \right)

where:

  • N is the number of samples,
  • y_i is the true label of the i-th sample (0 or 1),
  • p_i is the predicted probability that the i-th sample belongs to the positive class (class 1).
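
As an illustration, the definition above translates almost line for line into NumPy. This is a minimal sketch rather than a library implementation; the eps clipping constant is an added implementation detail (not part of the formula) that keeps log() finite when a predicted probability is exactly 0 or 1:

import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    # y_true: 0/1 labels; p_pred: predicted probabilities of the positive class
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() never returns -inf
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    # Average the per-sample penalties, exactly as in the formula
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_log_loss([0, 1, 1, 0, 1], [0.1, 0.7, 0.2, 0.8, 0.4]))  # ~0.92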

Interpretation:

  • Log loss is 0 for a perfect model and has no upper bound; the loss grows rapidly as predictions become both confident and wrong.
  • Because the per-sample penalty is the negative log of the probability assigned to the true class, a confidently wrong prediction is penalized far more heavily than a mildly uncertain one.
  • Lower log loss values indicate better-calibrated probabilistic predictions.
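
A quick numeric check makes this asymmetry concrete; each value below is the per-sample penalty -log(p) for the probability p assigned to the true class:

import math

print(-math.log(0.99))  # confident and correct: ~0.01
print(-math.log(0.5))   # maximally uncertain:   ~0.69
print(-math.log(0.01))  # confident and wrong:   ~4.61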

Applications:

  1. Classification Model Evaluation:
    • Log loss is commonly used to evaluate the quality of probabilistic predictions from classification models, especially in scenarios where class probabilities matter, such as medical diagnosis or fraud detection.
  2. Kaggle Competitions:
    • Log loss is a frequently used evaluation metric in data science competitions on platforms like Kaggle, where competitors aim to minimize it to improve the predictive performance of their models.
  3. Multi-class Classification:
    • Log loss extends naturally to multi-class classification tasks, where it measures the accuracy of predicted probabilities across all classes (see the formula just after this list).
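
For K classes with one-hot encoded labels, the binary definition above generalizes to:

\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \cdot \log(p_{i,k})

where y_{i,k} is 1 if the i-th sample belongs to class k (and 0 otherwise), and p_{i,k} is the predicted probability that the i-th sample belongs to class k.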

Python Implementation (using scikit-learn):

from sklearn.metrics import log_loss

# Example ground truth and predicted probabilities
y_true = [0, 1, 1, 0, 1]
y_prob = [[0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.2, 0.8], [0.6, 0.4]]  # Each row gives [P(class 0), P(class 1)] for one sample

# Calculate log loss
logloss = log_loss(y_true, y_prob)

print("Log Loss:", logloss)

In this example, y_true contains the true labels of the samples (0 for the negative class and 1 for the positive class), and each row of y_prob contains the predicted probabilities for both classes. We compute the log loss with the log_loss function from scikit-learn's metrics module; lower log loss values indicate better predictive performance.
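
The same log_loss function also handles the multi-class case. The sketch below uses three made-up classes; the labels and probabilities are illustrative only:

from sklearn.metrics import log_loss

# Three classes (0, 1, 2); each row of probabilities sums to 1
y_true_mc = [0, 2, 1, 2]
y_prob_mc = [
    [0.7, 0.2, 0.1],  # mostly confident in class 0 (correct)
    [0.1, 0.3, 0.6],  # mostly confident in class 2 (correct)
    [0.2, 0.6, 0.2],  # mostly confident in class 1 (correct)
    [0.3, 0.3, 0.4],  # nearly uniform (uncertain)
]

print("Multi-class Log Loss:", log_loss(y_true_mc, y_prob_mc))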