Entropy / Cross-Entropy / Relative-entropy loss

Entropy

Entropy, in the context of information theory and machine learning, measures the amount of uncertainty or disorder within a set of outcomes. In machine learning, it's often used with decision trees (e.g., in the ID3 algorithm) to determine which feature splits the data best by calculating the entropy before and after the split. The equation for entropy (\(H\)) of a discrete random variable \(X\) with possible values \(\{x_1, x_2, \ldots, x_n\}\) and probability mass function \(P(X)\) is:

\[ H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) \]

The base of the logarithm can be chosen to define the unit of entropy. Base 2 is commonly used, resulting in the unit of bits.
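
As a quick numeric check, a fair coin carries one bit of entropy while a biased coin carries less. The short NumPy sketch below (the distributions are illustrative, not from the text) computes \(H(X)\) in bits:

import numpy as np

def entropy(p, base=2):
    # H(X) = -sum_i P(x_i) * log P(x_i); base 2 gives the result in bits.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # outcomes with zero probability contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: about 0.47 bits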

Cross-Entropy

Cross-entropy builds upon the concept of entropy and measures the difference between two probability distributions for a given random variable or set of events. It's widely used as a loss function in classification tasks, especially when training neural networks. Given the true distribution \(P\) and an estimated distribution \(Q\), the cross-entropy (\(H(P, Q)\)) is defined as:

\[ H(P, Q) = -\sum_{i} P(x_i) \log Q(x_i) \]

For binary classification with \(N\) samples, the average cross-entropy loss becomes:

\[ H(P, Q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \]

where \(y_i\) is the actual label and \(\hat{y}_i\) is the predicted probability of the positive class.
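
As a quick illustration of the general formula, here is a minimal NumPy sketch (the distributions p and q below are made-up values, not from the text) that computes the cross-entropy between two explicit discrete distributions:

import numpy as np

# Hypothetical true and estimated distributions over three outcomes.
p = np.array([0.7, 0.2, 0.1])   # true distribution P
q = np.array([0.6, 0.3, 0.1])   # estimated distribution Q

# H(P, Q) = -sum_i P(x_i) * log Q(x_i), using the natural log (nats).
cross_entropy = -np.sum(p * np.log(q))
print("H(P, Q):", cross_entropy)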

Relative Entropy (Kullback-Leibler Divergence)

Relative Entropy, or Kullback-Leibler (KL) divergence, is a measure of how one probability distribution diverges from a second, expected probability distribution. It's used in various applications, including in Bayesian statistics, information theory, and machine learning. The KL divergence from \(Q\) to \(P\) is defined as:

\[ D_{KL}(P \| Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)} \]

KL divergence is not symmetric, meaning that in general \(D_{KL}(P \| Q) \neq D_{KL}(Q \| P)\).
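
To make the asymmetry concrete, here is a small NumPy sketch (again with made-up distributions) that computes the divergence in both directions; it also checks the standard identity \(H(P, Q) = H(P) + D_{KL}(P \| Q)\), which ties together the three quantities in this note:

import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i P(x_i) * log(P(x_i) / Q(x_i)), in nats.
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.6, 0.3, 0.1])

print("D_KL(P || Q):", kl_divergence(p, q))
print("D_KL(Q || P):", kl_divergence(q, p))  # different value: not symmetric

# Cross-entropy decomposes as entropy plus KL divergence.
entropy_p = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))
print(np.isclose(cross_entropy, entropy_p + kl_divergence(p, q)))  # True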

Python Implementation Example

Here's an example of calculating binary cross-entropy loss in Python using NumPy:

import numpy as np

def cross_entropy_loss(y_true, y_pred):
    """
    Calculate the cross-entropy loss.

    :param y_true: Array of true labels.
    :param y_pred: Array of predicted probabilities.
    :return: Cross-entropy loss.
    """
    # Clip predictions away from 0 and 1 to avoid taking the log of zero.
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1. - epsilon)
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return ce

# Example usage
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.9, 0.1, 0.8, 0.8, 0.2])
loss = cross_entropy_loss(y_true, y_pred)
print("Cross-entropy loss:", loss)
import torch
import torch.nn.functional as F

def manual_cross_entropy(y_pred, y_true):
    # Softmax over the class dimension to turn logits into probabilities.
    probs = torch.exp(y_pred) / torch.sum(torch.exp(y_pred), dim=1, keepdim=True)
    # Average negative log-probability of the correct class.
    n_samples = y_true.shape[0]
    correct_log_probs = -torch.log(probs[torch.arange(n_samples), y_true])
    loss = torch.sum(correct_log_probs) / n_samples
    return loss

# Example model outputs (logits) and labels
y_pred = torch.tensor([[2.0, 1.0, 0.1], [0.1, 1.5, 0.2], [0.05, 0.2, 1.5]])
y_true = torch.tensor([0, 1, 2])  # class labels

# Compute the cross-entropy loss
loss = manual_cross_entropy(y_pred, y_true)
print(loss)
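
Since torch.nn.functional is already imported, the manual result can be checked against PyTorch's built-in loss, which applies the softmax internally and expects raw logits:

# Built-in equivalent: takes raw logits and integer class labels.
print(F.cross_entropy(y_pred, y_true))  # should match the manual value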

Pros and Cons

Applications

These concepts are foundational in machine learning and data science, underpinning many algorithms and techniques for data analysis, prediction, and classification.