Normalized Cross Entropy

$$\text{NCE} = \frac{\text{logloss(model)}}{\text{logloss(rate)}}$$

  1. Always non-negative.
  1. Only 0 if your predictions match the labels perfectly.
  1. Unbounded; can grow arbitrarily large.
  1. Intuitive scale: NCE < 1 means the model has learned something; NCE > 1 means the model is less accurate than always predicting the average rate.

$$\text{NCE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log(p_i) + \frac{1-y_i}{2}\log(1-p_i)\right)}{-\left(p\log(p) + (1-p)\log(1-p)\right)}$$

where $y_i \in \{-1, +1\}$ are the labels, $p_i$ is the predicted probability of a click, and $p$ is the background CTR (the average empirical click rate).
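A minimal Python sketch of this computation (the function name `normalized_cross_entropy` is my own, and labels are taken in {0, 1} rather than the {−1, +1} convention of the formula above; the two are equivalent via $(1+y_i)/2$):

```python
import numpy as np

def normalized_cross_entropy(y_true, p_pred, eps=1e-15):
    """NCE: the model's average log loss divided by the log loss of
    always predicting the background CTR (the label mean)."""
    y_true = np.asarray(y_true, dtype=float)                        # labels in {0, 1}
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)

    # Numerator: average log loss of the model's predictions.
    model_logloss = -np.mean(
        y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred)
    )

    # Denominator: entropy of the background CTR p.
    p = np.clip(y_true.mean(), eps, 1 - eps)
    baseline_logloss = -(p * np.log(p) + (1 - p) * np.log(1 - p))

    return model_logloss / baseline_logloss
```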

  1. The lower the value, the better the model’s prediction.
  1. The reason for this normalization is that the closer the background CTR is to 0 or 1, the easier it is to achieve a low raw log loss.
  1. Dividing by the entropy of the background CTR makes the NCE insensitive to the background CTR.
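As an illustrative check of the last two points (my own example, reusing the `normalized_cross_entropy` sketch above): a constant predictor that always outputs the background CTR gets an NCE of exactly 1 no matter how skewed that CTR is, while its raw log loss shrinks as the CTR approaches 0.

```python
rng = np.random.default_rng(0)
for ctr in (0.01, 0.20):
    # Simulated labels with the given background CTR.
    y = (rng.random(100_000) < ctr).astype(float)
    p_const = np.full_like(y, y.mean())      # always predict the base rate
    raw_logloss = -np.mean(y * np.log(p_const) + (1 - y) * np.log(1 - p_const))
    nce = normalized_cross_entropy(y, p_const)
    print(f"CTR={ctr:.2f}  raw logloss={raw_logloss:.3f}  NCE={nce:.3f}")
# Raw log loss looks much "better" at 1% CTR (~0.056 vs ~0.50),
# but NCE is 1.0 in both cases: the constant predictor learned nothing.
```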