Quantile Loss


In certain applications, we value underestimation and overestimation differently. If you build a model to estimate arrival time, you don't want to overestimate: customers who see an inflated ETA might not place orders or requests at all, while an underestimate means the order arrives later than promised.

Quantile loss lets you weight positive errors (underestimates, where the true value y is above the prediction p) and negative errors (overestimates) differently via the quantile level lambda:

$$\sum_{y \geq p} \lambda \cdot |y - p| \,+\, \sum_{y < p} (1 - \lambda) \cdot |y - p|$$

If you set lambda to 0.5, both error directions are weighted equally and the loss is equivalent to MAE (scaled by a factor of 1/2).

Equivalently, per sample, with quantile level τ and prediction ŷ:

$$L_{\tau}(y, \hat{y}) = \begin{cases} \tau \cdot (y - \hat{y}) & \text{if } y \geq \hat{y} \\ (1 - \tau) \cdot (\hat{y} - y) & \text{if } y < \hat{y} \end{cases}$$
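As a minimal sketch of this per-sample (pinball) loss, assuming NumPy — the function name `quantile_loss` and the toy arrays are illustrative, not from any particular library:

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau=0.5):
    """Pinball loss: weight underestimates by tau, overestimates by (1 - tau)."""
    error = y_true - y_pred
    # tau * error where error >= 0, (tau - 1) * error where error < 0
    return np.mean(np.maximum(tau * error, (tau - 1) * error))

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 30.0])

print(quantile_loss(y_true, y_pred, tau=0.9))   # penalizes underestimates more
print(quantile_loss(y_true, y_pred, tau=0.5))   # equals 0.5 * MAE
print(0.5 * np.mean(np.abs(y_true - y_pred)))   # check: same value as above
```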

Uber uses pseudo-Huber loss and log-cosh loss as smooth approximations of Huber loss and Mean Absolute Error in their distributed XGBoost training. DoorDash's estimated time of arrival (ETA) models initially used MSE and later moved to quantile loss and a custom asymmetric MSE.
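As a rough sketch of how such a smooth MAE surrogate can be plugged into XGBoost as a custom objective (this is not Uber's actual code; the toy data and parameters are made up), log-cosh is convenient because its gradient is tanh(r) and its Hessian 1 − tanh²(r) is always positive:

```python
import numpy as np
import xgboost as xgb

def log_cosh_objective(preds, dtrain):
    """Custom objective: log(cosh(pred - label)), a smooth approximation of MAE."""
    residual = preds - dtrain.get_label()
    grad = np.tanh(residual)            # d/dr log(cosh(r)) = tanh(r)
    hess = 1.0 - np.tanh(residual) ** 2 # second derivative, strictly positive
    return grad, hess

# Toy data; in practice dtrain would hold real features and ETA labels.
X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=log_cosh_objective)
```

Gradient boosting needs both first- and second-order derivatives, which is why a smooth surrogate like log-cosh is preferred over raw MAE, whose second derivative is zero almost everywhere.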

Which loss function to use depends on the use case. For binary classification, the most popular choice is cross-entropy. In the ad click prediction problem, Facebook uses Normalized Cross Entropy (the average log loss normalized by the entropy of the background CTR) to make the metric less sensitive to the background click-through rate.
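A small sketch of that normalization, assuming `p_pred` are predicted click probabilities and `y` are 0/1 labels — the helper name `normalized_cross_entropy` is ours, not Facebook's:

```python
import numpy as np

def normalized_cross_entropy(y, p_pred, eps=1e-12):
    """Average log loss divided by the entropy of the background CTR."""
    p_pred = np.clip(p_pred, eps, 1 - eps)
    log_loss = -np.mean(y * np.log(p_pred) + (1 - y) * np.log(1 - p_pred))
    ctr = np.mean(y)  # background click-through rate
    background_entropy = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return log_loss / background_entropy

y = np.array([0, 0, 1, 0, 1])
p_pred = np.array([0.1, 0.2, 0.7, 0.3, 0.6])
print(normalized_cross_entropy(y, p_pred))
```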

Characteristics

Quantile loss is asymmetric (controlled by the quantile level), piecewise linear, and non-differentiable at zero error; like MAE it is robust to outliers, and minimizing it yields the chosen conditional quantile of the target rather than the conditional mean.

Applications

Quantile regression and quantile loss are widely used in various fields, including delivery and arrival time estimation, demand forecasting, and financial risk modeling, wherever prediction intervals or asymmetric error costs matter.

Implementing Quantile Loss

Quantile loss can be implemented in machine learning frameworks that support custom loss functions, such as TensorFlow or PyTorch. It is also available out of the box in algorithms that support quantile regression, such as the Gradient Boosting Machines (GBMs) in scikit-learn, XGBoost, and LightGBM.
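For instance, a minimal sketch with scikit-learn's gradient boosting — the synthetic data and the 0.9 quantile level are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: y depends on x with noise, so quantiles differ from the mean.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * X.ravel() + rng.normal(0, 2, size=500)

# loss="quantile" with alpha=0.9 fits the 90th percentile of y given X.
model_p90 = GradientBoostingRegressor(loss="quantile", alpha=0.9)
model_p90.fit(X, y)

# alpha=0.5 fits the conditional median (equivalent to optimizing MAE).
model_p50 = GradientBoostingRegressor(loss="quantile", alpha=0.5)
model_p50.fit(X, y)

print(model_p90.predict([[5.0]]), model_p50.predict([[5.0]]))
```

LightGBM exposes the same idea through objective="quantile" with an alpha parameter.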