Quantile Loss


In certain applications, we value underestimation and overestimation differently. If you build a model to estimate arrival time, you don't want to overestimate: customers who see an inflated ETA might not place orders or requests at all, while an underestimate means the order arrives later than promised.

Quantile loss lets you weight positive errors (underestimates, where the true value y is above the prediction p) and negative errors (overestimates) differently via the quantile level lambda:

$$\sum_{y \geq p} \lambda \cdot |y - p| \,+\, \sum_{y < p} (1 - \lambda) \cdot |y - p|$$

If you set lambda to 0.5, both error directions are weighted equally and the loss is equivalent to MAE (scaled by a factor of 1/2).

Equivalently, per sample, with quantile level τ and prediction ŷ:

$$L_{\tau}(y, \hat{y}) = \begin{cases} \tau \cdot (y - \hat{y}) & \text{if } y \geq \hat{y} \\ (1 - \tau) \cdot (\hat{y} - y) & \text{if } y < \hat{y} \end{cases}$$
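As a minimal sketch of this per-sample (pinball) loss, assuming NumPy — the function name `quantile_loss` and the toy arrays are illustrative, not from any particular library:

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau=0.5):
    """Pinball loss: weight underestimates by tau, overestimates by (1 - tau)."""
    error = y_true - y_pred
    # tau * error where error >= 0, (tau - 1) * error where error < 0
    return np.mean(np.maximum(tau * error, (tau - 1) * error))

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 30.0])

print(quantile_loss(y_true, y_pred, tau=0.9))   # penalizes underestimates more
print(quantile_loss(y_true, y_pred, tau=0.5))   # equals 0.5 * MAE
print(0.5 * np.mean(np.abs(y_true - y_pred)))   # check: same value as above
```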

Uber uses pseudo-Huber loss and log-cosh loss as smooth approximations of Huber loss and Mean Absolute Error in their distributed XGBoost training. DoorDash's estimated time of arrival (ETA) models initially used MSE and later moved to quantile loss and a custom asymmetric MSE.
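As a rough sketch of how such a smooth MAE surrogate can be plugged into XGBoost as a custom objective (this is not Uber's actual code; the toy data and parameters are made up), log-cosh is convenient because its gradient is tanh(r) and its Hessian 1 − tanh²(r) is always positive:

```python
import numpy as np
import xgboost as xgb

def log_cosh_objective(preds, dtrain):
    """Custom objective: log(cosh(pred - label)), a smooth approximation of MAE."""
    residual = preds - dtrain.get_label()
    grad = np.tanh(residual)            # d/dr log(cosh(r)) = tanh(r)
    hess = 1.0 - np.tanh(residual) ** 2 # second derivative, strictly positive
    return grad, hess

# Toy data; in practice dtrain would hold real features and ETA labels.
X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=log_cosh_objective)
```

Gradient boosting needs both first- and second-order derivatives, which is why a smooth surrogate like log-cosh is preferred over raw MAE, whose second derivative is zero almost everywhere.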

Which loss function to use depends on the use case. For binary classification, the most popular choice is cross-entropy. In the ad click prediction problem, Facebook uses Normalized Cross Entropy (the average log loss normalized by the entropy of the background CTR) to make the metric less sensitive to the background click-through rate.
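A small sketch of that normalization, assuming `p_pred` are predicted click probabilities and `y` are 0/1 labels — the helper name `normalized_cross_entropy` is ours, not Facebook's:

```python
import numpy as np

def normalized_cross_entropy(y, p_pred, eps=1e-12):
    """Average log loss divided by the entropy of the background CTR."""
    p_pred = np.clip(p_pred, eps, 1 - eps)
    log_loss = -np.mean(y * np.log(p_pred) + (1 - y) * np.log(1 - p_pred))
    ctr = np.mean(y)  # background click-through rate
    background_entropy = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return log_loss / background_entropy

y = np.array([0, 0, 1, 0, 1])
p_pred = np.array([0.1, 0.2, 0.7, 0.3, 0.6])
print(normalized_cross_entropy(y, p_pred))
```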

Characteristics

Quantile loss is asymmetric (controlled by the quantile level), piecewise linear, and non-differentiable at zero error; like MAE it is robust to outliers, and minimizing it yields the chosen conditional quantile of the target rather than the conditional mean.

Applications

Quantile regression and quantile loss are widely used in various fields, including delivery and arrival time estimation, demand forecasting, and financial risk modeling, wherever prediction intervals or asymmetric error costs matter.

Implementing Quantile Loss

Quantile loss can be implemented in machine learning frameworks that support custom loss functions, such as TensorFlow or PyTorch. It is also available out of the box in algorithms that support quantile regression, such as the Gradient Boosting Machines (GBMs) in scikit-learn, XGBoost, and LightGBM.
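For instance, a minimal sketch with scikit-learn's gradient boosting — the synthetic data and the 0.9 quantile level are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: y depends on x with noise, so quantiles differ from the mean.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * X.ravel() + rng.normal(0, 2, size=500)

# loss="quantile" with alpha=0.9 fits the 90th percentile of y given X.
model_p90 = GradientBoostingRegressor(loss="quantile", alpha=0.9)
model_p90.fit(X, y)

# alpha=0.5 fits the conditional median (equivalent to optimizing MAE).
model_p50 = GradientBoostingRegressor(loss="quantile", alpha=0.5)
model_p50.fit(X, y)

print(model_p90.predict([[5.0]]), model_p50.predict([[5.0]]))
```

LightGBM exposes the same idea through objective="quantile" with an alpha parameter.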