Maximum Likelihood Estimation (MLE) - finding the mean and standard deviation of a group of model predictions

Tags: Basic Concepts

Maximum likelihood can be used to learn the distribution of the target: here, the mean and standard deviation of a group of model predictions.

In machine learning, Maximum Likelihood Estimation (MLE) can be particularly useful for parameter estimation in probabilistic models, including scenarios where you want to find the mean and standard deviation of a group of model predictions or data points. This is common in ensemble methods, Bayesian models, and any situation where understanding the distribution of model predictions or errors can inform decision-making or further modeling.

Use Case: Estimating Parameters of Model Predictions

Suppose you have a group of models that have been trained on different subsets of your data (as in bagging or random forests) or models that have been trained sequentially to correct the errors of prior models (as in boosting). After training, you use these models to make predictions on a new dataset. For each data point in this new dataset, you now have a group of predictions. You suspect these predictions are normally distributed around the true value, and you want to estimate the mean and standard deviation of this distribution.
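As a minimal sketch of this setup (assuming scikit-learn is available; it uses RandomForestRegressor, whose individual trees are exposed through the estimators_ attribute, and purely synthetic training data), the per-model predictions for a new data point can be collected like this:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# A bagging-style ensemble: each tree is trained on a bootstrap sample
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# For one new data point, gather one prediction per ensemble member
X_new = rng.normal(size=(1, 3))
per_model_predictions = np.array([tree.predict(X_new)[0] for tree in rf.estimators_])
print(per_model_predictions.shape)  # (50,): one prediction per tree

These 50 values play the role of \(X_1, X_2, \ldots, X_n\) in the derivation below.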

Applying MLE for Mean and Standard Deviation

Let's assume the predictions for a given data point are \(X_1, X_2, \ldots, X_n\), and you believe these predictions follow a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). You want to estimate \(\mu\) and \(\sigma\) using MLE.

The likelihood function for a normal distribution is:

\[
L(\mu, \sigma | X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
\]

Taking the log-likelihood, you get:

\[
\log(L(\mu, \sigma | X)) = -\frac{n}{2} \log(2\pi) - n\log(\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
\]

To find the MLEs of \(\mu\) and \(\sigma\), you take the derivatives of the log-likelihood with respect to \(\mu\) and \(\sigma\), set them to zero, and solve. The two first-order conditions are:
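
\[
\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0,
\qquad
\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 = 0
\]

Solving these, you find that the MLE of \(\mu\) (\(\hat{\mu}\)) is the sample mean: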

\[
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]

And the MLE of \(\sigma^2\) (\(\hat{\sigma}^2\)) is the sample variance computed with an \(n\) (rather than \(n-1\)) denominator, which makes it a biased estimator of \(\sigma^2\); the MLE of \(\sigma\) is its square root:

\[
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
\]

Python Example

Here's a simple Python example that calculates the MLE for the mean and standard deviation of predictions from multiple models:

import numpy as np

# Example predictions from 5 models for a single data point
predictions = np.array([2.3, 2.5, 2.7, 2.4, 2.6])

# MLE for the mean (sample mean)
mu_hat = np.mean(predictions)

# MLE for the standard deviation (ddof=0 divides by n, matching the MLE)
sigma_hat = np.std(predictions, ddof=0)

print(f"MLE for mean (mu): {mu_hat}")
print(f"MLE for standard deviation (sigma): {sigma_hat}")

Importance in Machine Learning

Understanding the distribution of model predictions or errors can be crucial, for example for quantifying prediction uncertainty and informing decision-making or further modeling.

MLE provides a principled way to estimate the parameters of probabilistic models, making it a valuable tool in the machine learning practitioner's toolkit.