Lasso and Ridge
Tags: Regularization
Lasso stands for Least Absolute Shrinkage and Selection Operator. It is used not only for regularization but also, indirectly, for feature selection: when the penalty is high, some of the coefficients of features that contribute little to predicting the target are driven to exactly zero.
Derivation of Ridge and Lasso
From a Bayesian point of view, each penalty corresponds to a prior on the weights in a MAP (maximum a posteriori) estimate:
- Lasso - Laplace prior
- Ridge - Gaussian prior
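Sketching the standard MAP argument (assuming a linear model with Gaussian noise): the MAP estimate maximizes the posterior, which is the same as minimizing the negative log-likelihood plus the negative log-prior,
\[
\hat{w}_{\text{MAP}} = \arg\max_w \, p(w \mid y) = \arg\min_w \left[ -\log p(y \mid w) - \log p(w) \right].
\]
With a Gaussian prior \(p(w_j) \propto e^{-w_j^2 / 2\tau^2}\), the prior term contributes \(\frac{1}{2\tau^2} \sum_j w_j^2\), the L2 (Ridge) penalty; with a Laplace prior \(p(w_j) \propto e^{-|w_j| / b}\), it contributes \(\frac{1}{b} \sum_j |w_j|\), the L1 (Lasso) penalty. The regularization parameter \(\lambda\) absorbs the prior scale and the noise variance.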
Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge regression are two popular linear regression techniques used for regularization. They both add a penalty term to the ordinary least squares (OLS) loss function to prevent overfitting and improve the generalization performance of the model.
Lasso Regression:
Lasso regression adds an L1 regularization term to the loss function, which penalizes the absolute value of the coefficients:
\[
\text{Loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} |w_j|
\]
where:
- \(w_j\) are the model coefficients (weights),
- \(y_i\) and \(\hat{y}_i\) are the observed and predicted targets,
- \(\lambda\) is the regularization parameter that controls the strength of regularization.
Lasso regression tends to produce sparse models by driving the coefficients of less important features to exactly zero, effectively performing feature selection.
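A minimal sketch of this sparsity effect, assuming scikit-learn is available (the synthetic dataset and the alpha value are arbitrary choices for illustration):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, of which only 5 actually influence the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

lasso = Lasso(alpha=5.0)  # a fairly strong penalty, chosen for illustration
lasso.fit(X, y)

n_zero = np.sum(lasso.coef_ == 0)
print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly zero")
# Most of the 15 uninformative features are typically zeroed out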
Ridge Regression:
Ridge regression adds an L2 regularization term to the loss function, which penalizes the squared magnitude of the coefficients:
\[
\text{Loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} w_j^2
\]
where:
- \(w_j\) are the model coefficients,
- \(\lambda\) is the regularization parameter.
Ridge regression shrinks the coefficients towards zero, but it rarely forces them to exactly zero. It reduces the model's variance and is especially useful when features are highly correlated (multicollinearity), a situation where plain OLS coefficients become unstable.
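A comparable sketch for Ridge on the same kind of synthetic data (the alpha values are again arbitrary): as alpha grows, the overall coefficient magnitude shrinks, but no coefficient is driven to exactly zero.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

for alpha in [0.1, 10.0, 1000.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    norm = np.linalg.norm(ridge.coef_)
    n_zero = np.sum(ridge.coef_ == 0)
    # The L2 norm of the coefficients shrinks as alpha grows; n_zero stays 0
    print(f"alpha={alpha:>7}: ||w||_2 = {norm:.2f}, exact zeros: {n_zero}")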
Key Differences:
- Effect on Coefficients:
- Lasso regression can lead to sparse models with many coefficients set to zero.
- Ridge regression shrinks the coefficients towards zero but rarely forces them to zero.
- Feature Selection:
- Lasso regression performs feature selection by driving less important features' coefficients to zero.
- Ridge regression does not perform feature selection: its coefficients are shrunk but remain nonzero.
- Geometric Interpretation:
- Lasso regression has a diamond-shaped constraint region (\(\sum_j |w_j| \le t\)); the loss contours tend to touch it at a corner on an axis, which is why some coefficients land at exactly zero (sparse solutions).
- Ridge regression has a circular constraint region (\(\sum_j w_j^2 \le t\)) with no corners, so the solution tends to have all coefficients nonzero and more evenly distributed, as sketched numerically below.
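To make the geometric picture concrete, here is a minimal numerical sketch (assuming NumPy and SciPy are available; the synthetic data, the budget t, and the SLSQP starting point are arbitrary choices). Minimizing the same squared error under an L1 versus an L2 constraint with the same budget typically leaves a coordinate at (numerically) zero only in the L1 case:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 0.5]) + rng.normal(scale=0.5, size=100)

def sse(w):
    # Squared error of a 2-feature linear model
    r = X @ w - y
    return r @ r

t = 1.0  # constraint budget, a hypothetical value

# Diamond constraint: |w1| + |w2| <= t
l1 = minimize(sse, x0=[0.1, 0.1], method="SLSQP",
              constraints=[{"type": "ineq", "fun": lambda w: t - np.abs(w).sum()}])
# Circular constraint: w1^2 + w2^2 <= t^2
l2 = minimize(sse, x0=[0.1, 0.1], method="SLSQP",
              constraints=[{"type": "ineq", "fun": lambda w: t**2 - w @ w}])

print("L1-constrained solution:", l1.x)  # typically a corner: second coordinate ~0
print("L2-constrained solution:", l2.x)  # both coordinates shrunk but nonzero

SLSQP is a general-purpose solver, so the L1 result is only approximately at the corner; dedicated coordinate-descent solvers (as in scikit-learn's Lasso) return exact zeros.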
Python Implementation (using scikit-learn):
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load a sample dataset (California housing; load_boston was removed in scikit-learn 1.2)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit a Lasso (L1) regression model
lasso = Lasso(alpha=0.1)  # alpha is the regularization parameter (lambda)
lasso.fit(X_train, y_train)

# Create and fit a Ridge (L2) regression model
ridge = Ridge(alpha=0.1)  # alpha is the regularization parameter (lambda)
ridge.fit(X_train, y_train)

# Make predictions
y_pred_lasso = lasso.predict(X_test)
y_pred_ridge = ridge.predict(X_test)

# Evaluate performance (Mean Squared Error)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print("Lasso Regression MSE:", mse_lasso)
print("Ridge Regression MSE:", mse_ridge)
In this example, we use Lasso (L1) and Ridge (L2) regression models to predict house prices on the California housing dataset (the Boston housing dataset used in older tutorials was removed from scikit-learn) and evaluate both models with mean squared error (MSE). Adjusting the regularization parameter \(\lambda\) (called alpha in scikit-learn) controls the strength of regularization and lets us trade off bias against variance.
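In practice, alpha is usually chosen by cross-validation rather than set by hand. A minimal sketch using scikit-learn's built-in LassoCV and RidgeCV (the alpha grid below is an arbitrary choice):

import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42)

# Candidate regularization strengths (grid chosen arbitrarily)
alphas = np.logspace(-4, 2, 50)

# Both estimators refit on the full training set with the best cross-validated alpha
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)

print("Best alpha (Lasso):", lasso_cv.alpha_)
print("Best alpha (Ridge):", ridge_cv.alpha_)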