Lasso and Ridge


Lasso stands for Least Absolute Shrinkage and Selection Operator. It is used not only for regularization but also, indirectly, for feature selection: if the penalty is high enough, some of the coefficients of features that contribute little to predicting the target are driven to exactly zero.

Derivation of Ridge and Lasso

The derivation below is translated and condensed from 线性回归大结局(岭(Ridge)、Lasso回归原理、公式推导) by 牧牛的铃铛 on CSDN:
https://blog.csdn.net/qq_45537774/article/details/115695866?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0.essearch_pc_relevant&spm=1001.2101.3001.4242

A linear model fits the data with a linear combination of its attributes. Given a dataset $D = \{(x_1, y_1), \dots, (x_m, y_m)\}$, where each sample $x_i = (x_{i1}, \dots, x_{id})$ has $d$ attributes and each $y_i$ is a real value, linear regression must find parameters that make the model output as close as possible to the true $y_i$:

$$\hat{y}_i = w^T x_i + b$$

To write the whole model as a single expression, absorb the bias $b$ into $w$ by appending a constant attribute 1 to every $x_i$, so that $\hat{y}_i = w^T x_i$.

We measure the difference between the model output and the true value with the squared error $(\hat{y}_i - y_i)^2$; estimating the parameters by minimizing it is the method of least squares. Over the whole dataset the error is

$$L(w) = \sum_{i=1}^{m} (y_i - w^T x_i)^2$$

In matrix form, stacking the samples as the rows of $X \in \mathbb{R}^{m \times d}$ and the targets into $y \in \mathbb{R}^{m}$:

$$L(w) = \lVert y - Xw \rVert_2^2 = (y - Xw)^T (y - Xw)$$

This uses the fact that for a vector $v$ the 2-norm is $\lVert v \rVert_2 = \sqrt{\sum_i v_i^2}$, so its square is $\lVert v \rVert_2^2 = v^T v$. Expanding by ordinary matrix multiplication gives the model's error:

$$L(w) = y^T y - 2 w^T X^T y + w^T X^T X w$$

Minimizing this error is an optimization problem. $L(w)$ is convex in $w$ (the original post defers the convexity proof), so the optimum is where the derivative with respect to $w$ is zero. Two matrix differentiation rules are needed (the post proves the first and notes the second follows the same way):

$$\frac{\partial (a^T x)}{\partial x} = a, \qquad \frac{\partial (x^T A x)}{\partial x} = (A + A^T)\,x$$

Applying them, and using the symmetry of $X^T X$:

$$\frac{\partial L}{\partial w} = 2 X^T X w - 2 X^T y = 0 \quad\Longrightarrow\quad w^* = (X^T X)^{-1} X^T y$$

When $(X^T X)^{-1}$ does not exist (for example, more attributes than samples, or collinear attributes), a common fix is to add a regularization penalty to the loss:

$$L(w) = \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2$$

Differentiating with respect to $w$ and setting the result to zero now gives

$$w^* = (X^T X + \lambda I)^{-1} X^T y$$

which is ridge regression; $X^T X + \lambda I$ is invertible for any $\lambda > 0$.

The post then motivates the penalty with a curve-fitting example. The best possible fit would be the underlying sine function, but an overfitted curve passes through every training point, and comparing it with the true sine curve reveals its major flaw: "abrupt changes", i.e. segments where the absolute value of the slope is very large, which corresponds to very large weights. To study this, the post analyzes the residual sum of squares

$$\text{RSS}(w) = \sum_{i=1}^{m} (y_i - w^T x_i)^2$$

for a dataset with just two attributes $x_1, x_2$ whose target is a known linear combination of them. The true weights reproduce the targets exactly, so their RSS is 0. Computing RSS over a grid of candidate weight pairs and connecting points of equal RSS produces a contour plot: ellipses centered on the true weights. Where these contours first touch the L1 or L2 constraint region is where the lasso or ridge solution lands, as discussed under "Geometric Interpretation" below.

[The original post follows this with the NumPy/matplotlib code that generates the contour plots; see the link above.]
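
To sanity-check the closed forms above, here is a minimal NumPy sketch on synthetic data (the shapes, noise level, and $\lambda$ are arbitrary illustrative choices, not from the original post) that computes the ridge estimate directly and compares it with scikit-learn's Ridge:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
m, d = 100, 5
X = rng.normal(size=(m, d))
w_true = np.array([1.5, -2.0, 0.0, 3.0, 0.5])  # arbitrary ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=m)

lam = 1.0

# OLS closed form: w* = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge closed form: w* = (X^T X + lambda*I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Cross-check against scikit-learn; fit_intercept=False because the
# closed form above has no separate bias term
sk = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(np.allclose(w_ridge, sk.coef_))  # expect: True

Using np.linalg.solve rather than explicitly inverting $X^T X$ is the numerically preferred way to evaluate both formulas.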

Prior

From a Bayesian viewpoint, each penalty corresponds to MAP estimation under a different prior on the weights:

Lasso - Laplace prior

Ridge - Gaussian prior
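
A short sketch of why, assuming a Gaussian likelihood $y_i \sim \mathcal{N}(w^T x_i, \sigma^2)$: the MAP estimate maximizes the log-posterior, and the negative log-prior becomes the penalty term.

$$\hat{w}_{\text{MAP}} = \arg\max_w \big[\log p(y \mid X, w) + \log p(w)\big] = \arg\min_w \Big[\tfrac{1}{2\sigma^2}\lVert y - Xw \rVert_2^2 - \log p(w)\Big]$$

$$\text{Laplace: } p(w_i) \propto e^{-|w_i|/b} \;\Rightarrow\; -\log p(w) = \tfrac{1}{b}\sum_i |w_i| + \text{const} \;\Rightarrow\; \text{L1 penalty (Lasso)}$$

$$\text{Gaussian: } p(w_i) \propto e^{-w_i^2/(2\tau^2)} \;\Rightarrow\; -\log p(w) = \tfrac{1}{2\tau^2}\sum_i w_i^2 + \text{const} \;\Rightarrow\; \text{L2 penalty (Ridge)}$$

After rescaling, $\lambda$ scales with $\sigma^2/b$ (Lasso) or $\sigma^2/\tau^2$ (Ridge): more noise or a tighter prior means stronger regularization.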

Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge regression are two popular linear regression techniques used for regularization. They both add a penalty term to the ordinary least squares (OLS) loss function to prevent overfitting and improve the generalization performance of the model.

Lasso Regression:

Lasso regression adds an L1 regularization term to the loss function, which penalizes the absolute value of the coefficients:

$$\text{Lasso loss} = \text{OLS loss} + \lambda \sum_{i=1}^{n} |w_i|$$

where:

  • $\lambda \ge 0$ is the regularization strength,
  • $w_i$ are the model coefficients, and
  • $n$ is the number of features.

Lasso regression tends to produce sparse models by driving the coefficients of less important features to exactly zero, effectively performing feature selection.
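
A small sketch of this sparsity effect on synthetic data (the feature counts and alpha value are illustrative choices, not from the original notes):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 50 features, but only 5 actually influence the target
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)

# Most coefficients are driven exactly to zero
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])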

Ridge Regression:

Ridge regression adds an L2 regularization term to the loss function, which penalizes the squared magnitude of the coefficients:

$$\text{Ridge loss} = \text{OLS loss} + \lambda \sum_{i=1}^{n} w_i^2$$

where:

  • $\lambda$, $w_i$, and $n$ are as defined for the Lasso loss above.

Ridge regression shrinks the coefficients towards zero, but it rarely forces them to exactly zero. It reduces the model's complexity and variance, and it is especially useful when features are highly correlated (multicollinearity), where plain OLS coefficients become unstable.
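
A quick sketch of this shrink-but-not-zero behavior (the alpha grid and synthetic data are arbitrary choices):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=42)

# The coefficient norm falls as alpha grows, but no coefficient hits exactly zero
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>8}: ||w|| = {np.linalg.norm(ridge.coef_):9.2f}, "
          f"exact zeros = {np.sum(ridge.coef_ == 0)}")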

Key Differences:

  1. Effect on Coefficients:
    • Lasso regression can lead to sparse models with many coefficients set to zero.
    • Ridge regression shrinks the coefficients towards zero but rarely forces them to zero.
  2. Feature Selection:
    • Lasso regression performs feature selection by driving less important features' coefficients to zero.
    • Ridge regression does not perform feature selection as aggressively as Lasso regression.
  3. Geometric Interpretation (made precise by the constrained formulations after this list):
    • Lasso regression has a diamond-shaped constraint boundary, leading to solutions that are more likely to intersect the axes (sparse solutions).
    • Ridge regression has a circular constraint boundary, which tends to produce coefficients that are more evenly distributed.
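
The constraint-boundary picture comes from the equivalent constrained formulations of the two problems (for every penalty weight $\lambda$ there is a corresponding budget $t$):

$$\text{Lasso:}\quad \min_w \lVert y - Xw \rVert_2^2 \;\;\text{s.t.}\;\; \sum_{i=1}^{n} |w_i| \le t \qquad \text{(diamond-shaped region)}$$

$$\text{Ridge:}\quad \min_w \lVert y - Xw \rVert_2^2 \;\;\text{s.t.}\;\; \sum_{i=1}^{n} w_i^2 \le t \qquad \text{(circular region)}$$

The solution sits where the elliptical RSS contours first touch the feasible region; the diamond's corners lie on the coordinate axes, so the first touch often happens at a corner, zeroing some coefficients.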

Python Implementation (using scikit-learn):

from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load sample dataset (California housing; load_boston was removed
# from scikit-learn in version 1.2)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Lasso (L1) regression model
lasso = Lasso(alpha=0.1)  # alpha is the regularization parameter (lambda)
lasso.fit(X_train, y_train)

# Create Ridge (L2) regression model
ridge = Ridge(alpha=0.1)  # alpha is the regularization parameter (lambda)
ridge.fit(X_train, y_train)

# Make predictions
y_pred_lasso = lasso.predict(X_test)
y_pred_ridge = ridge.predict(X_test)

# Evaluate performance (e.g., Mean Squared Error)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)

print("Lasso Regression MSE:", mse_lasso)
print("Ridge Regression MSE:", mse_ridge)

In this example, we use Lasso (L1) and Ridge (L2) regression models to predict housing prices on the California housing dataset, then evaluate both with mean squared error (MSE). Adjusting the regularization parameter ($\lambda$, exposed as alpha in scikit-learn) controls the strength of regularization and lets us trade off bias against variance.
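
Rather than hand-tuning alpha, scikit-learn's cross-validated variants can select it automatically. A minimal sketch reusing the X_train/y_train split from above (the alpha grid is an arbitrary choice):

import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

alphas = np.logspace(-3, 3, 13)

# Each estimator evaluates every alpha by cross-validation and keeps the best
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)

print("best Lasso alpha:", lasso_cv.alpha_)
print("best Ridge alpha:", ridge_cv.alpha_)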