Lasso And Ridge Regression

Ankit Jha edited this page Feb 4, 2019 · 3 revisions

Ridge regression has two important advantages over ordinary linear regression. The most important one is that it penalizes the coefficient estimates (β), and it does not penalize every feature's estimate arbitrarily. The ridge objective is the SSE (sum of squared errors) plus a penalty term on the size of the estimates. If the β values are allowed to be very large, the SSE term can be made small, but the penalty term increases. If the β values are forced to be small, the penalty term is small, but the SSE term increases because the model underfits. Ridge balances the two terms, so less influential features (those with only a small effect on the dependent variable) end up with coefficients close to zero. In some domains there are many independent variables, and we are not sure which of them influence the dependent variable. In this kind of scenario, ridge regression usually does better than plain linear regression.
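The trade-off between the SSE term and the penalty term can be seen on a tiny example. This is only a sketch: the data values are made up, there is a single predictor and no intercept, and `ridge_estimate` uses the one-predictor closed form β = Σxy / (Σx² + α), which follows from setting the derivative of the objective to zero.

```python
# Toy data with a single predictor and no intercept (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]


def sse(beta):
    """Sum of squared errors for the model y = beta * x."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


def ridge_objective(beta, alpha):
    """SSE plus the L2 penalty alpha * beta^2."""
    return sse(beta) + alpha * beta ** 2


def ridge_estimate(alpha):
    """Minimizer of ridge_objective for one predictor: sum(x*y) / (sum(x^2) + alpha)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


# As alpha grows, the estimate shrinks toward zero and the SSE goes up.
for alpha in (0.0, 1.0, 10.0, 100.0):
    b = ridge_estimate(alpha)
    print(f"alpha={alpha:6.1f}  beta={b:.4f}  SSE={sse(b):.4f}  penalty={alpha * b * b:.4f}")
```

With alpha = 0 this is plain least squares; larger alpha values trade a higher SSE for a smaller coefficient.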

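In general, ridge regression performs well when the number of predictors is very large relative to the number of observations, including the case p > n, where ordinary least squares has no unique solution.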
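A minimal sketch of the p > n case, assuming the smallest possible example: one observation (n = 1) and two predictors (p = 2), with made-up values. The helper `solve_2x2` is hypothetical and just applies Cramer's rule. For plain least squares the matrix XᵀX is singular, while adding α to its diagonal (the ridge system) makes it solvable.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ [u, v] = [r1, r2] with Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: no unique solution")
    return ((r1 * d - b * r2) / det, (a * r2 - r1 * c) / det)


# One observation (n = 1) with two predictors (p = 2); values are illustrative.
x1, x2, y = 1.0, 2.0, 3.0

# Entries of X^T X (row by row) and of X^T y for this single row.
xtx = (x1 * x1, x1 * x2, x2 * x1, x2 * x2)
xty = (x1 * y, x2 * y)


def ridge_solution(alpha):
    """Solve (X^T X + alpha * I) beta = X^T y; alpha = 0 is ordinary least squares."""
    a, b, c, d = xtx
    return solve_2x2(a + alpha, b, c, d + alpha, *xty)


try:
    ridge_solution(0.0)
except ValueError as err:
    print("OLS:", err)

print("ridge, alpha=1:", ridge_solution(1.0))
```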

It is best to apply ridge regression after standardizing the predictors, since the penalty depends on the scale of each coefficient.
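To see why standardizing matters, here is a small sketch with made-up height and weight values. It assumes a single predictor, a centered response, and the one-predictor closed form for ridge; `standardize`, `ridge_fit_1d` and `fitted` are illustrative helper names. The same heights are given in metres and in centimetres: without standardizing, the fitted values depend on the unit, and after standardizing they agree.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ridge_fit_1d(x, y, alpha):
    """One-predictor ridge coefficient without intercept: sum(x*y) / (sum(x^2) + alpha)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)


# Illustrative data: the same heights in two units, and a weight for each person.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
height_cm = [100 * h for h in height_m]
weight = [52.0, 58.0, 66.0, 73.0, 81.0]
y = [w - mean(weight) for w in weight]  # centered response


def fitted(x, alpha=1.0):
    """Fitted (centered) weights from a ridge fit on predictor x."""
    beta = ridge_fit_1d(x, y, alpha)
    return [beta * xi for xi in x]


# Only centered, not rescaled: the unit of measurement changes the fit.
raw_m = fitted([h - mean(height_m) for h in height_m])
raw_cm = fitted([h - mean(height_cm) for h in height_cm])

# Standardized: both units lead to the same fit.
std_m = fitted(standardize(height_m))
std_cm = fitted(standardize(height_cm))

print("centered only:", [round(v, 2) for v in raw_m], [round(v, 2) for v in raw_cm])
print("standardized: ", [round(v, 2) for v in std_m], [round(v, 2) for v in std_cm])
```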

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/

Ridge Regression:
    Performs L2 regularization, i.e. adds penalty equivalent to square of the magnitude of coefficients
    Minimization objective = LS Obj + α * (sum of square of coefficients)
Lasso Regression:
    Performs L1 regularization, i.e. adds penalty equivalent to absolute value of the magnitude of coefficients
    Minimization objective = LS Obj + α * (sum of absolute value of coefficients)
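
Written out, the two objectives above look roughly like this. The notation is an assumption, not from the linked tutorial: n observations, p predictors, x_ij for the value of predictor j in observation i, and an intercept β_0 that is left out of the penalty (a common convention).

```latex
% Ridge: least-squares objective plus alpha times the sum of squared coefficients
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
  + \alpha \sum_{j=1}^{p} \beta_j^2

% Lasso: least-squares objective plus alpha times the sum of absolute coefficients
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
  + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```

With α = 0 both reduce to ordinary least squares.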

Normalizing the inputs is generally a good idea.

Key Difference

Ridge: It includes all (or none) of the features in the model. Thus, the major advantage of ridge regression is coefficient shrinkage and reducing model complexity.
Lasso: Along with shrinking coefficients, lasso performs feature selection as well (the "selection" in its full name, Least Absolute Shrinkage and Selection Operator). Some of the coefficients become exactly zero, which is equivalent to the corresponding features being excluded from the model.
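
The difference can be sketched in a special case where each coefficient has its own closed form. The assumptions here: the predictors are orthogonal with Σx² = 1 for each, so every coefficient can be solved separately, and the numbers for Σxy are made up. Under those assumptions ridge divides by (Σx² + α), while lasso applies soft-thresholding with threshold α/2 (for the objective SSE + α·Σ|β|).

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, and return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_coef(sxy, sxx, alpha):
    """Minimizer of sum((y - b*x)^2) + alpha * b^2 for one predictor."""
    return sxy / (sxx + alpha)


def lasso_coef(sxy, sxx, alpha):
    """Minimizer of sum((y - b*x)^2) + alpha * |b| for one predictor."""
    return soft_threshold(sxy, alpha / 2) / sxx


# Hypothetical values of sum(x*y) for a strong and a weak feature.
# Orthogonal predictors are assumed, so each coefficient is solved on its own.
features = {"strong": 8.0, "weak": 0.3}
sxx = 1.0
alpha = 1.0

ridge = {name: ridge_coef(sxy, sxx, alpha) for name, sxy in features.items()}
lasso = {name: lasso_coef(sxy, sxx, alpha) for name, sxy in features.items()}

print("ridge:", ridge)  # both coefficients shrunk, neither exactly zero
print("lasso:", lasso)  # the weak feature is dropped (coefficient exactly zero)
```

In the general case with correlated predictors there is no such closed form for lasso, and an iterative solver is used instead.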

Traditionally, techniques like stepwise regression were used to perform feature selection and build parsimonious models. Ridge and lasso regression are now good alternatives: they often give better results, require fewer tuning parameters, and can be automated to a large extent.

Generally, when you have many small/medium sized effects you should go with ridge. If you have only a few variables with a medium/large effect, go with lasso.
