Prerequisites

  1. Familiarity with the Conda package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on “Towards Data Science”.
  2. Familiarity with JupyterLab. See here for my post on JupyterLab.
  3. These projects will also run as Python notebooks in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, I assume you know how to run notebooks in your own setup (or can adapt the steps to whatever works best for you).

Getting started

Let’s first clone the code from part three into the regression-with-scikit-learn-part-four directory.

What is Regularized Regression?

“Regularization” is a method that adds a penalty to the model’s loss function in order to prevent overfitting. The penalty is a function of the model’s complexity: the more complex the model, the higher the penalty. This post covers two forms of regularized regression:

  1. Ridge Regression
  2. Lasso Regression
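
To make the idea of a complexity penalty concrete, here is a minimal sketch using NumPy. The function name `penalized_loss`, the toy data and the exact scaling of the penalty are my own assumptions for illustration; libraries such as scikit-learn scale these terms slightly differently.

```python
import numpy as np


def penalized_loss(X, y, coef, alpha, kind="l2"):
    """Mean squared error plus a penalty on the size of the coefficients."""
    residuals = y - X @ coef
    mse = np.mean(residuals ** 2)
    if kind == "l2":
        # Ridge-style penalty: sum of squared coefficients
        penalty = np.sum(coef ** 2)
    elif kind == "l1":
        # Lasso-style penalty: sum of absolute coefficients
        penalty = np.sum(np.abs(coef))
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return mse + alpha * penalty


# Toy data (hypothetical) that the coefficients [1, 2] fit perfectly
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
coef = np.array([1.0, 2.0])

print(penalized_loss(X, y, coef, alpha=0.0))             # fit error only
print(penalized_loss(X, y, coef, alpha=1.0, kind="l2"))  # fit error + L2 penalty
print(penalized_loss(X, y, coef, alpha=1.0, kind="l1"))  # fit error + L1 penalty
```

The `alpha` parameter controls how strongly large coefficients are punished: at `alpha=0` the loss is plain mean squared error, and larger values push the model towards smaller coefficients.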

Ridge Regression

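Ridge regression is a regularized form of linear regression that is well suited to data with multicollinearity, that is, predictors that are highly correlated with one another. It adds a penalty proportional to the sum of the squared coefficients, which shrinks the coefficients and makes the estimates more stable.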
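
As a rough sketch of how this looks with scikit-learn, the example below fits `LinearRegression` and `Ridge` on two nearly identical predictors. The synthetic data, the variable names and `alpha=1.0` are assumptions chosen for illustration, not the dataset used in this series.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for this sketch): x2 is almost a copy of x1,
# so the two predictors are highly collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

With collinear predictors, ordinary least squares can split the effect between the two columns in an unstable way. The Ridge penalty keeps the overall size of the coefficients in check, so the estimates tend to vary less from sample to sample.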

Lasso Regression

Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where estimates are shrunk towards a central point; in Lasso's case, the model coefficients are shrunk towards zero, and some can become exactly zero. This makes it very useful when you have high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination.
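
The shrinkage effect can be sketched by fitting scikit-learn's `Lasso` with a few different penalty strengths and counting how many coefficients survive. The synthetic dataset from `make_regression` and the particular `alpha` values are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): 10 features, only 3 of which drive the target
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)

nonzero_counts = {}
for alpha in (0.1, 5.0, 1000.0):
    model = Lasso(alpha=alpha).fit(X, y)
    # Count the coefficients that the penalty has not driven to exactly zero
    nonzero_counts[alpha] = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha}: {nonzero_counts[alpha]} non-zero coefficients")
```

A small `alpha` behaves much like ordinary linear regression, while a very large `alpha` shrinks every coefficient to zero. Useful values sit in between and are normally chosen by cross-validation.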

Lasso for feature selection

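One of the important aspects of Lasso regression is using it to select the important features of a dataset: features whose coefficients are shrunk to zero can be dropped, and those with non-zero coefficients are kept.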
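
Here is a minimal sketch of that selection step, assuming synthetic data and hypothetical feature names (`feature_0`, `feature_1`, ...) rather than the dataset used in this series. The features are standardized first so the penalty treats them on a comparable scale.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): 10 features, only 3 of which are informative
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Standardize so every feature is penalized on the same scale
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=5.0).fit(X_scaled, y)

# Pair each feature with its coefficient and keep the non-zero ones
coefficients = dict(zip(feature_names, lasso.coef_))
selected = [name for name, coef in coefficients.items() if coef != 0]

for name, coef in coefficients.items():
    print(f"{name}: {coef:.3f}")
print("Selected features:", selected)
```

Plotting `coefficients` as a bar chart is a common way to see at a glance which features the model kept.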

Lasso coefficients assigned to features

Summary

Today’s post looked into both Ridge and Lasso regression, as well as how to apply those methods using scikit-learn.

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. okeeffed/regression-with-scikit-learn-part-four
  6. Multicollinearity — Wikipedia
  7. Regularized Regression — statisticshowto.com

Dennis O'Keeffe

Senior Engineer @ UsabilityHub. Formerly Culture Amp.
