This is Day 29 of the #100DaysOfPython challenge.

This post will look into how we can create a linear regressor to make predictions about continuous target variables.

Source code can be found on my GitHub repo `okeeffed/regression-with-scikit-learn`.

## Prerequisites

1. Familiarity with the Conda package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on “Towards Data Science”.

## Getting started

Let’s create the `regression-with-scikit-learn` directory and install the required packages.
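Once the packages are installed, a quick sanity check can confirm they are importable. The sketch below is only an illustration: the package list is an assumption inferred from the libraries used later in this post, not a list taken from the repo, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed package list, inferred from the libraries used later in this post.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

If anything is reported as missing, install it into the active Conda environment before continuing.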

At this stage, we are ready to add in a linear regressor.

## Exploring the Boston dataset

In this example, we will use the Boston housing dataset to predict the price of a house.

In our file `docs/regression.ipynb`, we can add the following:
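A minimal sketch of that first cell, assuming scikit-learn 1.1 or earlier. `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the import is guarded here and `boston` falls back to `None` on newer versions.

```python
try:
    # Only available in scikit-learn < 1.2 (1.0 and 1.1 emit a warning on use).
    from sklearn.datasets import load_boston

    boston = load_boston()
except ImportError:
    # Newer scikit-learn releases no longer ship this dataset.
    boston = None

if boston is not None:
    print(boston.DESCR)
else:
    print("load_boston is not available in this scikit-learn version")
```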

The above will print out the dataset description.

Under `:Number of Attributes:`, we see the description "13 numeric/categorical predictive. Median Value (attribute 14) is usually the target."

It is `MEDV` that we will be trying to predict.

In our second cell, add the following:
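A sketch of what this cell might look like. `build_frame` is a hypothetical helper, and the arrays are random stand-in values (not Boston data) so the cell runs on its own; with the real dataset you would pass `boston.data`, `boston.target` and `boston.feature_names` instead.

```python
import numpy as np
import pandas as pd


def build_frame(X, y, feature_names, target_name="MEDV"):
    """Combine a feature matrix and a target vector into one DataFrame."""
    df = pd.DataFrame(X, columns=feature_names)
    df[target_name] = y
    return df


# With the real dataset this would be:
#   X, y = boston.data, boston.target
#   df = build_frame(X, y, boston.feature_names)
# Random stand-in values (NOT Boston data) keep this cell self-contained.
rng = np.random.default_rng(0)
X = rng.random((6, 3))
y = rng.random(6)

df = build_frame(X, y, ["CRIM", "ZN", "RM"])
print(df.head())
```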

Here we are assigning the `X` and `y` variables to the `boston` dataset based on features and target respectively.

From there, we create a data frame containing the features and target with pandas.

Printing the head shows us the first five rows:

This information gives us some insight into what the data looks like.

We want to create a linear regressor to predict the `MEDV` variable based on the number of rooms, so we will need to adjust our X to only pass the one dimension.
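One way to do that, sketched under assumptions: in the Boston feature matrix the rooms feature is `RM`, the sixth column (index 5). `select_column` is a hypothetical helper, and the matrix below is a stand-in with the same 13-column layout rather than real data.

```python
import numpy as np

RM_COLUMN = 5  # position of RM (average rooms per dwelling) in boston.feature_names


def select_column(X, column):
    """Return one feature as an (n_samples, 1) array, the shape scikit-learn expects."""
    return X[:, column].reshape(-1, 1)


# Stand-in matrix with 13 feature columns (values are NOT real data).
X = np.arange(4 * 13, dtype=float).reshape(4, 13)
y = np.arange(4, dtype=float)

X_rooms = select_column(X, RM_COLUMN)
y = y.reshape(-1, 1)
print(X_rooms.shape, y.shape)
```

The `reshape(-1, 1)` call turns a flat array of length `n` into an `n`-by-1 column, which is the two-dimensional shape that scikit-learn estimators expect for `X`.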

The above code takes only the data for the rooms feature (`RM`) and reshapes it into a 2D array.

The resulting data frame is the following:

## Visualizing the data

We can take the variables we have assigned, `X_rooms` and `y`, to visualize the data.

In a new cell, add the following:
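A possible version of that cell using Matplotlib. The helper name, axis labels and stand-in points are assumptions for illustration; in the notebook you would pass the real `X_rooms` and `y`.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_rooms_vs_value(X_rooms, y):
    """Scatter the number of rooms against the median home value."""
    fig, ax = plt.subplots()
    ax.scatter(X_rooms, y)
    ax.set_xlabel("Number of rooms")
    ax.set_ylabel("Value of house / 1000 ($)")
    return ax


# Stand-in points (NOT Boston data) so the cell runs on its own.
X_rooms = np.array([[4.0], [5.0], [6.0], [7.0]])
y = np.array([[12.0], [18.0], [22.0], [31.0]])

# In a notebook the figure renders inline; in a script, call plt.show() afterwards.
ax = plot_rooms_vs_value(X_rooms, y)
```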

Executing that code gives us the following:

As you might expect intuitively, the price of a house rises as the number of rooms increases.

## Creating a regressor to predict a continuous target variable

Finally, we can build a linear regressor to predict the `MEDV` variable.
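A minimal sketch of this step with scikit-learn's `LinearRegression`. `fit_and_plot` is a hypothetical helper, and the stand-in points are constructed to lie exactly on a line (they are not Boston data), which makes the fitted coefficients easy to check.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression


def fit_and_plot(X_rooms, y):
    """Fit a one-feature linear regressor and draw its predictions over the data."""
    reg = LinearRegression()
    reg.fit(X_rooms, y)

    # Evenly spaced room counts spanning the observed range, as a column vector.
    prediction_space = np.linspace(X_rooms.min(), X_rooms.max()).reshape(-1, 1)

    fig, ax = plt.subplots()
    ax.scatter(X_rooms, y, color="blue")
    ax.plot(prediction_space, reg.predict(prediction_space), color="black", linewidth=3)
    return reg, ax


# Stand-in points lying exactly on y = 5x - 8 (NOT Boston data).
X_rooms = np.array([[4.0], [5.0], [6.0], [7.0]])
y = 5.0 * X_rooms - 8.0

reg, ax = fit_and_plot(X_rooms, y)
print(reg.coef_, reg.intercept_)
```

Because the stand-in points sit on a straight line, the fitted slope and intercept recover that line; with the real data the fit is the least-squares line through a noisy cloud of points.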

This draws the regressor's predicted values as a line over the scatter plot.

## Summary

Today’s post was an introduction to regression with scikit-learn. We used the Boston dataset to predict the `MEDV` variable.

Moving forward, we will dive deeper into linear regression theory and apply it to a train/test split. Then we will look into cross-validation, as well as regularization.

Photo credit: `pawel_czerwinski`

Originally posted on my blog. To see new posts without delay, read the posts there and subscribe to my newsletter.

Senior Engineer @ UsabilityHub. Formerly Culture Amp.
