

  1. Familiarity with Conda, the package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on “Towards Data Science”.
  2. Familiarity with JupyterLab. See my post on JupyterLab.
  3. These projects run Python notebooks in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, you should already know how to run notebooks (or adapt the method to whatever works best for you).

Getting started

Let’s create the supervised-learning-with-scikit-learn-template directory and install the required packages.
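Once the packages are installed in the environment, you can confirm they are visible to Python by printing their versions. The following is a minimal sketch that uses the standard library's importlib.metadata. The package list mirrors the four packages used in the notebook, and the helper name installed_versions is hypothetical. The versions you see will depend on your environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as installed by conda/pip (scikit-learn is imported as sklearn).
REQUIRED = ["scikit-learn", "pandas", "numpy", "matplotlib"]


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in the active environment
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'MISSING'}")
```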

Writing our first notebook

We will write seven cells in the notebook:

  1. Importing our required packages and setting a graph style.
  2. Exploring the iris dataset.
  3. Assigning the iris dataset to X and y variables.
  4. Creating and exploring the data frame.
  5. Visualizing the output and making sense of the data.
  6. Creating a k-nearest neighbors classifier.
  7. Applying the classifier to some unlabelled data and assigning predicted classes to that data.

Importing required packages

In our file docs/supervised-learning-with-scikit-learn-template.ipynb, we can add the following:

  1. sklearn, which includes simple and efficient tools for predictive data analysis.
  2. pandas as our data analysis and manipulation tool.
  3. numpy to help with scientific computing.
  4. matplotlib as our data visualization library.
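A first cell along these lines would import the four packages and set a graph style with matplotlib.pyplot.style.use. This is a sketch: the "ggplot" style is an assumed choice, and any name listed in plt.style.available will work.

```python
# Cell 1: import the packages used throughout the notebook
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set a graph style for all plots; "ggplot" is an assumed choice,
# any entry in plt.style.available works.
plt.style.use("ggplot")
```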

Exploring the dataset

As a first look, we will explore the dataset with some helpful functions to get a better idea of what is happening.

  1. iris.data holds the features of the data (also known as independent or predictor variables). There are 4 features (4 columns) in the data.
  2. The feature names can be explored with the feature_names property. In this data, the features are sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm).
  3. iris.target is a vector of integers. Our three possible classes, setosa, versicolor and virginica, are encoded as 0, 1 and 2 (the class names themselves are in iris.target_names).
  4. iris.data.shape tells us that there are 150 rows of data. These serve as historical, labelled data that help us find features which might be useful for identifying future entries.
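The exploration steps above map onto a few attribute lookups on the object returned by scikit-learn's datasets.load_iris(). The following is a minimal sketch of that cell.

```python
from sklearn import datasets

# Load the bundled iris dataset (returns a Bunch object with data, target, etc.)
iris = datasets.load_iris()

print(iris.data[:5])        # first five rows of the 4 feature columns
print(iris.feature_names)   # names of the 4 features
print(iris.target)          # integer class labels 0, 1, 2
print(iris.target_names)    # class names matching those integers
print(iris.data.shape)      # (rows, columns) of the feature matrix
```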

Assigning the iris dataset to a variable

The next step assigns the data to more descriptive variables: X for the features and y for the target.
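A sketch of this cell, following the common scikit-learn convention of an uppercase X for the feature matrix and a lowercase y for the target vector:

```python
from sklearn import datasets

iris = datasets.load_iris()

# X: feature matrix (150 samples x 4 features); y: target vector of class labels
X = iris.data
y = iris.target

print(X.shape, y.shape)
```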

Creating and exploring the data frame

We use the X feature matrix to create a data frame.
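One way to do this is with pandas.DataFrame, passing the feature names as column labels so that later plots are readable. Labelling the columns this way is an assumption of this sketch. head() and describe() are standard pandas methods for a first look at the data.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Build a data frame from the feature matrix, using the feature names as column headers
df = pd.DataFrame(X, columns=iris.feature_names)

print(df.head())      # first five rows
print(df.describe())  # count, mean, std, min, quartiles and max per column
```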

Visualizing the output

Next, we can visualize the data using a scatter matrix, which plots every feature against every other feature.
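pandas provides this plot as pandas.plotting.scatter_matrix. The following sketch colours each point by its class label (c=y) so the three species can be told apart. The figsize, marker, s and alpha values are assumed cosmetic choices, and the extra keyword arguments are passed through to matplotlib's scatter.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)

# Off-diagonal panels: pairwise scatter plots; diagonal panels: per-feature histograms.
# c=y colours each point by its integer class label.
axes = pd.plotting.scatter_matrix(
    df, c=y, figsize=(8, 8), marker="o", s=40, alpha=0.8
)
plt.show()
```

In the resulting plot, the petal length and petal width panels typically separate setosa from the other two classes most cleanly.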

Scatter matrix in VSCode

Constructing a classifier

There are different algorithms for classifying data. In our example, we will use k-nearest neighbors, an algorithm that labels a data point according to the classes of its k closest labelled data points, which effectively creates prediction boundaries between the classes.
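In scikit-learn this is KNeighborsClassifier. The following is a minimal sketch. The choice of n_neighbors=6 is an assumption, and by default the classifier uses Euclidean distance with an unweighted majority vote among the neighbors.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# n_neighbors=6 is an assumed value: each prediction is a majority vote
# among the 6 closest labelled points (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=6)

# Fitting a k-NN model essentially stores the labelled training data for later lookups.
knn.fit(X, y)
```

A larger k smooths the prediction boundaries, while a smaller k follows individual training points more closely.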

Predicting unlabeled data

To make predictions, we call predict on the classifier and pass it some unlabelled data.
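The following sketch refits the classifier so that it runs on its own, then predicts classes for three hypothetical, made-up flower measurements. iris.target_names maps the integer predictions back to species names. The third row matches a setosa sample from the dataset, so it is expected to be labelled setosa.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target
knn = KNeighborsClassifier(n_neighbors=6).fit(X, y)  # fit() returns the estimator

# Hypothetical unlabelled measurements, one row per flower, in the same column
# order as iris.feature_names (sepal length, sepal width, petal length, petal width).
X_new = np.array([
    [5.6, 2.8, 3.9, 1.1],
    [5.7, 2.6, 3.8, 1.3],
    [4.7, 3.2, 1.3, 0.2],
])

prediction = knn.predict(X_new)        # integer class labels, one per row
print(prediction)
print(iris.target_names[prediction])   # assign species names to the new data
```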


Today’s post set up a starting repository for all future posts on Machine Learning.

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. matplotlib.pyplot.style.use
  6. sklearn
  7. pandas
  8. numpy
  9. matplotlib
  10. data frame
  11. scatter matrix
  12. What is a scatter plot?
  13. okeeffed/supervised-learning-with-scikit-learn-template




Senior Engineer @ UsabilityHub. Formerly Culture Amp.

Dennis O'Keeffe