Prerequisites

  1. Familiarity with the Conda package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on “Towards Data Science”.
  2. Familiarity with JupyterLab. See here for my post on JupyterLab.
  3. These projects will also run as Python notebooks in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, you are expected to know how to run notebooks yourself (or to adapt the steps to whatever works best for you).

Getting started

Let’s create the supervised-learning-with-scikit-learn-template directory and install the required packages.

Writing our first notebook

We will write seven cells in the notebook:

  1. Importing our required packages and setting a graph style.
  2. Exploring the iris dataset.
  3. Assigning the iris features and targets to X and y variables.
  4. Creating and exploring the data frame.
  5. Visualizing the output and making sense of the data.
  6. Creating a k-nearest neighbors classifier.
  7. Applying the classifier to some unlabelled data and assigning predicted classes to that data.

Importing required packages

In our file docs/supervised-learning-with-scikit-learn-template.ipynb, we can import the following packages:

  1. sklearn, which includes simple and efficient tools for predictive data analysis.
  2. pandas, a data analysis and manipulation tool.
  3. numpy, to help with scientific computing.
  4. matplotlib, as our data visualization library.
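
A first cell along these lines would cover the imports and the graph style. This is a sketch rather than the exact cell from the repository: the "ggplot" style name is an assumption, and any style accepted by plt.style.use would work just as well.

```python
# Cell 1: import the four packages listed above and set a graph style.
from sklearn import datasets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# "ggplot" is an assumed choice; plt.style.available lists the alternatives.
plt.style.use("ggplot")
```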

Exploring the dataset

As a first look, we will explore the dataset with some helpful functions to get a better idea of what is happening.

  1. iris.data holds the features of the data (also known as independent or predictor variables). There are 4 features (4 columns) in the data.
  2. The features themselves can be explored with the feature_names property. In this data, the features are sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm).
  3. We notice that iris.target is a vector of integers. Our three possible classes of setosa, versicolor and virginica are encoded as 0, 1 and 2.
  4. iris.data.shape tells us that there are 150 rows of data. We use these labelled rows as historical data to help us identify the class of future entries.
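
The points above can be checked with a cell like the following. It is a minimal sketch that assumes the dataset is loaded with load_iris from sklearn.datasets; the choice of which properties to print is mine.

```python
from sklearn import datasets

# Load the bundled iris dataset (assumed loader: sklearn.datasets.load_iris).
iris = datasets.load_iris()

print(iris.keys())         # the fields available on the returned object
print(iris.feature_names)  # names of the four feature columns
print(iris.target_names)   # class names: setosa, versicolor, virginica
print(iris.target[:5])     # targets are integers that encode those classes
print(iris.data.shape)     # (150, 4): 150 rows, 4 features
```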

Assigning the iris dataset to a variable

The next step is to assign the features and the target to the more conventional variable names X and y.
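
As a sketch, assuming the dataset was loaded with load_iris as in the previous step, the assignment could look like this:

```python
from sklearn import datasets

iris = datasets.load_iris()

# X holds the feature matrix, y holds the integer-encoded target classes.
X = iris.data
y = iris.target

print(X.shape, y.shape)
```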

Creating and exploring the data frame

We use the X data to create a data frame.
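
One way to do this is sketched below. Using the feature names as column headers is an assumption on my part; it makes the later plots easier to read.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
X = iris.data

# Build the data frame from the features; the column names are an assumed convenience.
df = pd.DataFrame(X, columns=iris.feature_names)

print(df.head())      # first five rows
print(df.describe())  # summary statistics per feature
```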

Visualizing the output

Next, we can visualize the data by using a scatter matrix, which plots every pair of features against each other.
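
A sketch of such a cell, using pandas.plotting.scatter_matrix, follows. The figure size, marker and marker size are illustrative assumptions; c=y colors each point by its class so the three species can be told apart.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)

# One panel per pair of features; the diagonal shows a histogram of each feature.
# figsize, s and marker are assumed values chosen for readability.
axes = pd.plotting.scatter_matrix(df, c=y, figsize=(8, 8), s=150, marker="D")
```

In a notebook the figure renders below the cell. Classes that form tight, separate clusters in a panel are the ones those two features distinguish well.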

Figure: Scatter matrix in VSCode

Constructing a classifier

There are different algorithms for classifying data. In our example, we will be going with k-nearest neighbors, an algorithm that labels a data point by a majority vote among its k closest labelled data points. In effect, this creates prediction boundaries between the classes.
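
A minimal sketch of this step with scikit-learn's KNeighborsClassifier is shown below. The value n_neighbors=6 is an assumption for illustration, not a tuned choice.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# n_neighbors is the k in k-nearest neighbors; 6 is an assumed value.
knn = KNeighborsClassifier(n_neighbors=6)

# Fitting stores the labelled rows so they can be searched at prediction time.
knn.fit(X, y)
```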

Predicting unlabeled data

To make predictions, we need to call predict on the classifier and pass some unlabelled data.
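
The sketch below shows the idea end to end. The three rows in X_new are made-up measurements for illustration, and n_neighbors=6 is again an assumption.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

iris = datasets.load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X, y)

# Hypothetical unlabelled samples: one row per flower, one column per feature,
# in the same order as iris.feature_names.
X_new = np.array([
    [5.6, 2.8, 3.9, 1.1],
    [5.7, 2.6, 3.8, 1.3],
    [4.7, 3.2, 1.3, 0.2],
])

prediction = knn.predict(X_new)
print(prediction)                     # integer class per row
print(iris.target_names[prediction])  # the same predictions as class names
```

The unlabelled data must have the same number of columns as the data the classifier was fitted on, here 4.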

Summary

Today’s post set up a starting repository for all future posts on Machine Learning.

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. matplotlib.pyplot.style.use
  6. sklearn
  7. pandas
  8. numpy
  9. matplotlib
  10. data frame
  11. scatter matrix
  12. What is a scatter plot?
  13. okeeffed/supervised-learning-with-scikit-learn-template

Dennis O'Keeffe

Senior Engineer @ UsabilityHub. Formerly Culture Amp.
