
Prerequisites

  1. Familiarity with Conda, the package, dependency, and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on Towards Data Science.
  2. Familiarity with JupyterLab. See here for my post on JupyterLab.
  3. These projects also run Python notebooks in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, it is expected that you know how to run notebooks (or can adapt the steps to whatever works best for you).
  4. Read “First Look At Supervised Learning With Classification”.

Getting started

Let’s create the measuring-classifier-model-performance directory and install the required packages.
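A minimal setup might look like the following. The environment name, Python version, and package list here are only suggestions rather than a prescribed setup; adjust them to suit your own workflow.

```bash
# Create the project directory
mkdir measuring-classifier-model-performance
cd measuring-classifier-model-performance

# Create and activate a Conda environment, then install the packages used below
conda create -n measuring-classifier-model-performance python=3.9 -y
conda activate measuring-classifier-model-performance
conda install -y numpy scikit-learn matplotlib jupyterlab
```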

Bringing the code up to par

In our file docs/measuring-classifier-model-performance, we can add the following:
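Something along these lines works as a starting cell. The use of the iris dataset below is an assumption carried over from the earlier classification post, so swap in whichever dataset you are working with.

```python
# Imports and data loading for the notebook.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the dataset (iris is used here as an illustrative assumption)
iris = datasets.load_iris()
X = iris.data    # feature matrix
y = iris.target  # class labels
```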

Creating a training and test set

The training set is the data we will use to train our classifier, and the test set is held back so we can measure accuracy on data the model has not seen. Using scikit-learn’s train_test_split (see the sketch after this list), we will be:

  1. Splitting our data into a test size of 30% and a training size of 70% (as denoted in the kwarg test_size).
  2. Setting the random_state keyword arg to 21. This will ensure that the split is reproducible.
  3. Setting the stratify keyword arg to the y variable. This will ensure that the split is stratified; that is, each class appears in the same proportion in the training set and the test set.
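A sketch of that split, assuming X and y hold the features and labels loaded in the previous cell:

```python
# Split the data: 70% training, 30% test, reproducible and stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=21,
    stratify=y,
)
```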

Checking a classifier for fit

For our k-Nearest Neighbors classifier, we need to check how good the fit is for our model. We can do that by training the classifier for a range of candidate n_neighbors values and recording the accuracy on both the training and test sets; a sketch of that loop follows.
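This is a minimal sketch assuming the split from the previous section; the range of 1 to 8 neighbors is arbitrary and can be widened.

```python
# Record training and test accuracy for a range of n_neighbors values.
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
```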

Comparing testing vs training accuracy
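Plotting the two accuracy arrays against the candidate n_neighbors values gives a quick visual check: where training accuracy is high but test accuracy drops away, the model is overfitting, and the value where test accuracy peaks is the parameter we carry forward. A plotting sketch, assuming matplotlib and the arrays from the loop above:

```python
# Plot training vs test accuracy to look for over- and underfitting.
plt.title("k-NN: Varying Number of Neighbors")
plt.plot(neighbors, test_accuracy, label="Testing Accuracy")
plt.plot(neighbors, train_accuracy, label="Training Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
plt.show()
```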

Using our classifier with the determined parameter

The final step is to use our classifier with the determined parameter. In a new cell, we can add some unlabelled data and use our classifier to label it.
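A sketch of that final cell. The n_neighbors value of 6 and the sample measurements below are illustrative assumptions rather than values from the graph; use whichever parameter your own plot suggests.

```python
# Fit the classifier with the chosen parameter and label new, unseen samples.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)

# Unlabelled samples (illustrative values with the same number of features as X)
X_new = np.array([
    [5.4, 3.4, 1.7, 0.2],
    [6.9, 3.1, 5.1, 2.3],
])
print(knn.predict(X_new))
```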

Summary

Today’s post demonstrated how to produce a graph to help us search for parameters that produce a good fit for our k-Nearest Neighbors classifier.

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. okeeffed/measuring-classifier-model-performance
