
Prerequisites

  1. Familiarity with Conda, the package, dependency, and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on Towards Data Science.
  2. Familiarity with JupyterLab. See here for my post on JupyterLab.
  3. These projects will also run Python notebooks in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, it is expected that you know how to run notebooks (or can adapt the method to whatever works best for you).
  4. Read “First Look At Supervised Learning With Classification”.

Getting started

Let’s create the measuring-classifier-model-performance directory and install the required packages.
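The original post's commands are not reproduced here; a minimal sketch of the setup, assuming a Conda environment and the usual scikit-learn stack (the environment name and package versions are illustrative):

```shell
# Create the project directory.
mkdir measuring-classifier-model-performance
cd measuring-classifier-model-performance

# Create and activate a Conda environment, then install the
# packages the notebook needs (names/versions illustrative).
conda create -n measuring-classifier-model-performance python=3.9 -y
conda activate measuring-classifier-model-performance
conda install -y scikit-learn matplotlib jupyterlab
```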

Bringing the code up to par

In our file docs/measuring-classifier-model-performance, we can add the following:
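The original code block did not survive here; a minimal sketch of the imports and data loading, assuming scikit-learn's built-in iris dataset stands in for the labelled dataset used in the prerequisite post:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a labelled dataset; iris is a stand-in here.
iris = datasets.load_iris()
X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # class labels, shape (150,)
```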

Creating a training and test set

The “training” set is the data that we will use to train our classifier, while the held-out “test” set is used to evaluate the accuracy of the trained classifier.

  1. Splitting our data into a test set of 30% and a training set of 70% (as set by the test_size keyword argument).
  2. Setting the random_state keyword argument to 21. This ensures that the split is reproducible.
  3. Setting the stratify keyword argument to the y variable. This ensures that the split is stratified. That is to say, each class appears in the training set and the test set in the same proportion as in the full dataset.
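The three steps above map directly onto scikit-learn's train_test_split (the iris dataset here is a stand-in; variable names are illustrative):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# 30% test / 70% train, reproducible via random_state,
# class proportions preserved in both splits via stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)
```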

Checking a classifier for fit

For our k-Nearest Neighbors classifier, we need to check how well the model fits the data.
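A sketch of that check, assuming the stratified split from earlier (the value k=6 is illustrative): fit the classifier on the training set, then score it against both splits.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

# Fit a k-NN classifier and score it on both splits.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
train_accuracy = knn.score(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)
```

A large gap between the two scores is a hint of overfitting; two low scores hint at underfitting.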

Comparing Testing vs Training accuracy
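The original code block did not survive here; a sketch of the comparison, assuming we sweep the n_neighbors parameter and plot training and testing accuracy against each other (the range of k values is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

neighbors = np.arange(1, 13)
train_accuracies = np.empty(len(neighbors))
test_accuracies = np.empty(len(neighbors))

# Score the classifier on both splits for each value of k.
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracies[i] = knn.score(X_train, y_train)
    test_accuracies[i] = knn.score(X_test, y_test)

# Graph the two curves to find a k where testing accuracy peaks.
plt.plot(neighbors, train_accuracies, label="Training accuracy")
plt.plot(neighbors, test_accuracies, label="Testing accuracy")
plt.xlabel("Number of neighbors (k)")
plt.ylabel("Accuracy")
plt.legend()
plt.savefig("knn-accuracy.png")
```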

Using our classifier with the determined parameter

The final step is to use our classifier with the determined parameter. In a new cell, we can add some unlabelled data and use our classifier to label it.
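A sketch of that final cell, assuming a k chosen from the graph (k=6 here is illustrative) and some made-up "unlabelled" measurements shaped like the training features:

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)

# Train on all labelled data using the parameter chosen
# from the accuracy graph (k=6 is illustrative).
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X, y)

# "Unlabelled" data points with the same four features.
X_new = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],
])
predictions = knn.predict(X_new)  # one predicted class per row
```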

Summary

Today’s post demonstrated how to produce a graph to help us search for parameters that produce a good fit for our k-Nearest Neighbors classifier.

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. okeeffed/measuring-classifier-model-performance

Dennis O'Keeffe

Senior Engineer @ UsabilityHub. Formerly Culture Amp.
