
Prerequisites

  1. Familiarity with the Conda package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post “The Definitive Guide to Conda Environments” on “Towards Data Science”.
  2. Familiarity with JupyterLab. See my post on JupyterLab.
  3. The notebooks in this project also run in VSCode with the Jupyter Notebooks extension. If you do not use VSCode, you are expected to know how to run notebooks another way (or to adapt the steps to whatever works best for you).

Getting started

Writing our first notebook

  1. Importing our required packages and setting a graph style.
  2. Exploring the iris dataset.
  3. Assigning the iris features and target to the X and y variables.
  4. Creating and exploring the data frame.
  5. Visualizing the output and making sense of the data.
  6. Creating a k-nearest neighbors classifier.
  7. Applying the classifier to some unlabelled data and assigning predicted classes to that data.

Importing required packages

  1. sklearn, which includes simple and efficient tools for predictive data analysis.
  2. pandas, a data analysis and manipulation tool.
  3. numpy, to help with scientific computing.
  4. matplotlib, our data visualization library.
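
As a minimal sketch, the first notebook cell could look like the following. The aliases are the conventional ones, and the "ggplot" style is an assumption for illustration; any name listed in plt.style.available can be used instead.

```python
from sklearn import datasets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# "ggplot" is an assumed style choice; see plt.style.available for the alternatives.
plt.style.use("ggplot")
```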

Exploring the dataset

  1. iris.data holds the features (also known as independent or predictor variables). There are 4 features (4 columns) in the data.
  2. The features themselves can be explored with the feature_names attribute. In this data, the features are sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm).
  3. We notice that the target is a vector of integers. Our three possible classes, setosa, versicolor and virginica, are encoded as 0, 1 and 2.
  4. iris.data.shape tells us that there are 150 rows of data. We use these rows as historical data to learn which feature values are useful for classifying future entries.
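
The points above can be checked with a short exploratory cell. This is a sketch that assumes the dataset is loaded with sklearn's load_iris; the variable name iris matches the attribute names used in the list.

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris.keys())           # the fields available on the dataset object
print(iris.feature_names)    # the four feature (column) names
print(iris.target_names)     # the class names behind the integer targets
print(iris.target[:5])       # targets are integers (0, 1 or 2)
print(iris.data.shape)       # (150, 4): 150 rows, 4 features
```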

Assigning the iris dataset to a variable
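
A minimal sketch of this step, assuming the usual scikit-learn convention of an uppercase X for the feature matrix and a lowercase y for the target vector:

```python
from sklearn import datasets

iris = datasets.load_iris()

X = iris.data     # feature matrix, one row per flower
y = iris.target   # integer class label for each row

print(X.shape, y.shape)
```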

Creating and exploring the data frame
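
One way to build the data frame is to wrap the feature matrix and label its columns with the feature names. The variable name df is an assumption; head() and describe() are two common ways to take a first look at the data.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data

# Label each column with its feature name so the frame is easier to read.
df = pd.DataFrame(X, columns=iris.feature_names)

print(df.head())       # first five rows
print(df.describe())   # summary statistics per feature
```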

Visualizing the output
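
A scatter matrix plots every feature against every other feature, with a histogram of each feature on the diagonal. The sketch below uses pandas' scatter_matrix and colors each point by its class; the figsize, s (marker size) and marker values are assumptions chosen for readability.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)

# c=y colors each point by its class; the other keyword values are assumed.
axes = pd.plotting.scatter_matrix(df, c=y, figsize=(8, 8), s=150, marker="D")
```

In a notebook the figure renders below the cell. Features whose colors form clearly separated clusters, such as the petal measurements, are good candidates for telling the classes apart.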

Scatter matrix in VSCode

Constructing a classifier
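
A minimal sketch of a k-nearest neighbors classifier with scikit-learn follows. The value n_neighbors=6 is an assumption for illustration; it sets how many of the closest labelled points vote on the class of a new point.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# n_neighbors=6 is an assumed value, not a tuned one.
knn = KNeighborsClassifier(n_neighbors=6)

# Fitting stores the labelled data so later predictions can look up neighbors.
knn.fit(X, y)
```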

Predicting unlabeled data
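
Once fitted, the classifier can assign classes to new measurements. In this sketch, X_new is hypothetical data made up for illustration, and each row must have the same four features, in the same order, as the training data.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=6)  # assumed value of k
knn.fit(X, y)

# Hypothetical unlabelled measurements: one row per flower, four features each.
X_new = np.array([
    [5.6, 2.8, 3.9, 1.1],
    [5.7, 2.6, 3.8, 1.3],
    [4.7, 3.2, 1.3, 0.2],
])

prediction = knn.predict(X_new)

# Map the integer predictions back to readable class names.
print(iris.target_names[prediction])
```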

Summary

Resources and further reading

  1. Conda
  2. JupyterLab
  3. Jupyter Notebooks
  4. “The Definitive Guide to Conda Environments”
  5. matplotlib.pyplot.style.use
  6. sklearn
  7. pandas
  8. numpy
  9. matplotlib
  10. data frame
  11. scatter matrix
  12. What is a scatter plot?
  13. okeeffed/supervised-learning-with-scikit-learn-template
