Machine learning, penguins and explanatory data visualisations

dataviz
annotations
machine learning
ggplot2
tidyverse
tidyTuesday

An annotated data visualisation illustrating how a decision-tree model classifies penguins into species based on their flipper and bill lengths

Author
Affiliation

Building Stories with Data

Published

July 29, 2020

For this week’s 🐧 #TidyTuesday, I chanced upon clusters in the data and decided to apply some machine learning skills to build a decision tree and illustrate the outcome:

A scatterplot showing penguin flipper lengths (x axis) and bill lengths (y axis), coloured by species; the clusters neatly identify the three different species in the dataset. The visualisation is clearly annotated with dashed lines showing the decision nodes, and any misclassified penguins are drawn as triangles rather than dots.
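The post doesn't walk through the model itself (the author's R code is in the tidytuesdays repository linked in the post). As a rough illustration of what a decision tree does with two measurements, here is a minimal Python sketch of a CART-style tree: at each node it tries every axis-aligned threshold on each feature, keeps the one with the lowest weighted Gini impurity, and stops at a depth of two. This is not the author's method. The data points and class names (`A`, `B`, `C`) are made up for illustration rather than taken from the penguins dataset, and `gini`, `best_split`, `build_tree` and `predict` are hypothetical helper names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the axis-aligned split 'row[f] <= t' with the lowest weighted Gini."""
    best_f, best_t, best_score = None, None, float("inf")
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t


def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a small CART-style tree; each leaf predicts its majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    f, t = best_split(rows, labels)
    if f is None:  # all rows identical: nothing left to split on
        return {"leaf": majority}
    go_left = [r[f] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left_rows, left_labels, depth + 1, max_depth),
        "right": build_tree(right_rows, right_labels, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the decision nodes down to a leaf and return its class."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up (flipper_mm, bill_mm) points for three hypothetical classes
rows = [(180, 38), (185, 40), (190, 39),   # class A
        (192, 48), (195, 50), (188, 49),   # class B
        (215, 47), (220, 48), (225, 50)]   # class C
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"])  # prints: 0 205.0
print(predict(tree, (183, 39)), predict(tree, (190, 47)), predict(tree, (218, 45)))  # prints: A B C
```

Because every split compares a single feature with a threshold, each decision node corresponds to a straight vertical or horizontal line on a flipper-versus-bill scatterplot. In practice you would reach for a library implementation such as R's rpart or scikit-learn's `DecisionTreeClassifier` rather than hand-rolling one.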

Code: https://github.com/cararthompson/tidytuesdays
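The dashed decision-node lines and triangle markers described in the image caption can both be derived from a fitted tree. The Python sketch below assumes the tree is stored as nested dicts (keys `feature`, `threshold`, `left`, `right`, or `leaf`), with feature 0 on the x axis (flipper length) and feature 1 on the y axis (bill length). It computes one line segment per decision node, clipped to the region of the plot that reaches that node, and marks each penguin as a triangle if its predicted species differs from its actual one, else a dot. The helper names, the tree format and the thresholds in the example are hypothetical, not those in the post's figure.

```python
def split_segments(tree, x_range, y_range):
    """Return one (x_start, y_start, x_end, y_end) segment per decision node.

    Each segment is clipped to the region of the plane that reaches that
    node, so nested splits stop at their parent's boundary.
    """
    if "leaf" in tree:
        return []
    (x0, x1), (y0, y1) = x_range, y_range
    t = tree["threshold"]
    if tree["feature"] == 0:
        # Vertical line x = t spanning this node's region
        segment = (t, y0, t, y1)
        left = split_segments(tree["left"], (x0, t), y_range)
        right = split_segments(tree["right"], (t, x1), y_range)
    else:
        # Horizontal line y = t spanning this node's region
        segment = (x0, t, x1, t)
        left = split_segments(tree["left"], x_range, (y0, t))
        right = split_segments(tree["right"], x_range, (t, y1))
    return [segment] + left + right


def point_shapes(predicted, actual):
    """Map each point to a marker: 'triangle' if misclassified, else 'dot'."""
    return ["dot" if p == a else "triangle" for p, a in zip(predicted, actual)]


# Hypothetical tree with illustrative thresholds only
tree = {
    "feature": 0, "threshold": 205.0,
    "left": {"feature": 1, "threshold": 44.0,
             "left": {"leaf": "A"}, "right": {"leaf": "B"}},
    "right": {"leaf": "C"},
}

segments = split_segments(tree, x_range=(170, 235), y_range=(30, 60))
# segments == [(205.0, 30, 205.0, 60), (170, 44.0, 205.0, 44.0)]
shapes = point_shapes(["A", "B", "C"], ["A", "C", "C"])
# shapes == ["dot", "triangle", "dot"]
```

In ggplot2, segments like these could be drawn with `geom_segment()` (aesthetics `x`, `y`, `xend`, `yend`) and `linetype = "dashed"`, with the shape column mapped to the `shape` aesthetic.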

Here’s the making-of!

Video

Citation

For attribution, please cite this work as:
Thompson, Cara. 2020. “Machine Learning, Penguins and Explanatory Data Visualisations.” July 29, 2020. https://www.cararthompson.com/posts/2020-07-29-tidy-tuesday.