Plot Upload Test

How to Make a Scatter Plot with Python

Preparation

The first step is to import the tools and the dataset we need.

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins.head()
studyName Sample Number Species Region Island Stage Individual ID Clutch Completion Date Egg Culmen Length (mm) Culmen Depth (mm) Flipper Length (mm) Body Mass (g) Sex Delta 15 N (o/oo) Delta 13 C (o/oo) Comments
0 PAL0708 1 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A1 Yes 11/11/07 39.1 18.7 181.0 3750.0 MALE NaN NaN Not enough blood for isotopes.
1 PAL0708 2 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N1A2 Yes 11/11/07 39.5 17.4 186.0 3800.0 FEMALE 8.94956 -24.69454 NaN
2 PAL0708 3 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A1 Yes 11/16/07 40.3 18.0 195.0 3250.0 FEMALE 8.36821 -25.33302 NaN
3 PAL0708 4 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N2A2 Yes 11/16/07 NaN NaN NaN NaN NaN NaN NaN Adult not sampled.
4 PAL0708 5 Adelie Penguin (Pygoscelis adeliae) Anvers Torgersen Adult, 1 Egg Stage N3A1 Yes 11/16/07 36.7 19.3 193.0 3450.0 FEMALE 8.76651 -25.32426 NaN
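
Before cleaning, it helps to see where the gaps and odd entries are. The sketch below is one way to check. It uses a small made-up frame (`toy`) in place of `penguins` so that it runs without a download; the values in it are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw data; these values are made up.
toy = pd.DataFrame({
    "Species": ["Adelie Penguin (Pygoscelis adeliae)"] * 4,
    "Body Mass (g)": [3750.0, np.nan, 3250.0, 3450.0],
    "Sex": ["MALE", np.nan, ".", "FEMALE"],
})

# Count the missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# List every distinct entry of a categorical column, NaN included,
# to spot placeholder values such as ".".
sex_counts = toy["Sex"].value_counts(dropna=False)
print(sex_counts)
```

Running the same two calls on `penguins` shows which columns contain NaN and whether `Sex` holds any unexpected entries.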

Data cleaning

Raw data is usually hard to read and analyze, so we keep only the columns we need and remove rows that are incomplete or badly recorded.

# Shorten each species name to its first word (e.g. "Adelie")
penguins["Species"] = penguins["Species"].str.split().str.get(0)
# Remove rows where sex was recorded as "." instead of MALE or FEMALE
penguins = penguins[penguins["Sex"] != "."]
# Keep only the columns we need, then drop rows with missing values
data = ["Species", "Island", "Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)", "Body Mass (g)", "Sex"]
penguins = penguins[data]
penguins = penguins.dropna()

Let’s check the cleaned data.

penguins.head()
   Species     Island  Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  Body Mass (g)     Sex
0   Adelie  Torgersen                39.1               18.7                181.0         3750.0    MALE
1   Adelie  Torgersen                39.5               17.4                186.0         3800.0  FEMALE
2   Adelie  Torgersen                40.3               18.0                195.0         3250.0  FEMALE
4   Adelie  Torgersen                36.7               19.3                193.0         3450.0  FEMALE
5   Adelie  Torgersen                39.3               20.6                190.0         3650.0    MALE
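
To see what each cleaning step does on its own, here is a minimal sketch on a small made-up frame (`raw`, with hypothetical values and an extra `Comments` column), so it runs without downloading anything. The intermediate names `step1` to `step3` exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; the values are made up for illustration.
raw = pd.DataFrame({
    "Species": [
        "Adelie Penguin (Pygoscelis adeliae)",
        "Gentoo penguin (Pygoscelis papua)",
        "Chinstrap penguin (Pygoscelis antarctica)",
        "Adelie Penguin (Pygoscelis adeliae)",
    ],
    "Body Mass (g)": [3750.0, 5000.0, 3700.0, np.nan],
    "Sex": ["MALE", ".", "FEMALE", "FEMALE"],
    "Comments": [np.nan, np.nan, np.nan, "Adult not sampled."],
})

# Step 1: keep only the first word of each species name.
step1 = raw.assign(Species=raw["Species"].str.split().str.get(0))

# Step 2: drop rows whose sex was recorded as ".".
step2 = step1[step1["Sex"] != "."]

# Step 3: select the columns of interest before dropping NaN, so that
# gaps in unused columns (here "Comments") do not discard good rows.
step3 = step2[["Species", "Body Mass (g)", "Sex"]].dropna()

print(len(raw), len(step2), len(step3))
print(step3)
```

The order in step 3 matters: calling `dropna()` before selecting columns would also throw away every row whose `Comments` entry is empty. The surviving rows keep their original index labels, which is why the cleaned output above skips from 2 to 4; `reset_index(drop=True)` would renumber them if that is preferred.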

Visualization

Let’s start with a simple scatter plot, examining the culmen length and culmen depth of the penguins.

sns.relplot(data=penguins,
            x="Culmen Length (mm)",
            y="Culmen Depth (mm)")
<seaborn.axisgrid.FacetGrid at 0x1a694178370>

[Figure: scatter plot of culmen depth against culmen length]

This graph doesn’t tell us much, so let’s distinguish the points by species.

sns.relplot(data=penguins,
            x="Culmen Length (mm)",
            y="Culmen Depth (mm)",
            hue="Species")
<seaborn.axisgrid.FacetGrid at 0x1a6993e7160>

[Figure: the same scatter plot, with points colored by species]
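
To make the idea behind `hue` concrete, the sketch below draws one group of points per species with plain matplotlib. It illustrates the concept and is not how seaborn implements it internally. The frame `toy` and its values are made up, and the `Agg` backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for headless runs)
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical measurements, made up for illustration.
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "Culmen Length (mm)": [39.1, 40.3, 46.1, 50.0, 49.5],
    "Culmen Depth (mm)": [18.7, 18.0, 13.2, 15.2, 19.0],
})

fig, ax = plt.subplots()

# One scatter call per species: each call takes the next color in
# matplotlib's default cycle, and the label feeds the legend.
for species, group in toy.groupby("Species"):
    ax.scatter(group["Culmen Length (mm)"], group["Culmen Depth (mm)"], label=species)

ax.set_xlabel("Culmen Length (mm)")
ax.set_ylabel("Culmen Depth (mm)")
ax.legend(title="Species")
```

Passing `hue="Species"` to `sns.relplot` replaces this loop, the axis labels, and the legend call with a single argument.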

Now we have a decent graph to analyze. To investigate further, let’s take sex and island into account by splitting the plot into one panel for each combination of the two.

sns.relplot(data=penguins,
            x="Culmen Length (mm)",
            y="Culmen Depth (mm)",
            hue="Species",
            col="Sex",
            row="Island")
<seaborn.axisgrid.FacetGrid at 0x1a699571ee0>

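[Figure: grid of scatter plots, one row per island and one column per sex, with points colored by species]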
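
`sns.relplot` returns the `FacetGrid` object whose repr appears in the output above, and keeping a reference to it makes it possible to inspect the panels and save the figure to a file for upload. A minimal sketch, assuming a made-up frame `toy`, a hypothetical file name `penguins_facets.png`, and arbitrary `height`, `aspect`, and `dpi` values:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for headless runs)
import pandas as pd
import seaborn as sns

# Hypothetical rows, made up for illustration: 2 islands x 2 sexes.
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "Island": ["Torgersen", "Torgersen", "Biscoe", "Biscoe"],
    "Sex": ["MALE", "FEMALE", "MALE", "FEMALE"],
    "Culmen Length (mm)": [39.1, 39.5, 50.0, 46.1],
    "Culmen Depth (mm)": [18.7, 17.4, 16.3, 13.2],
})

# height is the size of each panel in inches; aspect is its width/height ratio.
g = sns.relplot(data=toy,
                x="Culmen Length (mm)",
                y="Culmen Depth (mm)",
                hue="Species",
                col="Sex",
                row="Island",
                height=3,
                aspect=1.2)

# g.axes is a 2-D array of matplotlib axes: one row per island, one column per sex.
print(g.axes.shape)

# Save the whole grid as a PNG in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "penguins_facets.png")
g.savefig(out_path, dpi=150)
```

The same pattern should work with `penguins` in place of `toy`; only the number of rows and columns in the grid changes with the number of distinct islands and sexes.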

Now we have an informative scatter plot.

Written on April 14, 2022