Introduction to Seaborn#

The definition on seaborn’s website is so concise that we reproduce it here:

“Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.”

That’s it! The main benefit of seaborn is that it is a higher-level library, which means we can produce sophisticated plots with far fewer lines of code. Most axes styling is handled automatically, and many plot types can compute and display summary statistics for us.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

We will apply the seaborn default theme, but you can choose others (see the seaborn documentation on themes).

sns.set_theme()
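As a minimal sketch of choosing a different theme, the snippet below passes a built-in style and palette name to set_theme. The particular choices ("whitegrid", "colorblind") are only illustrative and are not used elsewhere on this page.

```python
import seaborn as sns

# Pick a built-in style and palette; these two names are illustrative choices
sns.set_theme(style="whitegrid", palette="colorblind")

# axes_style() with no arguments returns the currently active style settings
current_style = sns.axes_style()
print(current_style["axes.grid"])  # whether grid lines are switched on

# Switch back to the default theme used in the rest of this page
sns.set_theme()
```

Calling set_theme again with no arguments restores the defaults, so experimenting with styles does not affect the plots that follow.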

Scatter plots with seaborn#

Let’s load the same dataframe.

df = pd.read_csv("../../data/BBBC007_analysis.csv")
df.head()
   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio           file_name
0   139       96.546763          17.504104          10.292770      1.700621  20P1_POS0010_D_1UL
1   360       86.613889          35.746808          14.983124      2.385805  20P1_POS0010_D_1UL
2    43       91.488372          12.967884           4.351573      2.980045  20P1_POS0010_D_1UL
3   140       73.742857          18.940508          10.314404      1.836316  20P1_POS0010_D_1UL
4   144       89.375000          13.639308          13.458532      1.013432  20P1_POS0010_D_1UL

Now let’s make a scatter plot of intensity_mean versus aspect_ratio.

sns.scatterplot(data=df, x="aspect_ratio", y="intensity_mean")
<AxesSubplot: xlabel='aspect_ratio', ylabel='intensity_mean'>
[Figure: scatter plot of intensity_mean versus aspect_ratio]

We can encode additional features in the same plot by passing a few extra arguments. Here, marker size reflects area and marker color reflects major_axis_length.

sns.scatterplot(data=df,
                x = "aspect_ratio",
                y = "intensity_mean",
                size = "area",
                hue = "major_axis_length",
                palette = 'magma')
<AxesSubplot: xlabel='aspect_ratio', ylabel='intensity_mean'>
[Figure: scatter plot of intensity_mean versus aspect_ratio, with marker size showing area and color showing major_axis_length]

Scatter plots with subplots#

The scatterplot function is an axes-level function. This means that if we want subplots, we first need to create the figure and axes with matplotlib and then pass the axes handles to seaborn. That’s when knowing some matplotlib syntax comes in handy!
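The axes-level workflow can be sketched as follows. To keep the snippet self-contained, it uses a small synthetic table (demo_df, with made-up values) in place of the real measurements; only the column names mirror the dataframe above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the measurements table (values are made up)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "aspect_ratio": rng.uniform(1, 3, 60),
    "intensity_mean": rng.normal(90, 10, 60),
    "area": rng.integers(40, 400, 60),
})

# Create the figure and the axes with matplotlib first ...
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# ... then tell each axes-level seaborn call where to draw via `ax`
sns.scatterplot(data=demo_df, x="aspect_ratio", y="intensity_mean", ax=axes[0])
sns.scatterplot(data=demo_df, x="area", y="intensity_mean", ax=axes[1])

axes[0].set_title("Intensity vs. aspect ratio")
axes[1].set_title("Intensity vs. area")
fig.tight_layout()
```

Each call draws into the axes it is given, so the layout stays entirely under matplotlib's control.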

However, seaborn also has figure-level functions, where the subplot layout is specified by just another argument.

In the example below, we use the relplot function (short for relational plot) and give each file its own subplot by passing ‘file_name’ to the col argument:

sns.relplot(data=df,
            x = "aspect_ratio",
            y = "intensity_mean",
            size = "area",
            hue = "major_axis_length",
            col = "file_name",
            palette = 'magma')
<seaborn.axisgrid.FacetGrid at 0x24aa593d820>
[Figure: scatter plots of intensity_mean versus aspect_ratio, one panel per file_name]
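The FacetGrid object returned by a figure-level function can be used to adjust all panels at once. The sketch below again uses a synthetic table (demo_df) with hypothetical file names; the col_wrap and height values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data with three hypothetical file names
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "aspect_ratio": rng.uniform(1, 3, 90),
    "intensity_mean": rng.normal(90, 10, 90),
    "file_name": np.repeat(["file_a", "file_b", "file_c"], 30),
})

# col splits the data into panels; col_wrap starts a new row after two panels
g = sns.relplot(data=demo_df,
                x="aspect_ratio",
                y="intensity_mean",
                col="file_name",
                col_wrap=2,
                height=3)

# The returned FacetGrid offers methods that act on every panel
g.set_axis_labels("Aspect ratio", "Mean intensity")
g.set_titles(col_template="{col_name}")

n_panels = len(g.axes.ravel())
print(n_panels)
```

Individual matplotlib axes remain reachable through g.axes when a single panel needs special treatment.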

Adding a linear regression model#

There are two functions for making a scatter plot with a fitted linear regression model: regplot and lmplot. As before, regplot is an axes-level function, while lmplot is a figure-level function.

Let’s plot an example of each, starting with regplot:

sns.regplot(data = df, x = "aspect_ratio", y = "intensity_mean")
<AxesSubplot: xlabel='aspect_ratio', ylabel='intensity_mean'>
[Figure: scatter plot of intensity_mean versus aspect_ratio with a fitted regression line]
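regplot draws the fitted line but, to our knowledge, does not return the fitted coefficients. If the numbers themselves are needed, one option is to compute an ordinary least-squares fit separately, for example with np.polyfit. The sketch below uses synthetic x and y values (made up for illustration) rather than the real measurements.

```python
import numpy as np

# Synthetic data: a linear trend plus noise (values are made up)
rng = np.random.default_rng(0)
x = rng.uniform(1, 3, 100)
y = 100 - 5 * x + rng.normal(0, 2, 100)

# A degree-1 polynomial fit is an ordinary least-squares line;
# polyfit returns the coefficients from highest degree to lowest
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With the real dataframe, the same call would take df["aspect_ratio"] and df["intensity_mean"] as x and y.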

Linear regression with subplots#

sns.lmplot(data = df,
           x = "aspect_ratio",
           y = "intensity_mean",
           col = "file_name")
<seaborn.axisgrid.FacetGrid at 0x24aa5a4aa00>
[Figure: scatter plots with fitted regression lines, one panel per file_name]

Exercise#

Plot linear regression models on a single plot, with the points and lines colored according to ‘file_name’.

Hint: use a function that accepts a hue argument.