Introduction to Seaborn#

Sources and inspiration:

If running this from Google Colab, uncomment the cell below and run it. Otherwise, just skip it.

# !pip install watermark

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.

Here’s an example of what seaborn can do:

import seaborn as sns

import pandas as pd
import matplotlib.pyplot as plt
# Apply the default theme
sns.set_theme(style="ticks")

# Load an example dataset
tips = sns.load_dataset("tips")

# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="time",
    hue="smoker",
    style="smoker",
    size="size",
)
plt.show()
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
../_images/6c362f6133d7a2cc463409024c72fbb9c46c25d8704ad538c54379fb14e34375.png

This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function relplot. Notice how we provided only the names of the variables and their roles in the plot. Unlike when using matplotlib directly, it wasn’t necessary to specify attributes of the plot elements in terms of the color values or marker codes. Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.

Here is a similar example from a different dataset: intensity and shape measurements from objects in 2 fluorescence images.

nuclei_features_df = pd.read_csv("../../data/BBBC007_analysis.csv")
nuclei_features_df['round'] = nuclei_features_df['aspect_ratio'] < 1.2
nuclei_features_df.head()
area intensity_mean major_axis_length minor_axis_length aspect_ratio file_name round
0 139 96.546763 17.504104 10.292770 1.700621 20P1_POS0010_D_1UL False
1 360 86.613889 35.746808 14.983124 2.385805 20P1_POS0010_D_1UL False
2 43 91.488372 12.967884 4.351573 2.980045 20P1_POS0010_D_1UL False
3 140 73.742857 18.940508 10.314404 1.836316 20P1_POS0010_D_1UL False
4 144 89.375000 13.639308 13.458532 1.013432 20P1_POS0010_D_1UL True
sns.relplot(
    data=nuclei_features_df,
    x="aspect_ratio",
    y="intensity_mean",
    size="area",
    hue="round",
    col="file_name"
    )
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
<seaborn.axisgrid.FacetGrid at 0x17bf5856b50>
../_images/75d18fc36d78dd0bc2799ff856c7a40d3ebd6bfb0b05a4164ad0febe989d1cdf.png

A high-level API for statistical graphics#

There is no universally best way to visualize data. Different questions are best answered by different plots. Seaborn makes it easy to switch between different visual representations by using a consistent dataset-oriented API.

Other relplot examples#

The function relplot is named that way because it is designed to visualize many different statistical relationships. While scatter plots are often effective, relationships where one variable represents a measure of time are better represented by a line. The relplot function has a convenient kind parameter that lets you easily switch to this alternate representation:

dots = sns.load_dataset("dots")
sns.relplot(
    data=dots,
    kind="line",
    x="time",
    y="firing_rate",
    col="align",
    hue="choice",
    size="coherence",
    style="choice",
    facet_kws=dict(sharex=False),
)
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
../_images/7c1a660eb18ef4172ce04919e9eed2f5691a8534f252a06f8fd1c534731d070e.png

Notice how the size and style parameters are used in both the scatter and line plots, but they affect the two visualizations differently: changing the marker area and symbol in the scatter plot vs the line width and dashing in the line plot. We did not need to keep those details in mind, letting us focus on the overall structure of the plot and the information we want it to convey.

Often, we are interested in the average value of one variable as a function of other variables. Many seaborn functions will automatically perform the statistical estimation that is necessary to answer these questions:

fmri = sns.load_dataset("fmri")
sns.relplot(
    data=fmri,
    kind="line",
    x="timepoint",
    y="signal",
    col="region",
    hue="event",
    style="event",
)
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
../_images/8594323407b24a0efc11c52cb5a818392b1bc7e8922e3516350a92cbefc6636f.png
sns.lmplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="time",
    hue="smoker")
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
c:\Users\mazo260d\mambaforge\envs\devbio-napari-clone\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
../_images/545df54cf9f022a59021182942ec324a4e58d9ba0a7159f5daf6b81ae1e49a69.png

Exercise#

Take the nuclei_features_df again and plot a line regression model on a single plot, with points and lines having different colors according to ‘file_name’.

Hint: use a lmplot and the hue argument

from watermark import watermark
watermark(iversions=True, globals_=globals())
print(watermark())
print(watermark(packages="watermark,numpy,pandas,seaborn,matplotlib,scipy,statannotations"))
Last updated: 2023-08-25T14:46:57.523022+02:00

Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.14.0

Compiler    : MSC v.1929 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
CPU cores   : 16
Architecture: 64bit

watermark      : 2.4.3
numpy          : 1.23.5
pandas         : 2.0.3
seaborn        : 0.12.2
matplotlib     : 3.7.2
scipy          : 1.11.2
statannotations: 0.4.4