Plotting Distributions with Seaborn

Seaborn also makes it convenient to plot data distributions. We start with simple boxplots and bar graphs, then show how to plot histograms and kernel density estimates (KDE).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

Let’s load the same dataframe.

df = pd.read_csv("../../data/BBBC007_analysis.csv")
df.head()
   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio           file_name
0   139       96.546763          17.504104          10.292770      1.700621  20P1_POS0010_D_1UL
1   360       86.613889          35.746808          14.983124      2.385805  20P1_POS0010_D_1UL
2    43       91.488372          12.967884           4.351573      2.980045  20P1_POS0010_D_1UL
3   140       73.742857          18.940508          10.314404      1.836316  20P1_POS0010_D_1UL
4   144       89.375000          13.639308          13.458532      1.013432  20P1_POS0010_D_1UL

Boxplots

The axes-level function for plotting boxplots is boxplot.

Seaborn identifies file_name as a categorical variable and intensity_mean as a numerical variable, so it draws one boxplot of the intensity per file. If we swap x and y, we get the same graph, but with vertical boxplots.

sns.boxplot(data=df, x="intensity_mean", y="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='file_name'>
../_images/03_Plotting_distributions_7_1.png
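To illustrate the remark about swapping x and y, here is a minimal sketch. It does not use the real table: it assumes a small synthetic dataframe named toy, with made-up file names file_A and file_B and randomly generated intensities, so that it runs on its own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real table (assumption: made-up files and values).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 50), rng.normal(110, 12, 50)]
    ),
})

# Categorical variable on x and numerical variable on y: the boxes become vertical.
ax = sns.boxplot(data=toy, x="file_name", y="intensity_mean")
```

The data are the same in both orientations; only the axis that carries the categories changes.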

The figure-level, and more general, version of this kind of plot is catplot. We just have to provide kind as box.

sns.catplot(data=df, x="intensity_mean", y="file_name", kind="box")
<seaborn.axisgrid.FacetGrid at 0x2bcf85fa070>
../_images/03_Plotting_distributions_9_1.png
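One practical difference with the figure-level version is that catplot creates its own figure and returns a FacetGrid, so the figure size is set through the height and aspect parameters and the axes are reached through the returned object. The sketch below assumes a synthetic dataframe named toy with made-up file names, not the real data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real table (assumption: made-up files and values).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 50), rng.normal(110, 12, 50)]
    ),
})

# height (in inches) and aspect (width / height) control the size of each facet.
g = sns.catplot(data=toy, x="intensity_mean", y="file_name",
                kind="box", height=3, aspect=2)

# The returned FacetGrid offers helpers that act on all of its axes.
g.set_axis_labels("mean intensity", "file")
```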

Other kinds are available, such as a bar graph, which by default shows the mean of each group with an error bar for its 95% confidence interval.

sns.catplot(data=df, x="file_name", y="intensity_mean", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x2bcf82144c0>
../_images/03_Plotting_distributions_11_1.png
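To make explicit what the bars represent, the following sketch compares the bar heights with group means computed by pandas. It relies on seaborn's default estimator (the mean) and uses a synthetic dataframe named toy with made-up file names instead of the real data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real table (assumption: made-up files and values).
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 50), rng.normal(110, 12, 50)]
    ),
})

g = sns.catplot(data=toy, x="file_name", y="intensity_mean", kind="bar")

# Each bar is a rectangle whose height is the estimate for one category.
heights = [patch.get_height() for patch in g.ax.patches]

# The same numbers computed directly with pandas.
means = toy.groupby("file_name")["intensity_mean"].mean()
print(means)
print(heights)
```

Because a bar only shows one summary value per group, it hides the shape of the distribution that a boxplot or histogram would reveal.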

Histograms and Distribution Plots

The axes-level function for plotting histograms is histplot.

sns.histplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Count'>
../_images/03_Plotting_distributions_14_1.png

We can instead plot a kernel density estimate (KDE) with the kdeplot function. Be careful when interpreting these plots: the curve is a smoothed estimate, so it can suggest density at values that never occur in the data (check some pitfalls here).

sns.kdeplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Density'>
../_images/03_Plotting_distributions_16_1.png

The figure-level function for distributions is displot. With it, you can show histograms and KDEs in the same plot, or draw other kinds of plots, such as the empirical cumulative distribution function (ECDF).

sns.displot(data=df, x="intensity_mean", hue="file_name", kde=True)
<seaborn.axisgrid.FacetGrid at 0x2bcf852f610>
../_images/03_Plotting_distributions_18_1.png

Exercise

Plot two empirical cumulative distribution functions for ‘area’ from different files on the same graph with different colors.

Repeat this for the property ‘intensity_mean’ on a second figure. Judging from the plots, would you expect these properties to differ between the files or not?

Hint: look for the kind parameter of displot.