Plotting Distributions with Seaborn
Seaborn also makes it easy to plot data distributions. We start with simple boxplots and bar graphs. Then, we show how to plot histograms and kernel density estimates (KDE).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
Let’s load the same dataframe.
df = pd.read_csv("../../data/BBBC007_analysis.csv")
df.head()
| | area | intensity_mean | major_axis_length | minor_axis_length | aspect_ratio | file_name |
|---|---|---|---|---|---|---|
| 0 | 139 | 96.546763 | 17.504104 | 10.292770 | 1.700621 | 20P1_POS0010_D_1UL |
| 1 | 360 | 86.613889 | 35.746808 | 14.983124 | 2.385805 | 20P1_POS0010_D_1UL |
| 2 | 43 | 91.488372 | 12.967884 | 4.351573 | 2.980045 | 20P1_POS0010_D_1UL |
| 3 | 140 | 73.742857 | 18.940508 | 10.314404 | 1.836316 | 20P1_POS0010_D_1UL |
| 4 | 144 | 89.375000 | 13.639308 | 13.458532 | 1.013432 | 20P1_POS0010_D_1UL |
Boxplots
The axes-level function for plotting boxplots is boxplot.
Seaborn identifies file_name as a categorical variable and intensity_mean as a numerical variable, so it draws one boxplot of intensity_mean per file. If we swap x and y, we get the same graph, but with vertical boxplots.
sns.boxplot(data=df, x="intensity_mean", y="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='file_name'>
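As a minimal sketch of the swap described above, the snippet below draws vertical boxplots by passing the categorical column as x. It uses a small synthetic table (df_demo, with hypothetical file names file_A and file_B and made-up values) in place of the BBBC007 measurements, so that it runs on its own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the measurements table (assumption: values are made up)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate([
        rng.normal(90, 8, 50),
        rng.normal(100, 8, 50),
    ]),
})

# Categorical column on x, numerical column on y: vertical boxplots
ax = sns.boxplot(data=df_demo, x="file_name", y="intensity_mean")
```

With the real dataframe, the same call with data=df should give the vertical version of the plot above.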
The figure-level, and more general, version of this kind of plot is catplot. We just have to set kind to "box".
sns.catplot(data=df, x="intensity_mean", y="file_name", kind="box")
<seaborn.axisgrid.FacetGrid at 0x2bcf85fa070>
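The two outputs above hint at the practical difference: boxplot returns a matplotlib Axes, while catplot returns a seaborn FacetGrid that manages its own figure. The sketch below illustrates this on a synthetic table (df_demo and its file names are assumptions, not the BBBC007 data) and shows one way to relabel each kind of object.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the measurements table (assumption: values are made up)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate([
        rng.normal(90, 8, 50),
        rng.normal(100, 8, 50),
    ]),
})

# Axes-level: draws into an Axes that we create and pass in
fig, ax = plt.subplots()
ax = sns.boxplot(data=df_demo, x="intensity_mean", y="file_name", ax=ax)
ax.set_xlabel("Mean intensity")  # customized with matplotlib methods

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.catplot(data=df_demo, x="intensity_mean", y="file_name", kind="box")
g.set_axis_labels("Mean intensity", "File")  # customized with FacetGrid methods

print(type(ax).__name__, type(g).__name__)
```

Axes-level functions are convenient when the plot has to fit into an existing matplotlib figure; figure-level functions are convenient when seaborn should handle the layout.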
Other kinds are available, such as bar for a bar graph.
sns.catplot(data=df, x="file_name", y="intensity_mean", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x2bcf82144c0>
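By default, seaborn's bar kind draws the mean of each group, with an error bar showing a 95% confidence interval. The sketch below computes the group means with pandas next to the plot so the bar heights can be checked against them. The table df_demo is synthetic (an assumption made so the example runs without the CSV file).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the measurements table (assumption: values are made up)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "file_name": np.repeat(["file_A", "file_B"], 50),
    "intensity_mean": np.concatenate([
        rng.normal(90, 8, 50),
        rng.normal(100, 8, 50),
    ]),
})

# Group means computed directly, for comparison with the bar heights
group_means = df_demo.groupby("file_name")["intensity_mean"].mean()
print(group_means)

# Bar graph of the same quantity, one bar per file
g = sns.catplot(data=df_demo, x="file_name", y="intensity_mean", kind="bar")
```

A bar graph therefore shows much less of the distribution than a boxplot: only a central value and its uncertainty.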
Histograms and Distribution Plots
The axes-level function for plotting histograms is histplot.
sns.histplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Count'>
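We can instead plot a kernel density estimate (KDE) with the kdeplot function. Be careful when interpreting these plots: the smoothing can hide features of the data or show density where there are no observations (the seaborn documentation describes some of these pitfalls).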
sns.kdeplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Density'>
The figure-level function for distributions is displot. With it, you can have histograms and KDE in the same plot, or other kinds of plots, like the empirical cumulative distribution function (ECDF).
sns.displot(data=df, x="intensity_mean", hue="file_name", kde=True)
<seaborn.axisgrid.FacetGrid at 0x2bcf852f610>
Exercise
Plot the empirical cumulative distribution functions of ‘area’ for two different files on the same graph, using different colors.
Repeat this for the property ‘intensity_mean’ in a second figure. From the plots, infer whether you would expect these properties to differ between the files.
*Hint: look at the kind parameter of displot.*