Plotting Data with Matplotlib#

If running this from Google Colab, uncomment the cell below and run it. Otherwise, just skip it.

#!pip install watermark

Data, whether images or object features, should be plotted to better understand their properties and relationships. We already saw that napari can visualize images interactively. Sometimes, though, we want a static view inside a notebook that we can share consistently with collaborators or use as material in a publication.

Python has many libraries for plotting data, like matplotlib, seaborn, plotly and bokeh, to name a few. Some libraries ship their own plotting functions as a convenience. For example, the pandas method .plot can plot graphs directly from dataframes.
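
As a minimal sketch of the pandas convenience mentioned above, the cell below builds a small stand-in table (the name `toy` and its values are hypothetical, not the tutorial's data) and plots it with `DataFrame.plot.scatter`, which draws with Matplotlib and returns the axes it drew on.

```python
import pandas as pd

# Hypothetical stand-in table; the values are made up for illustration
toy = pd.DataFrame({
    "aspect_ratio": [1.7, 2.4, 3.0, 1.8, 1.0],
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 89.4],
})

# pandas draws with Matplotlib under the hood and returns the Axes object
ax = toy.plot.scatter(x="aspect_ratio", y="intensity_mean")

# The axis labels are taken from the column names
print(ax.get_xlabel(), ax.get_ylabel())
```

Because the returned object is a regular Matplotlib axes, everything shown in the rest of this notebook can be applied to it as well.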

In this notebook, we will explain the basics of Matplotlib, probably the most flexible and traditional library to display images and data in Python.

Knowing a bit of its syntax helps in understanding other, higher-level libraries.

import pandas as pd
import numpy as np
from skimage.io import imread
import matplotlib.pyplot as plt

Reading data#

In this notebook, we will use an image and a table to plot. Let’s read them.

The table contains continuous measurements from two images, which are identified by the last, categorical column ‘file_name’.

image1 = imread("../../data/BBBC007_batch/20P1_POS0010_D_1UL.tif")

df = pd.read_csv("../../data/BBBC007_analysis.csv")
df.head(5)
   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio           file_name
0   139       96.546763          17.504104          10.292770      1.700621  20P1_POS0010_D_1UL
1   360       86.613889          35.746808          14.983124      2.385805  20P1_POS0010_D_1UL
2    43       91.488372          12.967884           4.351573      2.980045  20P1_POS0010_D_1UL
3   140       73.742857          18.940508          10.314404      1.836316  20P1_POS0010_D_1UL
4   144       89.375000          13.639308          13.458532      1.013432  20P1_POS0010_D_1UL

Plotting an image with matplotlib#

To start, we briefly recap how we display images. You just need a single line:

plt.imshow(image1)
<matplotlib.image.AxesImage at 0x12e62818ac0>
../_images/d16227e2116bdf8aae1d00294e5f3eb802cb067a05b61dc487a97d8ecd862fee.png
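
`plt.imshow` also accepts display options. The sketch below assumes a single-channel image and uses a synthetic array (`toy_image`, a hypothetical stand-in for `image1`) to show a gray colormap, a colorbar, and hidden axis ticks.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a single-channel microscopy image (assumption)
rng = np.random.default_rng(seed=0)
toy_image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# 2D arrays are mapped through a colormap; here we pick 'gray' explicitly
shown = plt.imshow(toy_image, cmap="gray")

# The colorbar relates displayed gray values to pixel intensities
plt.colorbar(shown, label="intensity")

# Hide ticks and frame, which rarely carry meaning for images
plt.axis("off")
```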

Plotting a graph with matplotlib#

To plot a graph with matplotlib, like a scatter plot, we need to get the data from the table and feed it to plt.scatter.

Let’s plot aspect_ratio vs. intensity_mean.

x = df['aspect_ratio']
y = df['intensity_mean']

plt.scatter(x, y)
<matplotlib.collections.PathCollection at 0x12e628ee280>
../_images/1e8a389f5e04397de73c3d8f3fcb4a71adc97ad1fe6a30c3f54ebc501f77c848.png

We can also pass extra arguments to customize the plot. Below, we change the marker symbol (marker), the marker size (s) and the color, and make the markers half transparent (alpha).

plt.scatter(x, y, color = 'magenta', marker = '*', s = 200, alpha = 0.5)
../_images/6179bf8496ffa3f15653654a0c09bc0fddd2fb3af5be67005773ac5b27759af0.png
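
To get a feeling for the `s` argument, here is a small sketch with made-up points. In Matplotlib, `s` is the marker area in points squared, so quadrupling `s` doubles the marker's width.

```python
import matplotlib.pyplot as plt

plt.figure()

# Same x positions drawn three times with increasing marker area `s`
for size in (20, 80, 320):
    plt.scatter([1, 2, 3], [size, size, size],
                s=size, marker="*", alpha=0.5, label=f"s = {size}")

# The legend picks up the label given to each scatter call
plt.legend()
```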

Configuring figure and axes#

Besides plotting graphs as shown above, we usually want to further configure the figure and its axes: for example, label the axes, change the figure size, or place more than one plot in the same figure.

To do all that and more, we need handles: variables that represent the figure and the axes objects. We get them by creating an empty figure with the function plt.subplots before plotting.

figure, axes = plt.subplots()
../_images/b939873a0ebfe0d1abe1e7218ce8f485bb698d87d0fd5a003af297c5e30b4bbf.png

Let’s add our plot to this new figure. This time, we call scatter as a method of the axes object instead of as a plt function.

figure, axes = plt.subplots()
axes.scatter(x, y, color = 'magenta', marker = '*', s = 200, alpha = 0.5)
<matplotlib.collections.PathCollection at 0x12e62b2e970>
../_images/b24ccc53b20d3e83b92f1f884e78ec180735578bdcafc7910629e6b95b693cc0.png

OK, we got the same figure back, so what?

The difference is that we now have handles to the figure and its axes, which lets us edit them further.

Let’s label the axes, add a title and increase the figure size.

Note: the default figure size is [6.4, 4.8] inches (width, height).

figure, axes = plt.subplots(figsize = [10,6])

axes.scatter(x, y, color = 'magenta', marker = '*', s = 200, alpha = 0.5)
axes.set_xlabel('aspect_ratio')
axes.set_ylabel('intensity_mean')
axes.set_title('Aspect Ratio vs Intensity')
Text(0.5, 1.0, 'Aspect Ratio vs Intensity')
../_images/ffd5e7f0611cfae8a3cc86ebc66c6674596ee7bd3ea0b87ef95c20bdad9fdfca.png
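
The figure handle can also be queried and changed after creation. The following sketch reads Matplotlib's built-in default size from `matplotlib.rcParamsDefault` and then resizes an existing figure; the sizes used here are arbitrary examples.

```python
import matplotlib
import matplotlib.pyplot as plt

# Matplotlib's built-in default figure size in inches (width, height)
default_size = list(matplotlib.rcParamsDefault["figure.figsize"])
print(default_size)

figure, axes = plt.subplots(figsize=[10, 6])

# The figure handle reports its current size ...
print(figure.get_size_inches())

# ... and can be resized after it has been created
figure.set_size_inches(8, 5)
```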

Subplots#

So far, we have plotted one image or graph per figure, containing all the data.

We can also make a grid of plots by passing the number of rows and columns of the grid to plt.subplots.

figure, axes = plt.subplots(1,2, figsize = [10,6])
../_images/6a34bab41585665e273cbcb0103494e26c25ef47ac5ebf722f90ea75f30aae96.png
axes
array([<Axes: >, <Axes: >], dtype=object)

Now axes is an array with two elements, because we specified 1 row and 2 columns.
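
With more than one row and more than one column, the returned array becomes two-dimensional. A short sketch with an arbitrarily chosen 2 x 3 grid:

```python
import matplotlib.pyplot as plt

# 2 rows and 3 columns give a 2D array of axes
figure, axes = plt.subplots(2, 3, figsize=[10, 6])
print(axes.shape)

# Index with [row, column] to pick one axes
axes[0, 1].set_title("row 0, column 1")

# .flat iterates over all axes regardless of the grid shape
for index, ax in enumerate(axes.flat):
    ax.set_xlabel(f"axes {index}")

# squeeze=False always returns a 2D array, even for a single row
single_figure, single_axes = plt.subplots(1, 2, squeeze=False)
print(single_axes.shape)
```

Passing `squeeze=False` can be handy when the grid shape is computed at runtime, because the indexing then stays the same for any number of rows and columns.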

Imagine each file was a different experimental group. We can now plot the same relationship, separated by image file on different axes, but in the same figure.

First, we get data separated by ‘file_name’.

# Aspect ratio and intensity where 'file_name' equals first file name
x1 = df[df['file_name'] == '20P1_POS0010_D_1UL']['aspect_ratio']
y1 = df[df['file_name'] == '20P1_POS0010_D_1UL']['intensity_mean']

# Aspect ratio and intensity where 'file_name' equals second file name
x2 = df[df['file_name'] == '20P1_POS0007_D_1UL']['aspect_ratio']
y2 = df[df['file_name'] == '20P1_POS0007_D_1UL']['intensity_mean']

Then, we index into axes to choose which of the two axes receives each plot.

# Create empty figure and axes grid
figure, axes = plt.subplots(1,2, figsize = [10,6])

# Configure plot and properties of first axis
axes[0].scatter(x1, y1, color = 'magenta', marker = '*', s = 200, alpha = 0.5)
axes[0].set_xlabel('aspect_ratio')
axes[0].set_ylabel('intensity_mean')
axes[0].set_title('Image1: Aspect Ratio vs Intensity')

# Configure plot and properties of second axis
axes[1].scatter(x2, y2, color = 'blue', marker = 'D', s = 100, alpha = 0.5)
axes[1].set_xlabel('aspect_ratio')
axes[1].set_ylabel('intensity_mean')
axes[1].set_title('Image2: Aspect Ratio vs Intensity')

# Hint: calling this at the end is very useful when axes labels overlap
plt.tight_layout()
../_images/cdd67c4ed1ef64e633fd01f881dffd81217cf0db032fe28f9112d6fc8ece11cc.png
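
When there are more than two groups, filtering each one by hand gets repetitive. One possible alternative, sketched here on a hypothetical stand-in table (`toy`, with made-up values and file names), is to loop over `groupby` groups and axes together.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in table with the same column names as the CSV above
toy = pd.DataFrame({
    "aspect_ratio": [1.7, 2.4, 3.0, 1.3, 2.1, 1.1],
    "intensity_mean": [96.5, 86.6, 91.5, 110.2, 104.8, 99.0],
    "file_name": ["image_a"] * 3 + ["image_b"] * 3,
})

groups = toy.groupby("file_name")

# One column per group; squeeze=False keeps axes 2D even for a single group
figure, axes = plt.subplots(1, groups.ngroups, figsize=[10, 6],
                            sharey=True, squeeze=False)

# Pair each axes in the single row with one (name, sub-table) group
for ax, (name, group) in zip(axes[0], groups):
    ax.scatter(group["aspect_ratio"], group["intensity_mean"], alpha=0.5)
    ax.set_xlabel("aspect_ratio")
    ax.set_title(f"{name}: Aspect Ratio vs Intensity")

# With a shared y axis, one label on the leftmost plot is enough
axes[0, 0].set_ylabel("intensity_mean")
figure.tight_layout()
```

Using `sharey=True` is a design choice: it puts both groups on the same intensity scale, which makes them easier to compare at a glance.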

Saving the figure#

Because we created a figure object and assigned it to the figure variable, we can save the whole figure to disk by calling its .savefig method.

It also allows us to export the figure as a raster or vector image.

figure.savefig('aspect_ratio_vs_intensity.png', dpi=300)
figure.savefig('aspect_ratio_vs_intensity_SVG.svg')
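
The output format is inferred from the file extension. As a small self-contained sketch, the cell below saves a throwaway figure into a temporary folder (the file names are hypothetical) and checks that both files were written. `bbox_inches="tight"` is an optional argument that trims surrounding whitespace.

```python
import tempfile
from pathlib import Path
import matplotlib.pyplot as plt

# Throwaway figure just for demonstrating the export
figure, axes = plt.subplots(figsize=[4, 3])
axes.plot([0, 1, 2], [0, 1, 4])

# Write into a temporary folder so nothing clutters the working directory
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "example.png"
svg_path = out_dir / "example.svg"

# Raster export: dpi controls the pixel resolution
figure.savefig(png_path, dpi=300, bbox_inches="tight")
# Vector export: dpi plays no role for lines and text
figure.savefig(svg_path)

print(png_path.exists(), svg_path.exists())
```
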
from watermark import watermark
watermark(iversions=True, globals_=globals())
print(watermark())
print(watermark(packages="watermark,numpy,pandas,matplotlib,skimage"))
Last updated: 2023-08-25T14:44:59.747389+02:00

Python implementation: CPython
Python version       : 3.9.17
IPython version      : 8.14.0

Compiler    : MSC v.1929 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
CPU cores   : 16
Architecture: 64bit

watermark : 2.4.3
numpy     : 1.23.5
pandas    : 2.0.3
matplotlib: 3.7.2
skimage   : 0.21.0