Binarization
Binarization is a data transformation that splits the data into two groups. The value that separates the two groups is called the threshold value.
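As a minimal sketch of this definition, the snippet below binarizes a handful of numbers. The array `values` and the threshold of 10 are made up for illustration only; they are not taken from the image used later on this page.

```python
import numpy as np

# Made-up data and threshold, chosen only to illustrate the idea
values = np.array([3, 12, 7, 25, 18, 1])
threshold = 10

# Comparing against the threshold yields one boolean per element
above = values > threshold

print(above.tolist())            # [False, True, False, True, True, False]
print(values[above].tolist())    # group above the threshold: [12, 25, 18]
print(values[~above].tolist())   # group at or below the threshold: [3, 7, 1]
```

Every element ends up in exactly one of the two groups, and the comparison operator (`>` here) decides which group a value equal to the threshold falls into.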
import numpy as np
from skimage import data
from skimage.io import imread
import matplotlib.pyplot as plt
from skimage.exposure import histogram
Image Thresholding#
Consider the image below. We can see at least two categories:
- cells;
- background;
If we want to write code that quantifies some feature of the cells, we eventually have to binarize the image, i.e., split its pixels into these two groups.
image_path = '../../data/mitosis_mod.tif'
image_cells = imread(image_path)
plt.imshow(image_cells, cmap='gray')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x1eef00338b0>
We can read the range of pixel values from the colorbar, or print it with the code below:
print('max pixel value = ', image_cells.max())
print('min pixel value = ', image_cells.min())
max pixel value = 255
min pixel value = 8
The pixel values range from 8 to 255, so we can choose a value in between to split the image into two groups. Let’s try a value near the middle of that range (125).
We can binarize the image like this:
image_binary = image_cells > 125
plt.imshow(image_binary, cmap='gray')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x1eef019c5e0>
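To make explicit what the comparison returns, here is a small sketch on a hypothetical 4x4 image (the array `image` below is invented and only stands in for `image_cells`). The result is a boolean array of the same shape, which is why the colorbar of the binary image only spans 0 to 1.

```python
import numpy as np

# Hypothetical 8-bit image standing in for image_cells
image = np.array([[ 10,  20, 200, 210],
                  [ 15, 130, 220,  30],
                  [ 12,  18, 126,  25],
                  [125,  14,  16, 240]], dtype=np.uint8)

# Strict comparison: a pixel exactly equal to 125 ends up False
mask = image > 125

print(mask.dtype)     # bool
print(mask.shape)     # (4, 4)
print(mask.sum())     # number of True pixels: 6
print(mask.mean())    # fraction of True pixels: 0.375

# If 0/255 integers are needed instead of booleans (e.g. for saving)
mask_uint8 = mask.astype(np.uint8) * 255
```

Because `True` counts as 1, summing the mask gives the number of pixels above the threshold, which is often the first step toward quantifying something in the image.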
OK, maybe 125 was a bad choice… We could at least make a more informed guess. The histogram is a useful tool for that.
plt.figure(figsize=(8,4)) # Create an empty figure with a specific size (in inches)
pixel_counts, pixel_values = histogram(image_cells) # Calculate the image histogram
plt.bar(pixel_values, pixel_counts, width=2) # Plot the histogram
<BarContainer object of 248 artists>
Exercise#
Guided by the histogram, try to improve the binarization by finding a better value for the threshold and plot the new binary image. Which value did you choose?
value = 125
image_binary = image_cells > value
plt.imshow(image_binary, cmap='gray')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x1eef04c5430>
Important note: Setting the threshold manually is useful while exploring the data, but it should not be part of an image analysis workflow.
A few reasons why:
- We do not want to choose this every time by hand;
- We would like to avoid introducing bias;
- We would like the workflow to be easily reproducible by others.
Fortunately, there are several very robust methods that automatically calculate the threshold value based on the image itself.
One of them was developed by Nobuyuki Otsu in 1979! It is called… Otsu’s method. There is an implementation of it in scikit-image.
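Otsu’s method picks the threshold that maximizes the variance between the two resulting groups. The sketch below reimplements that idea with NumPy only, to show the principle; it is not scikit-image’s implementation. The function name `otsu_threshold_sketch` and the synthetic test image are assumptions made for this example. In scikit-image the corresponding function is `threshold_otsu` in `skimage.filters`, and its returned value may differ slightly from this sketch because of how the boundary is defined.

```python
import numpy as np

def otsu_threshold_sketch(image):
    """Illustrative Otsu threshold for a uint8 image (hypothetical helper).

    Returns t such that the groups `image < t` and `image >= t`
    have the largest between-class variance.
    """
    counts = np.bincount(image.ravel(), minlength=256).astype(float)
    values = np.arange(256)
    best_t, best_score = 0, -1.0
    for t in range(1, 256):
        w0 = counts[:t].sum()            # pixels with value < t
        w1 = counts[t:].sum()            # pixels with value >= t
        if w0 == 0 or w1 == 0:
            continue                     # one group is empty, skip
        mu0 = (counts[:t] * values[:t]).sum() / w0   # mean of lower group
        mu1 = (counts[t:] * values[t:]).sum() / w1   # mean of upper group
        score = w0 * w1 * (mu0 - mu1) ** 2           # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Synthetic image: dark background with one bright square (not real data)
rng = np.random.default_rng(0)
synthetic = rng.normal(40, 10, size=(64, 64))
synthetic[20:44, 20:44] = rng.normal(180, 15, size=(24, 24))
synthetic = np.clip(np.rint(synthetic), 0, 255).astype(np.uint8)

t = otsu_threshold_sketch(synthetic)
binary = synthetic >= t
print('threshold =', t)
print('foreground fraction =', binary.mean())
```

On this synthetic image the threshold lands between the two brightness populations, so the bright square is recovered without choosing any value by hand. With scikit-image installed, `threshold_otsu(image_cells)` could be used in the same way on the cell image.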