Histogram2D Artist#

In this example notebook, we show how to create a histogram 2D plot using the Histogram2D class. The Histogram2D class is a subclass of the Artist class. It has a simplified interface for creating histogram 2D plots and updating some of their properties, such as assigning different classes to the underlying points and displaying bins/patches in different colors based on an overlay colormap.

It can be imported as shown below:

import numpy as np
import matplotlib.pyplot as plt
from biaplotter.artists import Histogram2D

np.random.seed(2)

Creating a Histogram 2D Plot#

To create an empty histogram 2D plot, just instantiate the Histogram2D class and provide an axes object as an argument.

fig, ax = plt.subplots()
histogram = Histogram2D(ax)

Adding Data to the Histogram 2D Artist#

To add data to the histogram 2D plot, assign an (N, 2)-shaped NumPy array to the data property. The plot is updated automatically every time one of its properties changes. Below, we define a small function that generates two Gaussian distributions with different means and standard deviations.

n_samples = 100

def generate_gaussian_data(n_samples):
    """Generate a 2D dataset with two Gaussian clusters."""
    # Gaussian 1
    x1 = np.random.normal(loc=2, scale=1, size=n_samples//2)
    y1 = np.random.normal(loc=2, scale=1, size=n_samples//2)
    # Gaussian 2
    x2 = np.random.normal(loc=-2, scale=0.5, size=n_samples//2)
    y2 = np.random.normal(loc=-2, scale=0.5, size=n_samples//2)
    x_data = np.concatenate([x1, x2])
    y_data = np.concatenate([y1, y2])
    return np.vstack([x_data, y_data]).T

data = generate_gaussian_data(n_samples)
histogram.data = data
fig # show the updated figure
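As a quick check on the expected input layout, the sketch below builds an (N, 2) array from separate x and y coordinates using plain NumPy. The variable names and the seed are illustrative, and nothing here depends on biaplotter.

```python
import numpy as np

rng = np.random.default_rng(2)  # illustrative seed
x = rng.normal(loc=2, scale=1, size=50)
y = rng.normal(loc=-2, scale=0.5, size=50)

# Column 0 holds x and column 1 holds y, with one row per point.
points = np.column_stack([x, y])

# This is the same layout as the np.vstack([...]).T idiom used above.
same_as_vstack = np.array_equal(points, np.vstack([x, y]).T)
print(points.shape, same_as_vstack)
```

An array with this shape can then be assigned to the data property as described above.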

Assigning Classes to Data Points#

The Histogram2D artist comes with a custom categorical overlay colormap, which can be used to assign different classes to the underlying points and display the bins/patches with the corresponding class color as an overlay. You can access the histogram's current categorical colormap via the overlay_colormap attribute.

histogram.overlay_colormap
cat10_modified_first_transparent

To assign classes to the underlying points, set the color_indices property to an (N,)-shaped NumPy array of integers. These integers are used as indices into the colormap.

Histogram2D has a convenience method called indices_in_patches_above_threshold, which finds the histogram patches whose counts are above a given threshold and returns the indices of the points belonging to those patches.

Below, we define two different thresholds and get the indices of points that fall in those patches. We assign classes 1 and 2 to those indices and feed this array to color_indices.

Note that class 0 represents the background, so points whose color_indices value is 0 are shown as transparent in the overlay.

threshold_1 = 2
threshold_2 = 4

indices_in_patches_above_threshold_1 = histogram.indices_in_patches_above_threshold(threshold_1)
indices_in_patches_above_threshold_2 = histogram.indices_in_patches_above_threshold(threshold_2)

color_indices = np.zeros(n_samples, dtype=int)
color_indices[indices_in_patches_above_threshold_1] = 1
color_indices[indices_in_patches_above_threshold_2] = 2


histogram.color_indices = color_indices
fig
print("Histogram color_indices:\n", histogram.color_indices)
Histogram color_indices:
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 0 2 0 0 1 0 0 0 2 1 0 1 2 0 0 0 1 0 0 1 1
 1 0 1 1 0 1 0 1 1 1 2 0 0 1 1 1 1 2 0 0 0 0 0 1 1 1]

If new data of a different size is assigned, the previous data values are overwritten and the color_indices are reset to all zeros. This ensures that the color_indices always stay in sync with the amount of data in the plot.

# Adding 400 more samples
n_samples = 400
data = np.concatenate([data, generate_gaussian_data(n_samples)])

histogram.data = data
fig
print("Histogram color_indices (up to 100th index):\n", histogram.color_indices[:100])
Histogram color_indices (up to 100th index):
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
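The reset behavior described above can be pictured with a small property-based toy class. This is only a sketch of the pattern, not biaplotter code: the class name SyncedPoints and its internals are hypothetical, and the real artist also redraws the plot, which is omitted here.

```python
import numpy as np

class SyncedPoints:
    """Toy stand-in (hypothetical) that keeps color indices in sync with data."""

    def __init__(self):
        self._data = np.empty((0, 2))
        self._color_indices = np.zeros(0, dtype=int)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        value = np.asarray(value, dtype=float)
        if value.ndim != 2 or value.shape[1] != 2:
            raise ValueError("data must have shape (N, 2)")
        size_changed = len(value) != len(self._data)
        self._data = value
        if size_changed:
            # A different number of points invalidates the old labels.
            self._color_indices = np.zeros(len(value), dtype=int)

    @property
    def color_indices(self):
        return self._color_indices

    @color_indices.setter
    def color_indices(self, value):
        # A scalar is broadcast to every point; an array must have length N.
        n_points = len(self._data)
        self._color_indices = np.broadcast_to(np.asarray(value), (n_points,)).copy()

points = SyncedPoints()
points.data = np.zeros((3, 2))
points.color_indices = [0, 1, 2]
points.data = np.zeros((5, 2))  # size changed, so the labels are reset
print(points.color_indices)
```

Accepting a scalar in the setter is one simple way to support a "clear everything" assignment with a single value.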

If you want to clear an existing, non-zero array of color_indices (i.e., a histogram overlay), you can reset the colors by setting color_indices to 0 (default) or np.nan.

histogram.color_indices = 0
fig

Properties#

Histogram Colormap#

You can change the histogram colormap by setting the histogram_colormap attribute. Below, we show the default histogram colormap, which is magma.

histogram.histogram_colormap
magma

Here we display another colormap from matplotlib (viridis) and assign it to the histogram.

plt.cm.viridis
viridis
histogram.histogram_colormap = plt.cm.viridis
fig

Histogram Color Normalization#

You can change the histogram color normalization by setting the histogram_color_normalization_method attribute. Below, we replace the default color normalization, which is linear, by log. This is useful for visualizing data with a wide range of values.

histogram.histogram_color_normalization_method = 'log'
fig
C:\Users\mazo260d\Documents\GitHub\biaplotter\src\biaplotter\artists_base.py:248: UserWarning: Log normalization applied to color indices with min value 0.01. Values below 0.01 were set to 0.01.
  warnings.warn(
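To see why log normalization helps with a wide range of counts, the sketch below compares matplotlib's Normalize and LogNorm on a few example values. The 0.01 floor is taken from the warning text above; how biaplotter applies it internally is an assumption, and the example counts are made up.

```python
import numpy as np
from matplotlib.colors import LogNorm, Normalize

counts = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])  # made-up bin counts

# Log scaling is undefined at 0, so small values are raised to a floor first.
floor = 0.01
clipped = np.clip(counts, floor, None)

linear_scaled = np.asarray(Normalize(vmin=clipped.min(), vmax=clipped.max())(clipped))
log_scaled = np.asarray(LogNorm(vmin=clipped.min(), vmax=clipped.max())(clipped))

print("linear:", np.round(linear_scaled, 3))
print("log:   ", np.round(log_scaled, 3))
```

With linear scaling, the counts 1 and 10 are squeezed into the bottom of the colormap, while log scaling spaces each factor of ten equally.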

Histogram Interpolation#

You can change the histogram interpolation by setting the histogram_interpolation attribute. Below, we replace the default interpolation, which is nearest, by bilinear.

histogram.histogram_interpolation = 'bilinear'
fig
histogram.histogram_interpolation = 'nearest'
fig

Histogram Bins#

You can set the number of bins in the histogram by setting the bins attribute. The default number of bins is 20.

histogram.bins = 50
fig

Histogram Minimum Count#

You can choose a minimum count value for the histogram by setting the cmin attribute. The default value is 0. This will make patches with counts below this value transparent and shift the histogram colormap visualization accordingly.

histogram.cmin = 1
fig
C:\Users\mazo260d\Documents\GitHub\biaplotter\src\biaplotter\artists_base.py:248: UserWarning: Log normalization applied to color indices with min value 1.0. Values below 0.01 were set to 0.01.
  warnings.warn(
histogram.cmin = 0
fig
C:\Users\mazo260d\Documents\GitHub\biaplotter\src\biaplotter\artists_base.py:248: UserWarning: Log normalization applied to color indices with min value 0.01. Values below 0.01 were set to 0.01.
  warnings.warn(
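The effect of a minimum count can be sketched with plain NumPy by masking bins whose count falls below the chosen value. This is an illustration of the idea, not the artist's actual code path. The toy coordinates and the use of NaN as the "transparent" marker are assumptions.

```python
import numpy as np

# Toy data: three points near the origin and one far away.
x = np.array([0.1, 0.2, 0.3, 3.9])
y = np.array([0.1, 0.2, 0.3, 3.9])

counts, x_edges, y_edges = np.histogram2d(x, y, bins=2)

# Bins with fewer than cmin points become NaN, which image-style plots
# typically leave undrawn.
cmin = 2
masked_counts = np.where(counts < cmin, np.nan, counts)
print(masked_counts)
```

Only the remaining finite values would then contribute to the color scale, which is consistent with the colormap shifting when cmin changes.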

Assigning a Feature as an Overlay#

You can assign a feature to be displayed as an overlay on the histogram. This feature can be a continuous or categorical feature. Let’s assign the x coordinate as an overlay and use a different colormap (‘jet’) to display it.

plt.cm.jet
jet
feature = data[:, 0] # x coordinates
histogram.overlay_colormap = plt.cm.jet
histogram.color_indices = feature
fig
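A per-point feature has to be reduced to one value per bin before it can be drawn as an overlay. The text does not say which statistic biaplotter uses, so the sketch below shows just one plausible choice, the per-bin mean, computed with np.histogram2d and its weights argument. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
xy = rng.normal(size=(200, 2))
feature_values = xy[:, 0]  # use the x coordinate as the feature
n_bins = 5

# Sum of the feature per bin, and number of points per bin.
sums, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1], bins=n_bins,
                                        weights=feature_values)
counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=n_bins)

# Mean feature per bin; empty bins give 0/0 and end up as NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    mean_per_bin = sums / counts

print(np.round(mean_per_bin, 2))
```

Because the feature here is the x coordinate itself, each non-empty bin's mean lies between that bin's left and right x edges, which makes the result easy to sanity-check.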

Of course, we now lose the histogram count information, but we can adjust the overlay opacity or visibility to see the original histogram again.

histogram.overlay_opacity = 0.4
fig
histogram.overlay_visible = False
fig
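As a rough mental model of what an opacity of 0.4 means, the sketch below applies standard alpha blending of an overlay color on top of an opaque base color. The helper name blend_over is hypothetical, and whether the artist composites its layers in exactly this way is an assumption.

```python
import numpy as np

def blend_over(base_rgb, overlay_rgb, opacity):
    """Hypothetical helper: blend an overlay color onto an opaque base color."""
    base = np.asarray(base_rgb, dtype=float)
    overlay = np.asarray(overlay_rgb, dtype=float)
    # opacity = 1 shows only the overlay, opacity = 0 shows only the base.
    return opacity * overlay + (1.0 - opacity) * base

base_color = [0.0, 0.0, 0.0]     # black histogram patch
overlay_color = [1.0, 0.0, 0.0]  # red overlay patch
blended = blend_over(base_color, overlay_color, opacity=0.4)
print(blended)
```

At 0.4, the base color still contributes 60% of the result, which is why the histogram counts become visible again through the overlay.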

Here we restore the default overlay opacity and visibility.

histogram.overlay_opacity = 1
histogram.overlay_visible = True
fig

We can also apply another color normalization method and interpolation to the overlay. Below, we change the overlay normalization to symlog and the interpolation to bilinear.

histogram.overlay_color_normalization_method = 'symlog'
histogram.overlay_interpolation = 'bilinear'
fig
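A symmetric log scale suits a feature like the x coordinate because it can take negative values, which a plain log scale cannot handle. The sketch below uses matplotlib's SymLogNorm directly to show the mapping. The linthresh value and the example numbers are assumptions, and the parameters biaplotter passes internally are not stated here.

```python
import numpy as np
from matplotlib.colors import SymLogNorm

values = np.array([-100.0, -10.0, 0.0, 10.0, 100.0])  # made-up feature values

# Values within [-linthresh, linthresh] are scaled linearly, and values
# outside that range are scaled logarithmically on both sides of zero.
norm = SymLogNorm(linthresh=1.0, vmin=-100.0, vmax=100.0)
symlog_scaled = np.asarray(norm(values))

print(np.round(symlog_scaled, 3))
```

With symmetric limits, zero maps to the middle of the colormap and values of equal magnitude land at equal distances on either side of it.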

Again, we can clear the overlay by setting color_indices to np.nan.

histogram.color_indices = np.nan
fig
c:\Users\mazo260d\miniforge3\envs\biaplotter\Lib\site-packages\numpy\lib\_nanfunctions_impl.py:1437: RuntimeWarning: All-NaN slice encountered
  return _nanquantile_unchecked(

Histogram Visibility#

You can optionally hide or show the artist by setting the visible attribute.

histogram.visible = False
fig
histogram.visible = True
fig

Resetting the Histogram2D Artist#

The Histogram2D artist can be reset to its default state by calling the reset() method. This will remove all data and properties set on the artist, and it will be ready to accept new data.

histogram.reset()
fig