Correlation matrix

In practice, particularly in image analysis, we often calculate a large number of features, many of which are strongly correlated with one another. The correlation coefficients introduced earlier can help us identify groups of redundant features.

from skimage import data, filters, measure
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

image = data.human_mitosis()

fig, ax = plt.subplots()
ax.imshow(image, cmap='gray')

<matplotlib.image.AxesImage at 0x2481148dbe0>

../_images/03_correlation_matrix_3_1.png

binary = image > filters.threshold_otsu(image)
labels = measure.label(binary)

fig, ax = plt.subplots()
ax.imshow(labels)

<matplotlib.image.AxesImage at 0x2481636d910>

../_images/03_correlation_matrix_5_1.png

# Shape, size and intensity features to measure for every labelled object
properties = ['area', 'area_bbox', 'area_convex', 'area_filled',
              'axis_major_length', 'axis_minor_length', 'eccentricity',
              'equivalent_diameter_area', 'extent', 'feret_diameter_max',
              'intensity_max', 'intensity_mean', 'intensity_min']

props = measure.regionprops_table(labels, intensity_image=image, properties=properties)
df = pd.DataFrame(props)
df

	area	area_bbox	area_convex	area_filled	axis_major_length	axis_minor_length	eccentricity	equivalent_diameter_area	extent	feret_diameter_max	intensity_max	intensity_mean	intensity_min
0	62	70	63	62	10.571311	7.557049	0.699264	8.884866	0.885714	10.770330	63.0	50.645161	40.0
1	7	7	7	7	8.000000	0.000000	1.000000	2.985411	1.000000	7.000000	68.0	58.285714	39.0
2	121	143	124	121	13.746529	11.516064	0.546064	12.412171	0.846154	14.317821	82.0	61.487603	39.0
3	19	24	20	19	6.674754	3.805741	0.821527	4.918491	0.791667	6.708204	78.0	58.473684	39.0
4	62	80	65	62	11.482908	6.872199	0.801144	8.884866	0.775000	11.661904	86.0	63.387097	42.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
288	45	60	48	45	11.333091	5.339585	0.882053	7.569398	0.750000	12.041595	102.0	78.533333	42.0
289	49	90	61	49	18.128803	4.509369	0.968570	7.898654	0.544444	18.027756	100.0	73.387755	40.0
290	39	50	42	39	9.496172	5.480726	0.816637	7.046726	0.780000	10.049876	87.0	66.000000	39.0
291	4	4	4	4	4.472136	0.000000	1.000000	2.256758	1.000000	4.000000	59.0	53.750000	45.0
292	4	4	4	4	4.472136	0.000000	1.000000	2.256758	1.000000	4.000000	41.0	40.250000	39.0

293 rows × 13 columns

pandas can calculate the correlation matrix for every pair of columns in the table. The method parameter selects the correlation coefficient; here we use Pearson:

correlation_matrix = df.corr(method='pearson')
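To make explicit what `df.corr` returns, here is a minimal sketch on synthetic data. The table `toy` and its columns are made up for illustration and are not the measurements from above. It compares the Pearson entry for one pair of columns with a value computed by hand, and also shows the Spearman variant, which only looks at ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: the diameter is a deterministic, non-linear
# function of the area, while "noise" is unrelated to both.
rng = np.random.default_rng(0)
area = rng.uniform(10, 200, size=100)
toy = pd.DataFrame({
    "area": area,
    "equivalent_diameter": 2 * np.sqrt(area / np.pi),
    "noise": rng.normal(size=100),
})

pearson = toy.corr(method="pearson")    # linear relationship
spearman = toy.corr(method="spearman")  # monotonic relationship (rank based)

# Pearson coefficient computed by hand for a single pair of columns
x = toy["area"].to_numpy()
y = toy["equivalent_diameter"].to_numpy()
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(pearson.round(3))
print("manual Pearson r:", r_manual)
```

Because the diameter grows monotonically but not linearly with the area, the Spearman coefficient for this pair is 1, while the Pearson coefficient is slightly below 1.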

A table of 13 × 13 numbers is hard to read, so we visualize it instead. Seaborn offers the heatmap function for this, and the plot shows that quite a large number of features are strongly correlated with each other:

ax = sns.heatmap(correlation_matrix, annot=False, vmin=-1, vmax=1)

../_images/03_correlation_matrix_10_0.png
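Besides inspecting the heatmap by eye, the strongly correlated pairs can also be listed programmatically. The following is a minimal sketch, not part of the original notebook: the helper `strongly_correlated_pairs`, the threshold of 0.9 and the toy table are assumptions chosen for illustration. It visits only the upper triangle of the matrix so that every pair is reported once and the diagonal is skipped.

```python
import numpy as np
import pandas as pd

def strongly_correlated_pairs(corr, threshold=0.9):
    """List feature pairs with |r| >= threshold (hypothetical helper)."""
    # Upper triangle without the diagonal: each pair appears exactly once
    rows, cols = np.triu_indices(len(corr), k=1)
    records = [
        (corr.index[i], corr.columns[j], float(corr.iat[i, j]))
        for i, j in zip(rows, cols)
        if abs(corr.iat[i, j]) >= threshold
    ]
    # Strongest correlations (positive or negative) first
    records.sort(key=lambda record: abs(record[2]), reverse=True)
    return pd.DataFrame(records, columns=["feature_a", "feature_b", "r"])

# Hypothetical toy data: two redundant size features, one inverted copy,
# and one unrelated feature
rng = np.random.default_rng(42)
area = rng.uniform(10, 200, size=200)
toy = pd.DataFrame({
    "area": area,
    "area_filled": area,
    "negative_area": -area,
    "unrelated": rng.normal(size=200),
})

pairs = strongly_correlated_pairs(toy.corr(method="pearson"), threshold=0.9)
print(pairs)
```

With the table from above, the same call would be `strongly_correlated_pairs(correlation_matrix)`. The threshold is a judgment call and depends on how much redundancy is acceptable.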

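We can make this even clearer by rearranging the columns and rows so that similar features end up next to each other. The seaborn clustermap function does this by hierarchically clustering the rows and columns and drawing the resulting dendrograms alongside the heatmap: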

# clustermap returns a seaborn ClusterGrid, not a matplotlib figure
cluster_grid = sns.clustermap(correlation_matrix, vmin=-1, vmax=1, cmap='twilight')

../_images/03_correlation_matrix_12_0.png
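The blocks visible in the clustered heatmap can also be turned into explicit groups of redundant features. The sketch below is one possible way to do this with SciPy and is not what `clustermap` computes internally: it assumes `1 - |r|` as the distance between two features, average linkage, and a cut-off distance of 0.2. The helper `group_features` and the toy table are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_features(corr, max_distance=0.2):
    """Group features whose correlation distance 1 - |r| is small.

    Hypothetical helper; assumes corr contains no NaN values.
    """
    distance = 1 - corr.abs().to_numpy()
    distance = np.clip((distance + distance.T) / 2, 0, None)  # symmetric, non-negative
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)  # square -> condensed form
    tree = linkage(condensed, method="average")
    cluster_ids = fcluster(tree, t=max_distance, criterion="distance")
    groups = {}
    for name, cluster_id in zip(corr.columns, cluster_ids):
        groups.setdefault(int(cluster_id), []).append(name)
    return list(groups.values())

# Hypothetical toy data: three size features and two intensity features
rng = np.random.default_rng(1)
area = rng.uniform(10, 200, size=300)
intensity_mean = rng.normal(60, 10, size=300)
toy = pd.DataFrame({
    "area": area,
    "area_convex": 1.05 * area + rng.normal(0, 2, size=300),
    "equivalent_diameter": 2 * np.sqrt(area / np.pi),
    "intensity_mean": intensity_mean,
    "intensity_max": intensity_mean + 20 + rng.normal(0, 2, size=300),
})

groups = group_features(toy.corr(method="pearson"))
for group in groups:
    print(group)
```

For feature selection, one could then keep a single representative per group. Using the absolute value of the coefficient treats strongly anti-correlated features as redundant too, which is a design choice rather than a requirement.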

Quantitative Bio-image Analysis with Python
