Appending tables

When processing multiple images, potentially with multiple image processing libraries, a common task is to combine the resulting tables of measurements.

We start with two small tables of measurements.

import pandas as pd
table1 = pd.DataFrame({
    "label":       [1,   2,   3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation":  [2.3, 3.4, 1.2],
    })
table1
   label  circularity  elongation
0      1          0.3         2.3
1      2          0.5         3.4
2      3          0.7         1.2
table2 = pd.DataFrame({
    "label":    [3,   2,   1,   4],
    "area":     [22,  32,  25,  18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
    })
table2
   label  area  skewness
0      3    22       0.5
1      2    32       0.6
2      1    25       0.3
3      4    18       0.3
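Before combining the tables, it can help to check which labels they share. The following is a minimal sketch, assuming each label should occur at most once per table. It compares the label sets and checks uniqueness with `Series.is_unique`. The variable names `shared`, `only_in_table2` and `both_unique` are illustrative only.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label":       [1,   2,   3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation":  [2.3, 3.4, 1.2],
    })
table2 = pd.DataFrame({
    "label":    [3,   2,   1,   4],
    "area":     [22,  32,  25,  18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
    })

# Convert to plain Python ints before building the sets
labels1 = set(table1["label"].tolist())
labels2 = set(table2["label"].tolist())

shared = labels1 & labels2          # labels measured in both tables
only_in_table2 = labels2 - labels1  # labels table1 has no measurements for

# Each label should identify exactly one row per table
both_unique = table1["label"].is_unique and table2["label"].is_unique

print("shared:", sorted(shared))
print("only in table2:", sorted(only_in_table2))
print("labels unique:", both_unique)
```

Here label 4 shows up as present only in `table2`. This is exactly the row that causes trouble below.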

Combining columns of tables

According to the pandas documentation, there are multiple ways to combine tables. We first use an incorrect example to highlight the pitfalls of combining tables.

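In the following example, pd.concat with axis=1 places the tables side by side by row position (the row index 0, 1, 2 and 3), not by label. As a result, the measurements of labels 1 and 3 are mixed up in the same rows. Furthermore, table1 does not contain measurements for label 4, so its columns are filled with NaN in that row, and its label column is converted to floating point. Note also that the label column now appears twice.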

wrongly_combined_tables = pd.concat([table1, table2], axis=1)
wrongly_combined_tables
   label  circularity  elongation  label  area  skewness
0    1.0          0.3         2.3      3    22       0.5
1    2.0          0.5         3.4      2    32       0.6
2    3.0          0.7         1.2      1    25       0.3
3    NaN          NaN         NaN      4    18       0.3
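If you prefer pd.concat, one minimal sketch is to first make the label column the row index of both tables with `set_index`. Concatenation then aligns rows by label rather than by position. This assumes each label occurs at most once per table. The call to `sort_index` only puts the rows in label order for readability.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label":       [1,   2,   3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation":  [2.3, 3.4, 1.2],
    })
table2 = pd.DataFrame({
    "label":    [3,   2,   1,   4],
    "area":     [22,  32,  25,  18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
    })

# Use "label" as the row index of both tables, so that concat
# aligns rows by label instead of by row position
aligned = pd.concat(
    [table1.set_index("label"), table2.set_index("label")],
    axis=1,
).sort_index()

print(aligned)
```

In this version, label 4 gets NaN for circularity and elongation, and the measurements of labels 1 and 3 stay in their own rows.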

A better way to combine tables is the merge function. It lets us specify explicitly which column the tables should be combined on. Data scientists call this column the ‘index’ or ‘identifier’ of the rows in the tables; here it is the label column. With how='inner', only labels that occur in both tables are kept.

correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')
correctly_combined_tables1
   label  circularity  elongation  area  skewness
0      1          0.3         2.3    25       0.3
1      2          0.5         3.4    32       0.6
2      3          0.7         1.2    22       0.5
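merge can also verify the assumption that each label identifies a single row. The following minimal sketch uses the `validate` parameter. With `validate="one_to_one"`, pandas raises `pandas.errors.MergeError` if a key is duplicated in either table. The `duplicated` table below is hypothetical data, built only to trigger this check.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label":       [1,   2,   3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation":  [2.3, 3.4, 1.2],
    })
table2 = pd.DataFrame({
    "label":    [3,   2,   1,   4],
    "area":     [22,  32,  25,  18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
    })

# Inner join that also checks that "label" is unique in both tables
inner = pd.merge(table1, table2, how="inner", on="label",
                 validate="one_to_one")
print(inner["label"].tolist())  # only labels present in both tables

# Hypothetical faulty table: the row for label 3 appears twice
duplicated = pd.concat([table2, table2.iloc[[0]]], ignore_index=True)

rejected = False
try:
    pd.merge(table1, duplicated, how="inner", on="label",
             validate="one_to_one")
except pd.errors.MergeError as error:
    rejected = True
    print("merge refused:", error)
```

Failing loudly here is usually preferable to silently producing duplicated rows, which is what merge does by default when keys repeat.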

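You may notice that label 4 is missing in the example above, because an inner join keeps only labels present in both tables. To keep it in our table, we can perform an outer join instead. The columns for which table1 has no measurements are then filled with NaN.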

correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')
correctly_combined_tables2
   label  circularity  elongation  area  skewness
0      1          0.3         2.3    25       0.3
1      2          0.5         3.4    32       0.6
2      3          0.7         1.2    22       0.5
3      4          NaN         NaN    18       0.3
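To see which table each row of an outer join came from, merge accepts `indicator=True`. This adds a `_merge` column with the values 'both', 'left_only' or 'right_only'. The following minimal sketch lists the labels that only one of the tables measured. The name `incomplete` is illustrative.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label":       [1,   2,   3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation":  [2.3, 3.4, 1.2],
    })
table2 = pd.DataFrame({
    "label":    [3,   2,   1,   4],
    "area":     [22,  32,  25,  18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
    })

# Outer join that records the origin of every row in a "_merge" column
outer = pd.merge(table1, table2, how="outer", on="label", indicator=True)
print(outer[["label", "_merge"]])

# Labels for which at least one table has no measurements
incomplete = outer.loc[outer["_merge"] != "both", "label"].tolist()
print("incomplete labels:", incomplete)
```

For these tables, only label 4 is reported, as 'right_only', because table1 has no row for it.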