Appending tables
When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.
We start with two small tables of measurements.
import pandas as pd
table1 = pd.DataFrame({
"label": [1, 2, 3],
"circularity": [0.3, 0.5, 0.7],
"elongation": [2.3, 3.4, 1.2],
})
table1
|   | label | circularity | elongation |
|---|---|---|---|
| 0 | 1 | 0.3 | 2.3 |
| 1 | 2 | 0.5 | 3.4 |
| 2 | 3 | 0.7 | 1.2 |
table2 = pd.DataFrame({
"label": [3, 2, 1, 4],
"area": [22, 32, 25, 18],
"skewness": [0.5, 0.6, 0.3, 0.3],
})
table2
|   | label | area | skewness |
|---|---|---|---|
| 0 | 3 | 22 | 0.5 |
| 1 | 2 | 32 | 0.6 |
| 2 | 1 | 25 | 0.3 |
| 3 | 4 | 18 | 0.3 |
Combining columns of tables
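According to the pandas documentation, there are multiple ways to combine tables. We first look at a wrong example to highlight pitfalls when combining tables.
In the following example, the measurements of labels 1 and 3 are mixed up, because pd.concat with axis=1 pairs rows by their row index (0, 1, 2, ...) and not by the values in the label column. Furthermore, table1 contains no measurements for label 4, so its columns are filled with NaN in that row.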
wrongly_combined_tables = pd.concat([table1, table2], axis=1)
wrongly_combined_tables
|   | label | circularity | elongation | label | area | skewness |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.3 | 2.3 | 3 | 22 | 0.5 |
| 1 | 2.0 | 0.5 | 3.4 | 2 | 32 | 0.6 |
| 2 | 3.0 | 0.7 | 1.2 | 1 | 25 | 0.3 |
| 3 | NaN | NaN | NaN | 4 | 18 | 0.3 |
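To check which rows were paired, the two label columns can be placed side by side. The following is a minimal sketch; the names row_labels and mismatch are only illustrative and not part of the original example.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label": [1, 2, 3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation": [2.3, 3.4, 1.2],
})
table2 = pd.DataFrame({
    "label": [3, 2, 1, 4],
    "area": [22, 32, 25, 18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
})

# Put the label columns next to each other. Like pd.concat(..., axis=1),
# the DataFrame constructor aligns the two Series by row index.
row_labels = pd.DataFrame({
    "label_table1": table1["label"],
    "label_table2": table2["label"],
})

# True wherever a row pairs measurements of two different labels
# (a missing label, NaN, also counts as a mismatch).
mismatch = row_labels["label_table1"] != row_labels["label_table2"]
print(row_labels[mismatch])
```

Only the row with index 1 pairs the same label from both tables; all other rows combine measurements that do not belong together.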
A better way to combine tables is the merge command. It allows us to specify explicitly on which column the tables should be combined. Data scientists speak of the ‘index’ or ‘identifier’ of rows in the tables.
correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')
correctly_combined_tables1
|   | label | circularity | elongation | area | skewness |
|---|---|---|---|---|---|
| 0 | 1 | 0.3 | 2.3 | 25 | 0.3 |
| 1 | 2 | 0.5 | 3.4 | 32 | 0.6 |
| 2 | 3 | 0.7 | 1.2 | 22 | 0.5 |
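Merging on a column works cleanly only if each label identifies one row per table. Assuming that is the intention here, the validate parameter of pd.merge can guard against duplicated labels. The sketch below uses a hypothetical table_with_duplicate that is not part of the original data.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label": [1, 2, 3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation": [2.3, 3.4, 1.2],
})
table2 = pd.DataFrame({
    "label": [3, 2, 1, 4],
    "area": [22, 32, 25, 18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
})

# validate="one_to_one" makes pandas check that the label values are
# unique in both tables before merging.
checked = pd.merge(table1, table2, how="inner", on="label",
                   validate="one_to_one")

# Hypothetical table in which label 1 was measured twice.
table_with_duplicate = pd.DataFrame({
    "label": [1, 1, 2],
    "area": [25, 26, 32],
})

try:
    pd.merge(table1, table_with_duplicate, how="inner", on="label",
             validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    # Raised because label 1 is not unique in the right table.
    duplicate_detected = True

print(duplicate_detected)
```

Without validate, the duplicated label would silently produce two rows for label 1 in the merged table.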
You may note that in the above example, label 4 is missing. We can also include it in our table by performing an outer join.
correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')
correctly_combined_tables2
|   | label | circularity | elongation | area | skewness |
|---|---|---|---|---|---|
| 0 | 1 | 0.3 | 2.3 | 25 | 0.3 |
| 1 | 2 | 0.5 | 3.4 | 32 | 0.6 |
| 2 | 3 | 0.7 | 1.2 | 22 | 0.5 |
| 3 | 4 | NaN | NaN | 18 | 0.3 |
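After an outer join, it can be useful to know which labels were present in only one of the tables. One option is the indicator parameter of pd.merge, which adds a _merge column. This is a minimal sketch; the variable names merged and only_in_one are illustrative.

```python
import pandas as pd

table1 = pd.DataFrame({
    "label": [1, 2, 3],
    "circularity": [0.3, 0.5, 0.7],
    "elongation": [2.3, 3.4, 1.2],
})
table2 = pd.DataFrame({
    "label": [3, 2, 1, 4],
    "area": [22, 32, 25, 18],
    "skewness": [0.5, 0.6, 0.3, 0.3],
})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both" for every row.
merged = pd.merge(table1, table2, how="outer", on="label", indicator=True)

# Keep only the rows that were found in just one of the two tables.
only_in_one = merged[merged["_merge"] != "both"]
print(only_in_one[["label", "_merge"]])
```

Here only label 4 is reported, marked as right_only because it occurs in table2 but not in table1.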