{
"cells": [
{
"cell_type": "markdown",
"id": "273def9d-da84-49b7-83be-11695c64c3d9",
"metadata": {},
"source": [
"# Appending tables\n",
"When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n",
"\n",
"We start with two small tables of measurements."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1477c227",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "156d6b7c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation\n",
"0 1 0.3 2.3\n",
"1 2 0.5 3.4\n",
"2 3 0.7 1.2"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table1 = pd.DataFrame({\n",
" \"label\": [1, 2, 3],\n",
" \"circularity\": [0.3, 0.5, 0.7],\n",
" \"elongation\": [2.3, 3.4, 1.2],\n",
" })\n",
"table1"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1877c769",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label area skewness\n",
"0 3 22 0.5\n",
"1 2 32 0.6\n",
"2 1 25 0.3\n",
"3 4 18 0.3"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table2 = pd.DataFrame({\n",
" \"label\": [3, 2, 1, 4],\n",
" \"area\": [22, 32, 25, 18],\n",
" \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
" })\n",
"table2"
]
},
{
"cell_type": "markdown",
"id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263",
"metadata": {},
"source": [
"## Combining columns of tables\n",
"According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html) there are multiple ways for combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n",
"\n",
"In the following example, measurements of label 1 and 3 are mixed. Furthermore, one of our tables did not contain measurements for label 4."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6f255657",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" label | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 3 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 2 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3.0 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 1 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 4 | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation label area skewness\n",
"0 1.0 0.3 2.3 3 22 0.5\n",
"1 2.0 0.5 3.4 2 32 0.6\n",
"2 3.0 0.7 1.2 1 25 0.3\n",
"3 NaN NaN NaN 4 18 0.3"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n",
"wrongly_combined_tables"
]
},
{
"cell_type": "markdown",
"id": "16a4e200-8964-43d1-b43c-eb5977ed195a",
"metadata": {},
"source": [
"A better way for combining tables is the `merge` command. It allows to explicitly specify `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation area skewness\n",
"0 1 0.3 2.3 25 0.3\n",
"1 2 0.5 3.4 32 0.6\n",
"2 3 0.7 1.2 22 0.5"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n",
"correctly_combined_tables1"
]
},
{
"cell_type": "markdown",
"id": "eebbf929-6cb4-48e4-bff4-60d444c95d49",
"metadata": {},
"source": [
"You may note that in the above example, label 4 is missing. We can also get it by out table by performing an `outer join`. "
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "84d160b6-e577-478a-a14d-f4bba371afee",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" NaN | \n",
" NaN | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation area skewness\n",
"0 1 0.3 2.3 25 0.3\n",
"1 2 0.5 3.4 32 0.6\n",
"2 3 0.7 1.2 22 0.5\n",
"3 4 NaN NaN 18 0.3"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n",
"correctly_combined_tables2"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}