{ "cells": [ { "cell_type": "markdown", "id": "273def9d-da84-49b7-83be-11695c64c3d9", "metadata": {}, "source": [ "# Appending tables\n", "When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n", "\n", "We start with two small tables of measurements." ] }, { "cell_type": "code", "execution_count": 6, "id": "1477c227", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 7, "id": "156d6b7c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongation
010.32.3
120.53.4
230.71.2
\n", "
" ], "text/plain": [ " label circularity elongation\n", "0 1 0.3 2.3\n", "1 2 0.5 3.4\n", "2 3 0.7 1.2" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table1 = pd.DataFrame({\n", " \"label\": [1, 2, 3],\n", " \"circularity\": [0.3, 0.5, 0.7],\n", " \"elongation\": [2.3, 3.4, 1.2],\n", " })\n", "table1" ] }, { "cell_type": "code", "execution_count": 8, "id": "1877c769", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelareaskewness
03220.5
12320.6
21250.3
34180.3
\n", "
" ], "text/plain": [ " label area skewness\n", "0 3 22 0.5\n", "1 2 32 0.6\n", "2 1 25 0.3\n", "3 4 18 0.3" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table2 = pd.DataFrame({\n", " \"label\": [3, 2, 1, 4],\n", " \"area\": [22, 32, 25, 18],\n", " \"skewness\": [0.5, 0.6, 0.3, 0.3],\n", " })\n", "table2" ] }, { "cell_type": "markdown", "id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263", "metadata": {}, "source": [ "## Combining columns of tables\n", "According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html) there are multiple ways for combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n", "\n", "In the following example, measurements of label 1 and 3 are mixed. Furthermore, one of our tables did not contain measurements for label 4." ] }, { "cell_type": "code", "execution_count": 9, "id": "6f255657", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationlabelareaskewness
01.00.32.33220.5
12.00.53.42320.6
23.00.71.21250.3
3NaNNaNNaN4180.3
\n", "
" ], "text/plain": [ " label circularity elongation label area skewness\n", "0 1.0 0.3 2.3 3 22 0.5\n", "1 2.0 0.5 3.4 2 32 0.6\n", "2 3.0 0.7 1.2 1 25 0.3\n", "3 NaN NaN NaN 4 18 0.3" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n", "wrongly_combined_tables" ] }, { "cell_type": "markdown", "id": "16a4e200-8964-43d1-b43c-eb5977ed195a", "metadata": {}, "source": [ "A better way for combining tables is the `merge` command. It allows to explicitly specify `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables." ] }, { "cell_type": "code", "execution_count": 10, "id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationareaskewness
010.32.3250.3
120.53.4320.6
230.71.2220.5
\n", "
" ], "text/plain": [ " label circularity elongation area skewness\n", "0 1 0.3 2.3 25 0.3\n", "1 2 0.5 3.4 32 0.6\n", "2 3 0.7 1.2 22 0.5" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n", "correctly_combined_tables1" ] }, { "cell_type": "markdown", "id": "eebbf929-6cb4-48e4-bff4-60d444c95d49", "metadata": {}, "source": [ "You may note that in the above example, label 4 is missing. We can also get it by out table by performing an `outer join`. " ] }, { "cell_type": "code", "execution_count": 11, "id": "84d160b6-e577-478a-a14d-f4bba371afee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationareaskewness
010.32.3250.3
120.53.4320.6
230.71.2220.5
34NaNNaN180.3
\n", "
" ], "text/plain": [ " label circularity elongation area skewness\n", "0 1 0.3 2.3 25 0.3\n", "1 2 0.5 3.4 32 0.6\n", "2 3 0.7 1.2 22 0.5\n", "3 4 NaN NaN 18 0.3" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n", "correctly_combined_tables2" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }