{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "273def9d-da84-49b7-83be-11695c64c3d9",
   "metadata": {},
   "source": [
    "# Appending tables\n",
    "When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n",
    "\n",
    "We start with two small tables of measurements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1477c227",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "156d6b7c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation\n",
       "0      1          0.3         2.3\n",
       "1      2          0.5         3.4\n",
       "2      3          0.7         1.2"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table1 = pd.DataFrame({\n",
    "    \"label\":       [1,   2,   3],\n",
    "    \"circularity\": [0.3, 0.5, 0.7],\n",
    "    \"elongation\":  [2.3, 3.4, 1.2],\n",
    "    })\n",
    "table1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1877c769",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  area  skewness\n",
       "0      3    22       0.5\n",
       "1      2    32       0.6\n",
       "2      1    25       0.3\n",
       "3      4    18       0.3"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table2 = pd.DataFrame({\n",
    "    \"label\":    [3,   2,   1,   4],\n",
    "    \"area\":     [22,  32,  25,  18],\n",
    "    \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
    "    })\n",
    "table2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263",
   "metadata": {},
   "source": [
    "## Combining columns of tables\n",
    "According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html) there are multiple ways for combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n",
    "\n",
    "In the following example, measurements of label 1 and 3 are mixed. Furthermore, one of our tables did not contain measurements for label 4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6f255657",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>label</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>3</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.0</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  label  area  skewness\n",
       "0    1.0          0.3         2.3      3    22       0.5\n",
       "1    2.0          0.5         3.4      2    32       0.6\n",
       "2    3.0          0.7         1.2      1    25       0.3\n",
       "3    NaN          NaN         NaN      4    18       0.3"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n",
    "wrongly_combined_tables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16a4e200-8964-43d1-b43c-eb5977ed195a",
   "metadata": {},
   "source": [
    "A better way for combining tables is the `merge` command. It allows to explicitly specify `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  area  skewness\n",
       "0      1          0.3         2.3    25       0.3\n",
       "1      2          0.5         3.4    32       0.6\n",
       "2      3          0.7         1.2    22       0.5"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n",
    "correctly_combined_tables1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eebbf929-6cb4-48e4-bff4-60d444c95d49",
   "metadata": {},
   "source": [
    "You may note that in the above example, label 4 is missing. We can also get it by out table by performing an `outer join`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "84d160b6-e577-478a-a14d-f4bba371afee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  area  skewness\n",
       "0      1          0.3         2.3    25       0.3\n",
       "1      2          0.5         3.4    32       0.6\n",
       "2      3          0.7         1.2    22       0.5\n",
       "3      4          NaN         NaN    18       0.3"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n",
    "correctly_combined_tables2"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}