{ "cells": [ { "cell_type": "markdown", "id": "99c606f8-037f-4258-81e7-a9f4ac511242", "metadata": {}, "source": [ "# Introduction to working with DataFrames\n", "In basic python, we often use dictionaries containing our measurements as vectors. While these basic structures are handy for collecting data, they are suboptimal for further data processing. For that, we introduce [pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), the primary tool in the Python ecosystem for handling data. Its primary object, the \"DataFrame\" is extremely useful in wrangling data. which are more handy in the next steps." ] }, { "cell_type": "code", "execution_count": 1, "id": "0cfceb6c-1acc-4632-b084-8b0871a7c50a", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "8b77888b-c9a8-4a67-a4eb-f7df46eda970", "metadata": {}, "source": [ "## Creating DataFrames from a dictionary of lists\n", "Assume we did some image processing and have some results available in a dictionary that contains lists of numbers:" ] }, { "cell_type": "code", "execution_count": 2, "id": "ff80484f-657b-4231-8d8f-cdc26577542b", "metadata": {}, "outputs": [], "source": [ "measurements = {\n", " \"labels\": [1, 2, 3],\n", " \"area\": [45, 23, 68],\n", " \"minor_axis\": [2, 4, 4],\n", " \"major_axis\": [3, 4, 5],\n", "}" ] }, { "cell_type": "markdown", "id": "b2afa6a9-e15c-4147-bdd4-ec4d4f87fb36", "metadata": {}, "source": [ "This data structure can be nicely visualized using a DataFrame:" ] }, { "cell_type": "code", "execution_count": 3, "id": "8bf4e4b5-ef72-4f63-84d2-48cc3a77c297", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelsareaminor_axismajor_axis
014523
122344
236845
\n", "
" ], "text/plain": [ " labels area minor_axis major_axis\n", "0 1 45 2 3\n", "1 2 23 4 4\n", "2 3 68 4 5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(measurements)\n", "df" ] }, { "cell_type": "markdown", "id": "930c082b-8f16-4711-b3e0-e56a7ec6d272", "metadata": {}, "source": [ "Using these DataFrames, data modification is straighforward. For example one can append a new column and compute its values from existing columns. This is done elementwise." ] }, { "cell_type": "code", "execution_count": 4, "id": "a34866ff-a2cb-4a7c-a4e8-4544559b634c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelsareaminor_axismajor_axisaspect_ratio
0145231.50
1223441.00
2368451.25
\n", "
" ], "text/plain": [ " labels area minor_axis major_axis aspect_ratio\n", "0 1 45 2 3 1.50\n", "1 2 23 4 4 1.00\n", "2 3 68 4 5 1.25" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"aspect_ratio\"] = df[\"major_axis\"] / df[\"minor_axis\"]\n", "df" ] }, { "cell_type": "markdown", "id": "201a2142-22c7-4607-bc2d-f1dfce4c7e26", "metadata": {}, "source": [ "## Saving data frames\n", "We can also save this table for continuing to work with it. We chose to save it as a CSV file, where CSV stands for comma-separated value. This is a text file that is easily read into data structures in many programming languages." ] }, { "cell_type": "code", "execution_count": 5, "id": "fb01d2d9-4d8b-4b6a-b158-9516a581e000", "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"../../data/short_table.csv\")" ] }, { "cell_type": "markdown", "id": "2677e5a9-8d1b-4454-b009-4ac26e549b2d", "metadata": {}, "source": [ "You should generally always store your data in such a format, not necessarily CSV, but a format that is open, has a well-defined specification, and is readable in many contexts. Excel files do not meet these criteria. Neither do .mat files." ] }, { "cell_type": "markdown", "id": "0240857d-292f-4ac3-ba87-8878aa941cde", "metadata": {}, "source": [ "## Creating DataFrames from lists of lists\n", "Sometimes, we are confronted to data in form of lists of lists. To make pandas understand that form of data correctly, we also need to provide the headers in the same order as the lists" ] }, { "cell_type": "code", "execution_count": 6, "id": "c72a82b1-4da6-468d-afa6-149cb00f7d37", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
labels123
area452368
minor_axis244
major_axis345
\n", "
" ], "text/plain": [ " 0 1 2\n", "labels 1 2 3\n", "area 45 23 68\n", "minor_axis 2 4 4\n", "major_axis 3 4 5" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row_header = ['labels', 'area', 'minor_axis', 'major_axis']\n", "\n", "data = [\n", " [1, 2, 3], # first row\n", " [45, 23, 68], # second row\n", " [2, 4, 4], # third row\n", " [3, 4, 5], # fourth row\n", "]\n", " \n", "# convert the data and header arrays in a pandas data frame\n", "data_frame = pd.DataFrame(data, row_header)\n", "\n", "# show it\n", "data_frame" ] }, { "cell_type": "markdown", "id": "9f64996f", "metadata": {}, "source": [ "## Or from a 2D numpy array" ] }, { "cell_type": "code", "execution_count": 7, "id": "320a8295", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaminor_axismajor_axis
00.8007970.4218620.956755
10.4703180.4311160.171390
20.2109300.0765680.199061
30.0506820.3469190.254751
\n", "
" ], "text/plain": [ " area minor_axis major_axis\n", "0 0.800797 0.421862 0.956755\n", "1 0.470318 0.431116 0.171390\n", "2 0.210930 0.076568 0.199061\n", "3 0.050682 0.346919 0.254751" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "data = np.random.random((4,3))\n", "column_header = ['area', 'minor_axis', 'major_axis']\n", "\n", "pd.DataFrame(data, columns=column_header)" ] }, { "cell_type": "markdown", "id": "a8b1b6b0-027c-4536-8710-e3f87aca1896", "metadata": {}, "source": [ "As you can see, this table is _rotated_. We can bring it in the usual form like this:" ] }, { "cell_type": "code", "execution_count": 8, "id": "40669e82-4264-4883-9c4e-8a366b061610", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelsareaminor_axismajor_axis
014523
122344
236845
\n", "
" ], "text/plain": [ " labels area minor_axis major_axis\n", "0 1 45 2 3\n", "1 2 23 4 4\n", "2 3 68 4 5" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rotate/flip it\n", "data_frame = data_frame.transpose()\n", "\n", "# show it\n", "data_frame" ] }, { "cell_type": "markdown", "id": "ccf08662-fccf-4dc1-91c2-3365fa85a96b", "metadata": {}, "source": [ "## Loading data frames\n", "Tables can be read from CSV files with [pd.read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv)." ] }, { "cell_type": "code", "execution_count": 9, "id": "aa7c74db-68ab-4004-aa5e-01ba1ad88c79", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0areamean_intensityminor_axis_lengthmajor_axis_lengtheccentricityextentferet_diameter_maxequivalent_diameter_areabbox-0bbox-1bbox-2bbox-3
00422192.37914716.48855034.5667890.8789000.58611135.22783023.1798850113035
11182180.13186811.73607420.8026970.8256650.78787921.37755815.2226670531174
22661205.21633928.40950230.2084330.3399340.87433932.75667929.01053809528122
33437216.58581223.14399624.6061300.3395760.82608726.92582423.588253014423167
44476212.30252119.85288231.0751060.7693170.86388431.38471024.618327023729256
..........................................
5656211185.06161114.52276218.4891380.6188930.78148118.97366616.3906542323925054
575778185.2307696.02863817.5797990.9393610.72222218.0277569.965575248170254188
585886183.7209305.42687121.2614270.9668760.78181822.00000010.464158249117254139
595951190.4313735.03241413.7420790.9305340.72857114.0356698.058239249228254242
606046175.3043483.80398215.9487140.9711390.76666715.0332967.6530402506725482
\n", "

61 rows × 13 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n", "0 0 422 192.379147 16.488550 34.566789 \n", "1 1 182 180.131868 11.736074 20.802697 \n", "2 2 661 205.216339 28.409502 30.208433 \n", "3 3 437 216.585812 23.143996 24.606130 \n", "4 4 476 212.302521 19.852882 31.075106 \n", ".. ... ... ... ... ... \n", "56 56 211 185.061611 14.522762 18.489138 \n", "57 57 78 185.230769 6.028638 17.579799 \n", "58 58 86 183.720930 5.426871 21.261427 \n", "59 59 51 190.431373 5.032414 13.742079 \n", "60 60 46 175.304348 3.803982 15.948714 \n", "\n", " eccentricity extent feret_diameter_max equivalent_diameter_area \\\n", "0 0.878900 0.586111 35.227830 23.179885 \n", "1 0.825665 0.787879 21.377558 15.222667 \n", "2 0.339934 0.874339 32.756679 29.010538 \n", "3 0.339576 0.826087 26.925824 23.588253 \n", "4 0.769317 0.863884 31.384710 24.618327 \n", ".. ... ... ... ... \n", "56 0.618893 0.781481 18.973666 16.390654 \n", "57 0.939361 0.722222 18.027756 9.965575 \n", "58 0.966876 0.781818 22.000000 10.464158 \n", "59 0.930534 0.728571 14.035669 8.058239 \n", "60 0.971139 0.766667 15.033296 7.653040 \n", "\n", " bbox-0 bbox-1 bbox-2 bbox-3 \n", "0 0 11 30 35 \n", "1 0 53 11 74 \n", "2 0 95 28 122 \n", "3 0 144 23 167 \n", "4 0 237 29 256 \n", ".. ... ... ... ... \n", "56 232 39 250 54 \n", "57 248 170 254 188 \n", "58 249 117 254 139 \n", "59 249 228 254 242 \n", "60 250 67 254 82 \n", "\n", "[61 rows x 13 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv = pd.read_csv('../../data/blobs_statistics.csv')\n", "df_csv" ] }, { "cell_type": "markdown", "id": "da74dce2-609a-4e52-bc16-77ddf45efc98", "metadata": {}, "source": [ "That's a bit too much information. We can use the `.head()` method of data frames to look at the first few rows (or the `.tail()` to check the last rows)." ] }, { "cell_type": "code", "execution_count": 10, "id": "08d22b50-f860-428f-9edc-41c749ba5ae7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0areamean_intensityminor_axis_lengthmajor_axis_lengtheccentricityextentferet_diameter_maxequivalent_diameter_areabbox-0bbox-1bbox-2bbox-3
00422192.37914716.48855034.5667890.8789000.58611135.22783023.1798850113035
11182180.13186811.73607420.8026970.8256650.78787921.37755815.2226670531174
22661205.21633928.40950230.2084330.3399340.87433932.75667929.01053809528122
\n", "
" ], "text/plain": [ " Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n", "0 0 422 192.379147 16.488550 34.566789 \n", "1 1 182 180.131868 11.736074 20.802697 \n", "2 2 661 205.216339 28.409502 30.208433 \n", "\n", " eccentricity extent feret_diameter_max equivalent_diameter_area \\\n", "0 0.878900 0.586111 35.227830 23.179885 \n", "1 0.825665 0.787879 21.377558 15.222667 \n", "2 0.339934 0.874339 32.756679 29.010538 \n", "\n", " bbox-0 bbox-1 bbox-2 bbox-3 \n", "0 0 11 30 35 \n", "1 0 53 11 74 \n", "2 0 95 28 122 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.head(3)" ] }, { "cell_type": "code", "execution_count": 11, "id": "c3c35f55-146f-4cad-bda6-3d83b049a8f9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0areamean_intensityminor_axis_lengthmajor_axis_lengtheccentricityextentferet_diameter_maxequivalent_diameter_areabbox-0bbox-1bbox-2bbox-3
585886183.7209305.42687121.2614270.9668760.78181822.00000010.464158249117254139
595951190.4313735.03241413.7420790.9305340.72857114.0356698.058239249228254242
606046175.3043483.80398215.9487140.9711390.76666715.0332967.6530402506725482
\n", "
" ], "text/plain": [ " Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n", "58 58 86 183.720930 5.426871 21.261427 \n", "59 59 51 190.431373 5.032414 13.742079 \n", "60 60 46 175.304348 3.803982 15.948714 \n", "\n", " eccentricity extent feret_diameter_max equivalent_diameter_area \\\n", "58 0.966876 0.781818 22.000000 10.464158 \n", "59 0.930534 0.728571 14.035669 8.058239 \n", "60 0.971139 0.766667 15.033296 7.653040 \n", "\n", " bbox-0 bbox-1 bbox-2 bbox-3 \n", "58 249 117 254 139 \n", "59 249 228 254 242 \n", "60 250 67 254 82 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.tail(3)" ] }, { "cell_type": "markdown", "id": "01732b57-35d9-4b25-9c1b-d322487d2757", "metadata": {}, "source": [ "We can also get column names with the DataFrame attribute `columns`." ] }, { "cell_type": "code", "execution_count": 12, "id": "cc7d6cbe-6487-49a6-84b2-e837f7070f25", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Unnamed: 0', 'area', 'mean_intensity', 'minor_axis_length',\n", " 'major_axis_length', 'eccentricity', 'extent', 'feret_diameter_max',\n", " 'equivalent_diameter_area', 'bbox-0', 'bbox-1', 'bbox-2', 'bbox-3'],\n", " dtype='object')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.columns" ] }, { "cell_type": "markdown", "id": "ff187a52-9fc0-4f6f-b143-f872dfe620c2", "metadata": {}, "source": [ "## Selecting rows and columns" ] }, { "attachments": {}, "cell_type": "markdown", "id": "19fe76e8-5ea4-40c5-8bcd-1f9e12c9d85f", "metadata": {}, "source": [ "Ok, let's get the dataframe first column:" ] }, { "cell_type": "code", "execution_count": 13, "id": "bb65a716-5e56-466b-9a19-e388e6900d8e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 422\n", "1 182\n", "2 661\n", "3 437\n", "4 476\n", " ... \n", "56 211\n", "57 78\n", "58 86\n", "59 51\n", "60 46\n", "Name: area, Length: 61, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv['area']" ] }, { "cell_type": "markdown", "id": "62ba3625-5206-44c6-b79f-0554fc55bdb7", "metadata": {}, "source": [ "Notice that when it was printed, the index of the rows came along with it. That's beacuse a Pandas DataFrame with one column is a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html?highlight=series#pandas.Series)." ] }, { "cell_type": "markdown", "id": "3cb46414-d625-4f6a-85ba-8e05e521b8a5", "metadata": {}, "source": [ "We can get more columns by passing their names as a list. Furthermore, we can store this \"sub-dataframe\" in a new variable." ] }, { "cell_type": "code", "execution_count": 14, "id": "ffb0453b-a0d5-4ae9-88ae-da79e332bb0c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areamean_intensity
0422192.379147
1182180.131868
2661205.216339
3437216.585812
4476212.302521
.........
56211185.061611
5778185.230769
5886183.720930
5951190.431373
6046175.304348
\n", "

61 rows × 2 columns

\n", "
" ], "text/plain": [ " area mean_intensity\n", "0 422 192.379147\n", "1 182 180.131868\n", "2 661 205.216339\n", "3 437 216.585812\n", "4 476 212.302521\n", ".. ... ...\n", "56 211 185.061611\n", "57 78 185.230769\n", "58 86 183.720930\n", "59 51 190.431373\n", "60 46 175.304348\n", "\n", "[61 rows x 2 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_analysis = df_csv[ ['area', 'mean_intensity'] ].copy()\n", "df_analysis" ] }, { "cell_type": "markdown", "id": "361726eb-c73d-4c3a-a5ed-530e9517abb5", "metadata": {}, "source": [ "This gave us the area measurements we were after. \n", "\n", "If we want to get a single row, the proper way of doing that is to use the `.loc` method:" ] }, { "cell_type": "code", "execution_count": 15, "id": "6df270e8-5f3c-4f23-a934-3c554575dd09", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "area 661.000000\n", "mean_intensity 205.216339\n", "Name: 2, dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.loc[ 2, ['area', 'mean_intensity']]" ] }, { "cell_type": "markdown", "id": "95b4a73b-f03b-484a-8ed9-82a06ae38bc5", "metadata": {}, "source": [ "Note that following `.loc`, we have the index by row then column, separated by a comma, in brackets. It is also important to note that row indices need not be integers. And you should not count on them being integers.\n", "\n", "In case you really want to access elements in a dataframe by integer indices, like in a numpy array, you can use the `.iloc` method. " ] }, { "cell_type": "code", "execution_count": 16, "id": "5a6cf822-dd51-4955-94ab-ece56bc7ac45", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areamean_intensity
2661205.216339
3437216.585812
\n", "
" ], "text/plain": [ " area mean_intensity\n", "2 661 205.216339\n", "3 437 216.585812" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.iloc[2:4, 1:3]" ] }, { "cell_type": "markdown", "id": "d576e7e3-db2b-4a73-9237-0deec1f4116d", "metadata": {}, "source": [ "The downside is that the code becomes less explicit when column names are absent.\n", "\n", "In practice you will almost never use row indices, but rather use Boolean indexing or Masking." ] }, { "attachments": {}, "cell_type": "markdown", "id": "6fe65e75-8003-4175-96c6-21cb26eb5d31", "metadata": {}, "source": [ "## Selecting parts of tables based on criteria\n", "In case we want to focus our further analysis on cells that have a certain minimum area, we can do this by passing boolean indices to the rows. This process is also sometimes call masking.\n", "\n", "Suppose we want the rows for which `df_analysis[\"area\"] > 50`. We can essentially plug this syntax directly when using `.loc`." ] }, { "cell_type": "code", "execution_count": 17, "id": "a4eadd9b-e287-4ca8-b1ff-d1278c24151c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areamean_intensity
55280189.800000
56211185.061611
5778185.230769
5886183.720930
5951190.431373
\n", "
" ], "text/plain": [ " area mean_intensity\n", "55 280 189.800000\n", "56 211 185.061611\n", "57 78 185.230769\n", "58 86 183.720930\n", "59 51 190.431373" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_analysis_filtered = df_analysis[ df_analysis[\"area\"] > 50]\n", "df_analysis_filtered.tail()" ] }, { "cell_type": "markdown", "id": "64eb1086-ebc8-4905-afc2-ed0dc01620b9", "metadata": {}, "source": [ "## Adding new columns\n", "In Pandas, it is very easy to generate new columns from existing columns. We just pass some operation between other columns to a new column name." ] }, { "cell_type": "code", "execution_count": 18, "id": "402892eb-b1ea-4f11-b272-9c44207f7991", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areamean_intensitytotal_intensity
0422192.37914781184.0
1182180.13186832784.0
2661205.216339135648.0
3437216.58581294648.0
4476212.302521101056.0
............
56211185.06161139048.0
5778185.23076914448.0
5886183.72093015800.0
5951190.4313739712.0
6046175.3043488064.0
\n", "

61 rows × 3 columns

\n", "
" ], "text/plain": [ " area mean_intensity total_intensity\n", "0 422 192.379147 81184.0\n", "1 182 180.131868 32784.0\n", "2 661 205.216339 135648.0\n", "3 437 216.585812 94648.0\n", "4 476 212.302521 101056.0\n", ".. ... ... ...\n", "56 211 185.061611 39048.0\n", "57 78 185.230769 14448.0\n", "58 86 183.720930 15800.0\n", "59 51 190.431373 9712.0\n", "60 46 175.304348 8064.0\n", "\n", "[61 rows x 3 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_analysis['total_intensity'] = df_analysis['area'] * df_analysis['mean_intensity']\n", "df_analysis" ] }, { "cell_type": "markdown", "id": "273def9d-da84-49b7-83be-11695c64c3d9", "metadata": {}, "source": [ "# Appending tables\n", "When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n", "\n", "We start with two small tables of measurements that could have been obtained from different functions or different libraries." ] }, { "cell_type": "code", "execution_count": 19, "id": "1477c227", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 20, "id": "156d6b7c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongation
010.32.3
120.53.4
230.71.2
\n", "
" ], "text/plain": [ " label circularity elongation\n", "0 1 0.3 2.3\n", "1 2 0.5 3.4\n", "2 3 0.7 1.2" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table1 = pd.DataFrame({\n", " \"label\": [1, 2, 3],\n", " \"circularity\": [0.3, 0.5, 0.7],\n", " \"elongation\": [2.3, 3.4, 1.2],\n", " })\n", "table1" ] }, { "cell_type": "code", "execution_count": 21, "id": "1877c769", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelareaskewness
03220.5
12320.6
21250.3
34180.3
\n", "
" ], "text/plain": [ " label area skewness\n", "0 3 22 0.5\n", "1 2 32 0.6\n", "2 1 25 0.3\n", "3 4 18 0.3" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table2 = pd.DataFrame({\n", " \"label\": [3, 2, 1, 4],\n", " \"area\": [22, 32, 25, 18],\n", " \"skewness\": [0.5, 0.6, 0.3, 0.3],\n", " })\n", "table2" ] }, { "cell_type": "markdown", "id": "9682caa7-ea47-452a-a773-952639b53de8", "metadata": { "tags": [] }, "source": [ "## Combining same measurement from different image files" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3e37e64c-0185-439f-9e52-f1e23e732865", "metadata": {}, "source": [ "When applying a workflow to many images, you would get tables with the same column names, but with a variable number of rows.\n", "To calculate statistics for whole folders or to perform machine learning, we usually need to concatenate those tables, but it is important to keep track of the source files.\n", "\n", "Let's open two tables generated from applying the same worflow to different files." ] }, { "cell_type": "code", "execution_count": 22, "id": "23b7b945-b874-49ba-b48f-d494adf7e8c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
025693.25000019.99501717.0215591.174688
19082.48888915.9399697.5163262.120713
257790.63778235.32445821.7594341.623409
327095.64074120.22943117.6690521.144908
415384.90849715.68370312.4204751.262730
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 256 93.250000 19.995017 17.021559 1.174688\n", "1 90 82.488889 15.939969 7.516326 2.120713\n", "2 577 90.637782 35.324458 21.759434 1.623409\n", "3 270 95.640741 20.229431 17.669052 1.144908\n", "4 153 84.908497 15.683703 12.420475 1.262730" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.read_csv('../../data/BBBC007_20P1_POS0007_D_1UL.csv')\n", "df1.head()" ] }, { "cell_type": "code", "execution_count": 23, "id": "f0bf9a4b-925d-4227-8aff-4348fa2afca3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
013996.54676317.50410410.2927701.700621
136086.61388935.74680814.9831242.385805
24391.48837212.9678844.3515732.980045
314073.74285718.94050810.3144041.836316
414489.37500013.63930813.4585321.013432
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 139 96.546763 17.504104 10.292770 1.700621\n", "1 360 86.613889 35.746808 14.983124 2.385805\n", "2 43 91.488372 12.967884 4.351573 2.980045\n", "3 140 73.742857 18.940508 10.314404 1.836316\n", "4 144 89.375000 13.639308 13.458532 1.013432" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.read_csv('../../data/BBBC007_20P1_POS0010_D_1UL.csv')\n", "df2.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6121c288-6584-4c30-a584-43cc58871599", "metadata": {}, "source": [ "Because we know we have the same columns, we can concatenate them into a single big table." ] }, { "cell_type": "code", "execution_count": 24, "id": "0cf88538-6bea-4a47-ba1e-09a26fba2602", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
025693.25000019.99501717.0215591.174688
19082.48888915.9399697.5163262.120713
257790.63778235.32445821.7594341.623409
327095.64074120.22943117.6690521.144908
415384.90849715.68370312.4204751.262730
..................
4231591.13333320.92709519.2092831.089426
4320694.26213623.38187911.6696682.003646
444568.3777789.4063716.2764451.498678
453376.72727310.7242754.1745682.568955
461676.7500007.7459672.7838822.782433
\n", "

111 rows × 5 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 256 93.250000 19.995017 17.021559 1.174688\n", "1 90 82.488889 15.939969 7.516326 2.120713\n", "2 577 90.637782 35.324458 21.759434 1.623409\n", "3 270 95.640741 20.229431 17.669052 1.144908\n", "4 153 84.908497 15.683703 12.420475 1.262730\n", ".. ... ... ... ... ...\n", "42 315 91.133333 20.927095 19.209283 1.089426\n", "43 206 94.262136 23.381879 11.669668 2.003646\n", "44 45 68.377778 9.406371 6.276445 1.498678\n", "45 33 76.727273 10.724275 4.174568 2.568955\n", "46 16 76.750000 7.745967 2.783882 2.782433\n", "\n", "[111 rows x 5 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_df = pd.concat([df1, df2], axis=0)\n", "big_df" ] }, { "cell_type": "markdown", "id": "12290fe6-86aa-408e-b9f7-1c2b2900a64a", "metadata": {}, "source": [ "The problem is that we lose their source identity. An easy fix for that is to add a new column with the file name before concatenating them. This will facilitating splitting them back and plotting later on.\n", "\n", "When we give a single value to a new column, it is assigned to all rows." ] }, { "cell_type": "code", "execution_count": 25, "id": "a9a4745e-69a6-4bef-95e0-a449d8ec2dea", "metadata": {}, "outputs": [], "source": [ "df1['file_name'] = 'BBBC007_20P1_POS0007_D_1UL'\n", "\n", "df2['file_name'] = 'BBBC007_20P1_POS0010_D_1UL'" ] }, { "cell_type": "code", "execution_count": 26, "id": "4c87f225-eb3c-4107-b5d6-454c2598c084", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratiofile_name
025693.25000019.99501717.0215591.174688BBBC007_20P1_POS0007_D_1UL
19082.48888915.9399697.5163262.120713BBBC007_20P1_POS0007_D_1UL
257790.63778235.32445821.7594341.623409BBBC007_20P1_POS0007_D_1UL
327095.64074120.22943117.6690521.144908BBBC007_20P1_POS0007_D_1UL
415384.90849715.68370312.4204751.262730BBBC007_20P1_POS0007_D_1UL
.....................
4231591.13333320.92709519.2092831.089426BBBC007_20P1_POS0010_D_1UL
4320694.26213623.38187911.6696682.003646BBBC007_20P1_POS0010_D_1UL
444568.3777789.4063716.2764451.498678BBBC007_20P1_POS0010_D_1UL
453376.72727310.7242754.1745682.568955BBBC007_20P1_POS0010_D_1UL
461676.7500007.7459672.7838822.782433BBBC007_20P1_POS0010_D_1UL
\n", "

111 rows × 6 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio \\\n", "0 256 93.250000 19.995017 17.021559 1.174688 \n", "1 90 82.488889 15.939969 7.516326 2.120713 \n", "2 577 90.637782 35.324458 21.759434 1.623409 \n", "3 270 95.640741 20.229431 17.669052 1.144908 \n", "4 153 84.908497 15.683703 12.420475 1.262730 \n", ".. ... ... ... ... ... \n", "42 315 91.133333 20.927095 19.209283 1.089426 \n", "43 206 94.262136 23.381879 11.669668 2.003646 \n", "44 45 68.377778 9.406371 6.276445 1.498678 \n", "45 33 76.727273 10.724275 4.174568 2.568955 \n", "46 16 76.750000 7.745967 2.783882 2.782433 \n", "\n", " file_name \n", "0 BBBC007_20P1_POS0007_D_1UL \n", "1 BBBC007_20P1_POS0007_D_1UL \n", "2 BBBC007_20P1_POS0007_D_1UL \n", "3 BBBC007_20P1_POS0007_D_1UL \n", "4 BBBC007_20P1_POS0007_D_1UL \n", ".. ... \n", "42 BBBC007_20P1_POS0010_D_1UL \n", "43 BBBC007_20P1_POS0010_D_1UL \n", "44 BBBC007_20P1_POS0010_D_1UL \n", "45 BBBC007_20P1_POS0010_D_1UL \n", "46 BBBC007_20P1_POS0010_D_1UL \n", "\n", "[111 rows x 6 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_df = pd.concat([df1, df2], axis=0)\n", "big_df" ] }, { "cell_type": "markdown", "id": "bf553c51-dd29-4d5b-8e72-dcf3a262c418", "metadata": {}, "source": [ "Now, when we can safely distinguish the source of each row." ] }, { "cell_type": "markdown", "id": "78a139a0", "metadata": {}, "source": [ "# Handling NaN values\n", "When analysing tabular data, sometimes table cells are present that does not contain data. In Python this typically means the value is _Not a Number_ ([NaN](https://en.wikipedia.org/wiki/NaN)). We cannot assume these values are `0` or `-1` or any other value because that would distort descriptive statistics, for example. We need to deal with these NaN entries differently and this notebook will introduce how.\n", "\n", "To get a first view where NaNs play a role, we load again an example table and sort it." ] }, { "cell_type": "code", "execution_count": 27, "id": "189e76b0-0cc2-4baa-8290-e5a06ab2d70b", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd " ] }, { "cell_type": "code", "execution_count": 28, "id": "4e617db1-ac10-4f69-9ba9-97913d517a15", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AreaMeanStdDevMinMaxXYXMYMMajorMinorAngle%AreaType
1902755.0859.928235.458539.03880.0108.710302.158110.999300.247144.47524.28039.318100C
812295.0765.23996.545558.01431.0375.003134.888374.982135.35965.76944.429127.247100B
2091821.0847.761122.074600.01510.0287.795321.115288.074321.82455.87941.492112.124100A
2521528.0763.77783.183572.01172.0191.969385.944192.487385.69763.15030.80834.424100B
2651252.0793.371117.139579.01668.0262.071394.497262.268394.32660.15426.50050.147100A
.............................................
1131.0587.0000.000587.0587.0399.500117.500399.500117.5001.1281.1280.000100A
3101.0866.0000.000866.0866.0343.500408.500343.500408.5001.1281.1280.000100A
2191.0763.0000.000763.0763.0411.500296.500411.500296.5001.1281.1280.000100A
3NaNNaNNaN608.0964.0NaNNaNNaN7.6657.359NaN101.121100A
5NaNNaN69.438566.0792.0348.5007.500NaN7.508NaN3.088NaN100A
\n", "

391 rows × 14 columns

\n", "
" ], "text/plain": [ " Area Mean StdDev Min Max X Y XM \\\n", " \n", "190 2755.0 859.928 235.458 539.0 3880.0 108.710 302.158 110.999 \n", "81 2295.0 765.239 96.545 558.0 1431.0 375.003 134.888 374.982 \n", "209 1821.0 847.761 122.074 600.0 1510.0 287.795 321.115 288.074 \n", "252 1528.0 763.777 83.183 572.0 1172.0 191.969 385.944 192.487 \n", "265 1252.0 793.371 117.139 579.0 1668.0 262.071 394.497 262.268 \n", ".. ... ... ... ... ... ... ... ... \n", "113 1.0 587.000 0.000 587.0 587.0 399.500 117.500 399.500 \n", "310 1.0 866.000 0.000 866.0 866.0 343.500 408.500 343.500 \n", "219 1.0 763.000 0.000 763.0 763.0 411.500 296.500 411.500 \n", "3 NaN NaN NaN 608.0 964.0 NaN NaN NaN \n", "5 NaN NaN 69.438 566.0 792.0 348.500 7.500 NaN \n", "\n", " YM Major Minor Angle %Area Type \n", " \n", "190 300.247 144.475 24.280 39.318 100 C \n", "81 135.359 65.769 44.429 127.247 100 B \n", "209 321.824 55.879 41.492 112.124 100 A \n", "252 385.697 63.150 30.808 34.424 100 B \n", "265 394.326 60.154 26.500 50.147 100 A \n", ".. ... ... ... ... ... ... \n", "113 117.500 1.128 1.128 0.000 100 A \n", "310 408.500 1.128 1.128 0.000 100 A \n", "219 296.500 1.128 1.128 0.000 100 A \n", "3 7.665 7.359 NaN 101.121 100 A \n", "5 7.508 NaN 3.088 NaN 100 A \n", "\n", "[391 rows x 14 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')\n", "data.sort_values(by = \"Area\", ascending=False)" ] }, { "cell_type": "markdown", "id": "d4532812", "metadata": {}, "source": [ "As you can see, there are rows at the bottom containing NaNs. These are at the bottom of the table because pandas cannot sort them." ] }, { "cell_type": "markdown", "id": "2bc0c85e", "metadata": {}, "source": [ "A quick check if there are NaNs anywhere in a DataFrame is an important quality check and good scientific practice." ] }, { "cell_type": "code", "execution_count": 29, "id": "5c152771", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.isnull().values.any()" ] }, { "cell_type": "markdown", "id": "87f14e62-b3e5-45f6-9c02-9820b21bd929", "metadata": {}, "source": [ "We can also get some deeper insights in which columns these NaN values are located." ] }, { "cell_type": "code", "execution_count": 30, "id": "7f6b5eb4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Area 2\n", "Mean 5\n", "StdDev 3\n", "Min 3\n", "Max 3\n", "X 2\n", "Y 3\n", "XM 3\n", "YM 5\n", "Major 8\n", "Minor 3\n", "Angle 1\n", "%Area 0\n", "Type 0\n", "dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.isnull().sum()" ] }, { "cell_type": "markdown", "id": "b27eeccd", "metadata": {}, "source": [ "For getting a glimpse about if we can further process that tabel, we may want to know the percentage of NaNs for each column?" 
] }, { "cell_type": "code", "execution_count": 31, "id": "a9297b56", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Major 2.046036\n", "Mean 1.278772\n", "YM 1.278772\n", "StdDev 0.767263\n", "Min 0.767263\n", "Max 0.767263\n", "Y 0.767263\n", "XM 0.767263\n", "Minor 0.767263\n", "Area 0.511509\n", "X 0.511509\n", "Angle 0.255754\n", "%Area 0.000000\n", "Type 0.000000\n", "dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.isnull().mean().sort_values(ascending=False) *100" ] }, { "cell_type": "markdown", "id": "f605facf", "metadata": {}, "source": [ "# Dropping rows that contain NaNs\n", "Depending on what kind of data analysis should be performed, it might make sense to just ignore columns that contain NaN values. Alternatively, it is possible to delete rows that contain NaNs.\n", "\n", "It depends on your project and what is important or not for the analysis. Its not an easy answer." ] }, { "cell_type": "code", "execution_count": 32, "id": "25bac0a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AreaMeanStdDevMinMaxXYXMYMMajorMinorAngle%AreaType
118.0730.389103.354592.0948.0435.0004.722434.9624.6975.9873.828168.425100A
2126.0718.33390.367556.01046.0388.0878.683388.1838.68716.5599.688175.471100A
468.0686.98561.169571.0880.0126.1478.809126.1928.81115.1365.720168.133100A
6669.0697.16472.863539.0957.0471.69626.253471.69426.19736.65623.237124.340100A
75.0658.60049.161607.0710.028.3008.10028.2848.1033.1442.025161.565100A
.............................................
38394.0746.61785.198550.01021.0194.032498.223194.014498.23917.2956.92052.720100B
387152.0801.599111.328582.01263.0348.487497.632348.451497.67517.77310.88911.829100A
38960.0758.03377.309601.0947.0259.000499.300258.990499.2899.4768.06290.000100A
39012.0714.83367.294551.0785.0240.167498.167240.179498.1484.6063.317168.690100A
39123.0695.04367.356611.0846.049.891503.02249.882502.9796.4544.53773.243100A
\n", "

374 rows × 14 columns

\n", "
" ], "text/plain": [ " Area Mean StdDev Min Max X Y XM \\\n", " \n", "1 18.0 730.389 103.354 592.0 948.0 435.000 4.722 434.962 \n", "2 126.0 718.333 90.367 556.0 1046.0 388.087 8.683 388.183 \n", "4 68.0 686.985 61.169 571.0 880.0 126.147 8.809 126.192 \n", "6 669.0 697.164 72.863 539.0 957.0 471.696 26.253 471.694 \n", "7 5.0 658.600 49.161 607.0 710.0 28.300 8.100 28.284 \n", ".. ... ... ... ... ... ... ... ... \n", "383 94.0 746.617 85.198 550.0 1021.0 194.032 498.223 194.014 \n", "387 152.0 801.599 111.328 582.0 1263.0 348.487 497.632 348.451 \n", "389 60.0 758.033 77.309 601.0 947.0 259.000 499.300 258.990 \n", "390 12.0 714.833 67.294 551.0 785.0 240.167 498.167 240.179 \n", "391 23.0 695.043 67.356 611.0 846.0 49.891 503.022 49.882 \n", "\n", " YM Major Minor Angle %Area Type \n", " \n", "1 4.697 5.987 3.828 168.425 100 A \n", "2 8.687 16.559 9.688 175.471 100 A \n", "4 8.811 15.136 5.720 168.133 100 A \n", "6 26.197 36.656 23.237 124.340 100 A \n", "7 8.103 3.144 2.025 161.565 100 A \n", ".. ... ... ... ... ... ... \n", "383 498.239 17.295 6.920 52.720 100 B \n", "387 497.675 17.773 10.889 11.829 100 A \n", "389 499.289 9.476 8.062 90.000 100 A \n", "390 498.148 4.606 3.317 168.690 100 A \n", "391 502.979 6.454 4.537 73.243 100 A \n", "\n", "[374 rows x 14 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_no_nan = data.dropna(how=\"any\")\n", "data_no_nan " ] }, { "cell_type": "markdown", "id": "0d857531-4c88-450b-ae62-9008388088ba", "metadata": {}, "source": [ "On the bottom of that table, you can see that it still contains 374 of the original 391 columns. If you remove rows, you should document in your later scientific publication, how many out of the total number of datasets were analysed.\n", "\n", "We can now also check again if NaNs are present." ] }, { "cell_type": "code", "execution_count": 33, "id": "f09a2106", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_no_nan.isnull().values.any()" ] }, { "cell_type": "markdown", "id": "ab941145-75fe-4d5a-80fb-ab85672d0a86", "metadata": {}, "source": [ "## Determining rows that contain NaNs\n", "In some use-cases it might be useful to have a list of row-indices where there are NaN values." ] }, { "cell_type": "code", "execution_count": 34, "id": "84addf53-beb6-4955-bad9-fe52517f64d7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
002.02.0
113.03.0
222NaN44.0
3212.02.0
41212.0NaN
52322.052.0
\n", "
" ], "text/plain": [ " A B C\n", "0 0 2.0 2.0\n", "1 1 3.0 3.0\n", "2 22 NaN 44.0\n", "3 21 2.0 2.0\n", "4 12 12.0 NaN\n", "5 23 22.0 52.0" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = {\n", " 'A': [0, 1, 22, 21, 12, 23],\n", " 'B': [2, 3, np.nan, 2, 12, 22],\n", " 'C': [2, 3, 44, 2, np.nan, 52],\n", "}\n", "\n", "table = pd.DataFrame(data2)\n", "table" ] }, { "cell_type": "code", "execution_count": 35, "id": "7b3b3292-0864-484e-b2d4-dab5c1a0b6ed", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, True, False, True, False])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.max(table.isnull().values, axis=1)" ] }, { "cell_type": "markdown", "id": "51e6f4d9-a000-4b05-9bcc-dc52db91658c", "metadata": {}, "source": [ "# Tidy-Data" ] }, { "cell_type": "markdown", "id": "248c6fda-e47c-46c2-957d-cf4bdf4d141d", "metadata": {}, "source": [ "Hadley Wickham wrote a [great article](https://www.jstatsoft.org/article/view/v059i10) in favor of “tidy data.” Tidy data frames follow the rules:\n", "\n", "- Each variable is a column.\n", "\n", "- Each observation is a row.\n", "\n", "- Each type of observation has its own separate data frame.\n", "\n", "This is less pretty to visualize as a table, but we rarely look at data in tables. Indeed, the representation of data which is convenient for visualization is different from that which is convenient for analysis. A tidy data frame is almost always much easier to work with than non-tidy formats." ] }, { "cell_type": "markdown", "id": "774c1a89-2f91-404b-8f74-284acabd3bcb", "metadata": {}, "source": [ "Let's import a saved table with measurements. Is this table tidy?" ] }, { "cell_type": "code", "execution_count": 36, "id": "5c88af81-7a31-42bb-8f12-69a89f2f1e0a", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 37, "id": "504fd34a-9454-4fb3-9ccf-4b9964feada9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BeforeAfter
channel_1channel_2channel_1channel_2
013.25000021.00000015.13798442.022776
144.95454524.31818243.32883648.661610
213.59090918.77272711.68599537.926184
385.03225819.74193586.03146140.396353
410.73170725.26829310.07542151.471865
594.62500036.45000095.18090073.347843
689.83673534.69387889.85786469.902829
7100.26190534.904762101.98985270.156432
829.61538552.11538531.516654104.525198
915.86842124.92105316.08693250.563301
1012.47500025.45000011.52992451.381594
1187.87500028.05000089.74552256.543107
1258.80000022.60000059.64622945.215405
1391.06122440.36734789.93589381.326111
1423.500000117.33333321.676993235.067654
1582.56666734.56666784.09773569.820702
1636.12000029.60000037.68867659.870177
1770.68750033.84375072.56911268.493363
18102.02127733.297872100.41974667.379506
1972.318182103.90909170.843134207.956510
2018.10000029.16666717.86520158.361239
215.21739136.3478266.96134673.286439
2219.92592672.81481518.607102145.900739
2326.67307757.40384627.611368115.347217
2413.34000030.40000014.16054361.225962
2515.02857138.40000014.52996377.490249
\n", "
" ], "text/plain": [ " Before After \n", " channel_1 channel_2 channel_1 channel_2\n", "0 13.250000 21.000000 15.137984 42.022776\n", "1 44.954545 24.318182 43.328836 48.661610\n", "2 13.590909 18.772727 11.685995 37.926184\n", "3 85.032258 19.741935 86.031461 40.396353\n", "4 10.731707 25.268293 10.075421 51.471865\n", "5 94.625000 36.450000 95.180900 73.347843\n", "6 89.836735 34.693878 89.857864 69.902829\n", "7 100.261905 34.904762 101.989852 70.156432\n", "8 29.615385 52.115385 31.516654 104.525198\n", "9 15.868421 24.921053 16.086932 50.563301\n", "10 12.475000 25.450000 11.529924 51.381594\n", "11 87.875000 28.050000 89.745522 56.543107\n", "12 58.800000 22.600000 59.646229 45.215405\n", "13 91.061224 40.367347 89.935893 81.326111\n", "14 23.500000 117.333333 21.676993 235.067654\n", "15 82.566667 34.566667 84.097735 69.820702\n", "16 36.120000 29.600000 37.688676 59.870177\n", "17 70.687500 33.843750 72.569112 68.493363\n", "18 102.021277 33.297872 100.419746 67.379506\n", "19 72.318182 103.909091 70.843134 207.956510\n", "20 18.100000 29.166667 17.865201 58.361239\n", "21 5.217391 36.347826 6.961346 73.286439\n", "22 19.925926 72.814815 18.607102 145.900739\n", "23 26.673077 57.403846 27.611368 115.347217\n", "24 13.340000 30.400000 14.160543 61.225962\n", "25 15.028571 38.400000 14.529963 77.490249" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('../../data/Multi_analysis.csv', header = [0,1], sep=';')\n", "df" ] }, { "cell_type": "markdown", "id": "e497f862-4f03-4d58-9b92-0dea8d750f16", "metadata": {}, "source": [ "The most useful function for tidying data is [pd.melt](https://pandas.pydata.org/docs/reference/api/pandas.melt.html)." ] }, { "cell_type": "code", "execution_count": 38, "id": "0022b484-fd53-4c82-be11-9a3fe2261497", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variable_0variable_1value
0Beforechannel_113.250000
1Beforechannel_144.954545
2Beforechannel_113.590909
3Beforechannel_185.032258
4Beforechannel_110.731707
............
99Afterchannel_273.286439
100Afterchannel_2145.900739
101Afterchannel_2115.347217
102Afterchannel_261.225962
103Afterchannel_277.490249
\n", "

104 rows × 3 columns

\n", "
" ], "text/plain": [ " variable_0 variable_1 value\n", "0 Before channel_1 13.250000\n", "1 Before channel_1 44.954545\n", "2 Before channel_1 13.590909\n", "3 Before channel_1 85.032258\n", "4 Before channel_1 10.731707\n", ".. ... ... ...\n", "99 After channel_2 73.286439\n", "100 After channel_2 145.900739\n", "101 After channel_2 115.347217\n", "102 After channel_2 61.225962\n", "103 After channel_2 77.490249\n", "\n", "[104 rows x 3 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.melt()" ] }, { "cell_type": "markdown", "id": "dd24af67-9f6e-4f8c-a432-37eb146e2bc9", "metadata": {}, "source": [ "We can specify names for the value and for the variables. In this case, our measurements are of intensity and our variables are Intervention (before or after) and channel." ] }, { "cell_type": "code", "execution_count": 39, "id": "9ffaf527-78d8-4a9c-89e0-19858b877c2e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TreatmentChannelintensity
0Beforechannel_113.250000
1Beforechannel_144.954545
2Beforechannel_113.590909
3Beforechannel_185.032258
4Beforechannel_110.731707
............
99Afterchannel_273.286439
100Afterchannel_2145.900739
101Afterchannel_2115.347217
102Afterchannel_261.225962
103Afterchannel_277.490249
\n", "

104 rows × 3 columns

\n", "
" ], "text/plain": [ " Treatment Channel intensity\n", "0 Before channel_1 13.250000\n", "1 Before channel_1 44.954545\n", "2 Before channel_1 13.590909\n", "3 Before channel_1 85.032258\n", "4 Before channel_1 10.731707\n", ".. ... ... ...\n", "99 After channel_2 73.286439\n", "100 After channel_2 145.900739\n", "101 After channel_2 115.347217\n", "102 After channel_2 61.225962\n", "103 After channel_2 77.490249\n", "\n", "[104 rows x 3 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_tidy = df.melt(value_name='intensity', var_name=['Treatment', 'Channel'])\n", "df_tidy" ] }, { "cell_type": "markdown", "id": "d95d7c92-95fe-48a5-8554-4b366830f351", "metadata": {}, "source": [ "It may not look better for interpreting, but it becomes easier to manipulate, because now we can more easily mask by columns. Here we select intensity measurement rows for \"channel_2\" and \"After\" intervention." ] }, { "cell_type": "code", "execution_count": 40, "id": "512f335d-61f7-4dc9-9ff1-239802b2f9da", "metadata": {}, "outputs": [], "source": [ "row_mask = (df_tidy['Channel'] == 'channel_2') & (df_tidy['Treatment'] == 'After')" ] }, { "cell_type": "code", "execution_count": 41, "id": "88c4b6bb-81b7-48e8-ada8-5f2acf2e4888", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TreatmentChannelintensity
26Beforechannel_221.000000
27Beforechannel_224.318182
28Beforechannel_218.772727
29Beforechannel_219.741935
30Beforechannel_225.268293
31Beforechannel_236.450000
32Beforechannel_234.693878
33Beforechannel_234.904762
34Beforechannel_252.115385
35Beforechannel_224.921053
36Beforechannel_225.450000
37Beforechannel_228.050000
38Beforechannel_222.600000
39Beforechannel_240.367347
40Beforechannel_2117.333333
41Beforechannel_234.566667
42Beforechannel_229.600000
43Beforechannel_233.843750
44Beforechannel_233.297872
45Beforechannel_2103.909091
46Beforechannel_229.166667
47Beforechannel_236.347826
48Beforechannel_272.814815
49Beforechannel_257.403846
50Beforechannel_230.400000
51Beforechannel_238.400000
78Afterchannel_242.022776
79Afterchannel_248.661610
80Afterchannel_237.926184
81Afterchannel_240.396353
82Afterchannel_251.471865
83Afterchannel_273.347843
84Afterchannel_269.902829
85Afterchannel_270.156432
86Afterchannel_2104.525198
87Afterchannel_250.563301
88Afterchannel_251.381594
89Afterchannel_256.543107
90Afterchannel_245.215405
91Afterchannel_281.326111
92Afterchannel_2235.067654
93Afterchannel_269.820702
94Afterchannel_259.870177
95Afterchannel_268.493363
96Afterchannel_267.379506
97Afterchannel_2207.956510
98Afterchannel_258.361239
99Afterchannel_273.286439
100Afterchannel_2145.900739
101Afterchannel_2115.347217
102Afterchannel_261.225962
103Afterchannel_277.490249
\n", "
" ], "text/plain": [ " Treatment Channel intensity\n", "26 Before channel_2 21.000000\n", "27 Before channel_2 24.318182\n", "28 Before channel_2 18.772727\n", "29 Before channel_2 19.741935\n", "30 Before channel_2 25.268293\n", "31 Before channel_2 36.450000\n", "32 Before channel_2 34.693878\n", "33 Before channel_2 34.904762\n", "34 Before channel_2 52.115385\n", "35 Before channel_2 24.921053\n", "36 Before channel_2 25.450000\n", "37 Before channel_2 28.050000\n", "38 Before channel_2 22.600000\n", "39 Before channel_2 40.367347\n", "40 Before channel_2 117.333333\n", "41 Before channel_2 34.566667\n", "42 Before channel_2 29.600000\n", "43 Before channel_2 33.843750\n", "44 Before channel_2 33.297872\n", "45 Before channel_2 103.909091\n", "46 Before channel_2 29.166667\n", "47 Before channel_2 36.347826\n", "48 Before channel_2 72.814815\n", "49 Before channel_2 57.403846\n", "50 Before channel_2 30.400000\n", "51 Before channel_2 38.400000\n", "78 After channel_2 42.022776\n", "79 After channel_2 48.661610\n", "80 After channel_2 37.926184\n", "81 After channel_2 40.396353\n", "82 After channel_2 51.471865\n", "83 After channel_2 73.347843\n", "84 After channel_2 69.902829\n", "85 After channel_2 70.156432\n", "86 After channel_2 104.525198\n", "87 After channel_2 50.563301\n", "88 After channel_2 51.381594\n", "89 After channel_2 56.543107\n", "90 After channel_2 45.215405\n", "91 After channel_2 81.326111\n", "92 After channel_2 235.067654\n", "93 After channel_2 69.820702\n", "94 After channel_2 59.870177\n", "95 After channel_2 68.493363\n", "96 After channel_2 67.379506\n", "97 After channel_2 207.956510\n", "98 After channel_2 58.361239\n", "99 After channel_2 73.286439\n", "100 After channel_2 145.900739\n", "101 After channel_2 115.347217\n", "102 After channel_2 61.225962\n", "103 After channel_2 77.490249" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_tidy[df_tidy['Channel'] == 'channel_2']" ] }, { "cell_type": "markdown", "id": "731039f0-a993-430b-996a-cad1b809ac02", "metadata": {}, "source": [ "# Split-Apply-Combine" ] }, { "cell_type": "markdown", "id": "f34f881f-7f9b-4e51-9ee7-8c50b877390c", "metadata": {}, "source": [ "Assume the following dataframe. It contains measurements of objects from 2 different files." ] }, { "cell_type": "code", "execution_count": 42, "id": "8e6ddf36-5f87-415e-a983-364168dee4c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratiofile_name
0FalseFalseFalseFalseFalseFalse
1FalseFalseFalseFalseFalseFalse
2FalseFalseFalseFalseFalseFalse
3FalseFalseFalseFalseFalseFalse
4FalseFalseFalseFalseFalseFalse
.....................
106FalseFalseFalseFalseFalseFalse
107FalseFalseFalseFalseFalseFalse
108FalseFalseFalseFalseFalseFalse
109FalseFalseFalseFalseFalseFalse
110FalseFalseFalseFalseFalseFalse
\n", "

111 rows × 6 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length \\\n", "0 False False False False \n", "1 False False False False \n", "2 False False False False \n", "3 False False False False \n", "4 False False False False \n", ".. ... ... ... ... \n", "106 False False False False \n", "107 False False False False \n", "108 False False False False \n", "109 False False False False \n", "110 False False False False \n", "\n", " aspect_ratio file_name \n", "0 False False \n", "1 False False \n", "2 False False \n", "3 False False \n", "4 False False \n", ".. ... ... \n", "106 False False \n", "107 False False \n", "108 False False \n", "109 False False \n", "110 False False \n", "\n", "[111 rows x 6 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('../../data/BBBC007_analysis.csv')\n", "df.isnull()" ] }, { "cell_type": "markdown", "id": "d50bf409-5dd7-41b4-85a6-ae8d3189abb7", "metadata": {}, "source": [ "Let’s say we want to compute the median \"intensity_mean\" of round objects and also discriminate these objects per file. Ignoring for the second the mechanics of how we would do this with Python, let’s think about it in English. What do we need to do?\n", "\n", "- Split the data set up according to a 'round' criterion field, i.e., split it up so we have a separate data set for the two classes, those round and those not round.\n", "\n", "- Apply a median function to the intensity in these split data sets.\n", "\n", "- Combine the results of these averages on the split data set into a new, summary data set that contains the two classes (round and not round) and medians for each.\n", "\n", "We see that the strategy we want is a split-apply-combine strategy. This idea was put forward by Hadley Wickham in [this paper](https://www.jstatsoft.org/article/view/v040i01). It turns out that this is a strategy we want to use very often. Split the data in terms of some criterion. Apply some function to the split-up data. Combine the results into a new data frame.\n", "\n", "Note that if the data are tidy, this procedure makes a lot of sense. Choose the column you want to use to split by. All rows with like entries in the splitting column are then grouped into a new data set. You can then apply any function you want into these new data sets. You can then combine the results into a new data frame.\n", "\n", "Pandas’s split-apply-combine operations are achieved using the groupby() method. You can think of groupby() as the splitting part. You can then apply functions to the resulting DataFrameGroupBy object. The Pandas documentation on split-apply-combine is excellent and worth reading through. It is extensive though, so don’t let yourself get intimidated by it." ] }, { "cell_type": "markdown", "id": "ab72dfca-5ea2-4a8c-9b26-4e6afdef4c27", "metadata": {}, "source": [ "Before all that, we create a new column with our cirterion for roundness." ] }, { "cell_type": "code", "execution_count": 43, "id": "e6cd8e95-a0f7-47e5-adb7-2a57fd765f5f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratiofile_nameround
013996.54676317.50410410.2927701.70062120P1_POS0010_D_1ULFalse
136086.61388935.74680814.9831242.38580520P1_POS0010_D_1ULFalse
24391.48837212.9678844.3515732.98004520P1_POS0010_D_1ULFalse
314073.74285718.94050810.3144041.83631620P1_POS0010_D_1ULFalse
414489.37500013.63930813.4585321.01343220P1_POS0010_D_1ULTrue
........................
10630588.25245920.22653219.2442101.05104520P1_POS0007_D_1ULTrue
10759389.90556536.50837021.3653941.70876220P1_POS0007_D_1ULFalse
108289106.85121120.42780918.2214521.12108620P1_POS0007_D_1ULTrue
109277100.66426020.30796517.4329201.16492020P1_POS0007_D_1ULTrue
1104670.86956511.6488955.2980032.19873320P1_POS0007_D_1ULFalse
\n", "

111 rows × 7 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio \\\n", "0 139 96.546763 17.504104 10.292770 1.700621 \n", "1 360 86.613889 35.746808 14.983124 2.385805 \n", "2 43 91.488372 12.967884 4.351573 2.980045 \n", "3 140 73.742857 18.940508 10.314404 1.836316 \n", "4 144 89.375000 13.639308 13.458532 1.013432 \n", ".. ... ... ... ... ... \n", "106 305 88.252459 20.226532 19.244210 1.051045 \n", "107 593 89.905565 36.508370 21.365394 1.708762 \n", "108 289 106.851211 20.427809 18.221452 1.121086 \n", "109 277 100.664260 20.307965 17.432920 1.164920 \n", "110 46 70.869565 11.648895 5.298003 2.198733 \n", "\n", " file_name round \n", "0 20P1_POS0010_D_1UL False \n", "1 20P1_POS0010_D_1UL False \n", "2 20P1_POS0010_D_1UL False \n", "3 20P1_POS0010_D_1UL False \n", "4 20P1_POS0010_D_1UL True \n", ".. ... ... \n", "106 20P1_POS0007_D_1UL True \n", "107 20P1_POS0007_D_1UL False \n", "108 20P1_POS0007_D_1UL True \n", "109 20P1_POS0007_D_1UL True \n", "110 20P1_POS0007_D_1UL False \n", "\n", "[111 rows x 7 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['round'] = df['aspect_ratio'] < 1.2\n", "df" ] }, { "cell_type": "markdown", "id": "464afbc4-287e-4ebc-9b29-dc80fe4f8805", "metadata": {}, "source": [ "## Aggregation: median intensity" ] }, { "cell_type": "markdown", "id": "ef04e5ff-02a2-4b72-aab6-e596dcb63eda", "metadata": {}, "source": [ "Let's start by grouping by 'round'." ] }, { "cell_type": "code", "execution_count": 44, "id": "37aaf674-fcc8-4ff1-96e2-216a0135fe10", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped = df.groupby('round')\n", "\n", "# Take a look\n", "grouped" ] }, { "cell_type": "markdown", "id": "00eac35a-50a6-4085-b94e-6c79592c5ef5", "metadata": {}, "source": [ "There is not much to see in the DataFrameGroupBy object that resulted. But there is a lot we can do with this object. Typing `grouped.` and hitting tab will show you the many possibilities. For most of these possibilities, the apply and combine steps happen together and a new data frame is returned. The grouped.median() method is exactly what we want." ] }, { "cell_type": "code", "execution_count": 45, "id": "14983422-5c93-431e-a105-7eb90f739372", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
round
False270.092.78834521.45949515.8583241.412849
True291.0100.25600020.15554718.3522871.101700
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length \\\n", "round \n", "False 270.0 92.788345 21.459495 15.858324 \n", "True 291.0 100.256000 20.155547 18.352287 \n", "\n", " aspect_ratio \n", "round \n", "False 1.412849 \n", "True 1.101700 " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_median = grouped.median(numeric_only = True)\n", "\n", "# Take a look\n", "df_median" ] }, { "cell_type": "markdown", "id": "cfd9ac1c-b987-444b-96cf-aad7acc82cab", "metadata": {}, "source": [ "Here the numeric_only option is set to disconsider for now calculating a median for the categorical 'file_name' column." ] }, { "cell_type": "markdown", "id": "339a3a54-7d67-427b-9a74-c62239040959", "metadata": {}, "source": [ "The outputted data frame has the medians of all quantities, including the intensities that we wanted. Note that this data frame has 'round' as the name of the row index. If we want to instead keep 'round' (which, remember, is what we used to split up the data set before we computed the summary statistics) as a column, we can use the reset_index() method." ] }, { "cell_type": "code", "execution_count": 46, "id": "103b54c4-24a1-487a-a169-4f4fc535ddf7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roundareaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
0False270.092.78834521.45949515.8583241.412849
1True291.0100.25600020.15554718.3522871.101700
\n", "
" ], "text/plain": [ " round area intensity_mean major_axis_length minor_axis_length \\\n", "0 False 270.0 92.788345 21.459495 15.858324 \n", "1 True 291.0 100.256000 20.155547 18.352287 \n", "\n", " aspect_ratio \n", "0 1.412849 \n", "1 1.101700 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_median = df_median.reset_index()\n", "df_median" ] }, { "cell_type": "markdown", "id": "588cf301-c855-4905-a565-caa307ec2dfe", "metadata": {}, "source": [ "We can also use multiple columns in our groupby() operation. For example, we may wish to look at four groups, round from dirst file, round from second file, not round from first file, and not round from second file. To do this, we simply pass in a list of columns into df.groupby(). We will chain the methods, performing a groupby, applying a median, and then resetting the index of the result, all in one line." ] }, { "cell_type": "code", "execution_count": 47, "id": "e5880741-5332-431d-89af-0e332f8b984d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roundfile_nameareaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
0False20P1_POS0007_D_1UL323.091.79679123.75522717.0724771.467410
1False20P1_POS0010_D_1UL237.093.26911320.41073714.8320351.353858
2True20P1_POS0007_D_1UL293.098.22779920.30796518.5990431.101700
3True20P1_POS0010_D_1UL277.5103.29982519.66233017.6807411.103133
\n", "
" ], "text/plain": [ " round file_name area intensity_mean major_axis_length \\\n", "0 False 20P1_POS0007_D_1UL 323.0 91.796791 23.755227 \n", "1 False 20P1_POS0010_D_1UL 237.0 93.269113 20.410737 \n", "2 True 20P1_POS0007_D_1UL 293.0 98.227799 20.307965 \n", "3 True 20P1_POS0010_D_1UL 277.5 103.299825 19.662330 \n", "\n", " minor_axis_length aspect_ratio \n", "0 17.072477 1.467410 \n", "1 14.832035 1.353858 \n", "2 18.599043 1.101700 \n", "3 17.680741 1.103133 " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(['round', 'file_name']).median().reset_index()" ] }, { "cell_type": "markdown", "id": "5839d877-f17a-4c5c-8a3a-a3e0d5d537c4", "metadata": {}, "source": [ "This type of operation is called an aggregation. That is, we split the data set up into groups, and then computed a summary statistic for each group, in this case the median." ] }, { "cell_type": "markdown", "id": "b6130950-ff18-4487-ab18-9fb05588d407", "metadata": {}, "source": [ "## Aggregating with custom functions" ] }, { "cell_type": "markdown", "id": "f4a4eeab-fd4e-43fc-b2d5-26d0c21a04e7", "metadata": {}, "source": [ "If we want to apply a function that is not built-in, we can also do so. For example, let's apply the coefficient of variance. We can define a generic function that calculates it like shown below." ] }, { "cell_type": "code", "execution_count": 48, "id": "e86a620d-9b0c-4255-b66a-4c3dea178dcd", "metadata": {}, "outputs": [], "source": [ "def coefficient_of_variance(data):\n", " \"\"\"Compute coefficient of variation from an array of data.\"\"\"\n", " return np.std(data) / np.mean(data)" ] }, { "cell_type": "markdown", "id": "05d72d03-18dd-409f-8034-2fb2641b02b7", "metadata": {}, "source": [ "Now we group it and apply it as an aggregating function. If there are other categorical variables, they should be masked." ] }, { "cell_type": "code", "execution_count": 49, "id": "a0c01a3c-6064-4764-8772-78b52a334229", "metadata": {}, "outputs": [], "source": [ "grouped = df.groupby(['round', 'file_name'])" ] }, { "cell_type": "code", "execution_count": 50, "id": "cbc2b527-219c-4e94-a153-a7b785cdce48", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
roundfile_name
False20P1_POS0007_D_1UL0.5403990.1459560.3498570.2890630.243450
20P1_POS0010_D_1UL0.7651560.1435060.4476380.4027080.316206
True20P1_POS0007_D_1UL0.2487990.0996360.1452470.1476260.036950
20P1_POS0010_D_1UL0.2751200.1070080.1677220.1662140.043755
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length \\\n", "round file_name \n", "False 20P1_POS0007_D_1UL 0.540399 0.145956 0.349857 \n", " 20P1_POS0010_D_1UL 0.765156 0.143506 0.447638 \n", "True 20P1_POS0007_D_1UL 0.248799 0.099636 0.145247 \n", " 20P1_POS0010_D_1UL 0.275120 0.107008 0.167722 \n", "\n", " minor_axis_length aspect_ratio \n", "round file_name \n", "False 20P1_POS0007_D_1UL 0.289063 0.243450 \n", " 20P1_POS0010_D_1UL 0.402708 0.316206 \n", "True 20P1_POS0007_D_1UL 0.147626 0.036950 \n", " 20P1_POS0010_D_1UL 0.166214 0.043755 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg(coefficient_of_variance)" ] }, { "cell_type": "markdown", "id": "e20ae5a5", "metadata": {}, "source": [ "# Exercises" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9db24255-2290-4e83-ac74-93d780378175", "metadata": {}, "source": [ "## Exercise 1\n" ] }, { "cell_type": "code", "execution_count": 51, "id": "87f226cd-721b-43e3-a31a-faed5e8a6733", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0areamean_intensityminor_axis_lengthmajor_axis_lengtheccentricityextentferet_diameter_maxequivalent_diameter_areabbox-0bbox-1bbox-2bbox-3
00422192.37914716.48855034.5667890.8789000.58611135.22783023.1798850113035
11182180.13186811.73607420.8026970.8256650.78787921.37755815.2226670531174
22661205.21633928.40950230.2084330.3399340.87433932.75667929.01053809528122
33437216.58581223.14399624.6061300.3395760.82608726.92582423.588253014423167
44476212.30252119.85288231.0751060.7693170.86388431.38471024.618327023729256
\n", "
" ], "text/plain": [ " Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n", "0 0 422 192.379147 16.488550 34.566789 \n", "1 1 182 180.131868 11.736074 20.802697 \n", "2 2 661 205.216339 28.409502 30.208433 \n", "3 3 437 216.585812 23.143996 24.606130 \n", "4 4 476 212.302521 19.852882 31.075106 \n", "\n", " eccentricity extent feret_diameter_max equivalent_diameter_area \\\n", "0 0.878900 0.586111 35.227830 23.179885 \n", "1 0.825665 0.787879 21.377558 15.222667 \n", "2 0.339934 0.874339 32.756679 29.010538 \n", "3 0.339576 0.826087 26.925824 23.588253 \n", "4 0.769317 0.863884 31.384710 24.618327 \n", "\n", " bbox-0 bbox-1 bbox-2 bbox-3 \n", "0 0 11 30 35 \n", "1 0 53 11 74 \n", "2 0 95 28 122 \n", "3 0 144 23 167 \n", "4 0 237 29 256 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_shape = pd.read_csv('../../data/blobs_statistics.csv')\n", "df_shape.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "75b9d1d9", "metadata": {}, "source": [ "Create a new data frame that only contains these columns:\n", "* `minor_axis_length`\n", "* `major_axis_length`\n", "* `aspect_ratio`" ] }, { "cell_type": "code", "execution_count": null, "id": "59916b36-cdd1-4af6-852e-25b0806ac11c", "metadata": {}, "outputs": [], "source": [] }, { "attachments": {}, "cell_type": "markdown", "id": "780acb1c-e8f0-4678-aa2c-ca497fe26872", "metadata": {}, "source": [ "## Exercise 2\n" ] }, { "cell_type": "code", "execution_count": 52, "id": "3022565a-6713-4fd0-ad78-1bb626e27077", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AreaMeanStdDevMinMaxXYXMYMMajorMinorAngle%AreaType
118.0730.389103.354592.0948.0435.0004.722434.9624.6975.9873.828168.425100A
2126.0718.33390.367556.01046.0388.0878.683388.1838.68716.5599.688175.471100A
3NaNNaNNaN608.0964.0NaNNaNNaN7.6657.359NaN101.121100A
468.0686.98561.169571.0880.0126.1478.809126.1928.81115.1365.720168.133100A
5NaNNaN69.438566.0792.0348.5007.500NaN7.508NaN3.088NaN100A
\n", "
" ], "text/plain": [ " Area Mean StdDev Min Max X Y XM YM \\\n", " \n", "1 18.0 730.389 103.354 592.0 948.0 435.000 4.722 434.962 4.697 \n", "2 126.0 718.333 90.367 556.0 1046.0 388.087 8.683 388.183 8.687 \n", "3 NaN NaN NaN 608.0 964.0 NaN NaN NaN 7.665 \n", "4 68.0 686.985 61.169 571.0 880.0 126.147 8.809 126.192 8.811 \n", "5 NaN NaN 69.438 566.0 792.0 348.500 7.500 NaN 7.508 \n", "\n", " Major Minor Angle %Area Type \n", " \n", "1 5.987 3.828 168.425 100 A \n", "2 16.559 9.688 175.471 100 A \n", "3 7.359 NaN 101.121 100 A \n", "4 15.136 5.720 168.133 100 A \n", "5 NaN 3.088 NaN 100 A " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')\n", "data.head()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1667d7b9", "metadata": {}, "source": [ "Select the columns `Area` and `Mean`. Remove all rows that contain NaNs and count the remaining rows." ] }, { "cell_type": "code", "execution_count": null, "id": "6ef5bfa3", "metadata": {}, "outputs": [], "source": [] }, { "attachments": {}, "cell_type": "markdown", "id": "38e39be3", "metadata": {}, "source": [ "Now select the columns `Major` and `Minor`. Remove NaNs and count the remaining rows again. What do you conclude?" ] }, { "cell_type": "code", "execution_count": null, "id": "4e1fa0d0", "metadata": {}, "outputs": [], "source": [] }, { "attachments": {}, "cell_type": "markdown", "id": "88a608f2-0337-4ebe-bf1b-ff0b46def99e", "metadata": {}, "source": [ "## Exercise 3" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7e45e410-c69b-41b0-b11e-e939d35e80df", "metadata": {}, "source": [ "Group the tidy dataframe from above by 'Treatment' and 'Channel'" ] }, { "cell_type": "code", "execution_count": 53, "id": "1b8769e5-0287-4fcc-b0d1-8ab9884c2c9d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TreatmentChannelintensity
0Beforechannel_113.250000
1Beforechannel_144.954545
2Beforechannel_113.590909
3Beforechannel_185.032258
4Beforechannel_110.731707
............
99Afterchannel_273.286439
100Afterchannel_2145.900739
101Afterchannel_2115.347217
102Afterchannel_261.225962
103Afterchannel_277.490249
\n", "

104 rows × 3 columns

\n", "
" ], "text/plain": [ " Treatment Channel intensity\n", "0 Before channel_1 13.250000\n", "1 Before channel_1 44.954545\n", "2 Before channel_1 13.590909\n", "3 Before channel_1 85.032258\n", "4 Before channel_1 10.731707\n", ".. ... ... ...\n", "99 After channel_2 73.286439\n", "100 After channel_2 145.900739\n", "101 After channel_2 115.347217\n", "102 After channel_2 61.225962\n", "103 After channel_2 77.490249\n", "\n", "[104 rows x 3 columns]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_tidy" ] }, { "cell_type": "code", "execution_count": null, "id": "6eade53a", "metadata": {}, "outputs": [], "source": [] }, { "attachments": {}, "cell_type": "markdown", "id": "2d09a3a3", "metadata": {}, "source": [ "Calculate the mean and standard deviation for the groups" ] }, { "cell_type": "code", "execution_count": null, "id": "7abd7722-9eb2-4acc-81b7-3f9dceeac73a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('dbnapari-arm64')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "f13a533d7f6b525f1b9e27d74dd8bd7ea06c4c95ae7aaf0187c7426193b5690e" } } }, "nbformat": 4, "nbformat_minor": 5 }