{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "51e6f4d9-a000-4b05-9bcc-dc52db91658c",
   "metadata": {},
   "source": [
    "# Tidy data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "248c6fda-e47c-46c2-957d-cf4bdf4d141d",
   "metadata": {},
   "source": [
    "Hadley Wickham wrote a [great article](https://www.jstatsoft.org/article/view/v059i10) in favor of “tidy data.” A tidy data frame follows three rules:\n",
    "\n",
    "- Each variable is a column.\n",
    "\n",
    "- Each observation is a row.\n",
    "\n",
    "- Each type of observation has its own separate data frame.\n",
    "\n",
    "A tidy data frame is often less pleasant to read as a table, but we rarely inspect data by reading tables. The layout that is convenient for viewing is usually different from the layout that is convenient for analysis, and a tidy data frame is almost always much easier to work with than a non-tidy one."
   ]
  },
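  {
   "cell_type": "markdown",
   "id": "sketch-layouts-md",
   "metadata": {},
   "source": [
    "As a small illustration of these rules, the sketch below stores the same four made-up numbers in two layouts. All names and values (`wide_example`, `tidy_example`, `sample`, `condition`, `intensity`) are invented for this example and do not come from the article."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-layouts-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Made-up measurements: two samples, each measured under two conditions.\n",
    "# Wide layout: the variable 'condition' is hidden in the column headers.\n",
    "wide_example = pd.DataFrame({'sample': ['a', 'b'],\n",
    "                             'Before': [1.0, 2.0],\n",
    "                             'After': [1.5, 2.5]})\n",
    "\n",
    "# Tidy layout: one column per variable, one row per observation.\n",
    "tidy_example = pd.DataFrame({'sample': ['a', 'b', 'a', 'b'],\n",
    "                             'condition': ['Before', 'Before', 'After', 'After'],\n",
    "                             'intensity': [1.0, 2.0, 1.5, 2.5]})\n",
    "\n",
    "# Same four numbers, different shapes\n",
    "print(wide_example.shape, tidy_example.shape)"
   ]
  },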
  {
   "cell_type": "markdown",
   "id": "774c1a89-2f91-404b-8f74-284acabd3bcb",
   "metadata": {},
   "source": [
    "Let's import a saved table with measurements. Is this table tidy?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5c88af81-7a31-42bb-8f12-69a89f2f1e0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "504fd34a-9454-4fb3-9ccf-4b9964feada9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">Before</th>\n",
       "      <th colspan=\"2\" halign=\"left\">After</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>channel_1</th>\n",
       "      <th>channel_2</th>\n",
       "      <th>channel_1</th>\n",
       "      <th>channel_2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>13.250000</td>\n",
       "      <td>21.000000</td>\n",
       "      <td>15.137984</td>\n",
       "      <td>42.022776</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44.954545</td>\n",
       "      <td>24.318182</td>\n",
       "      <td>43.328836</td>\n",
       "      <td>48.661610</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13.590909</td>\n",
       "      <td>18.772727</td>\n",
       "      <td>11.685995</td>\n",
       "      <td>37.926184</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>85.032258</td>\n",
       "      <td>19.741935</td>\n",
       "      <td>86.031461</td>\n",
       "      <td>40.396353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10.731707</td>\n",
       "      <td>25.268293</td>\n",
       "      <td>10.075421</td>\n",
       "      <td>51.471865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>94.625000</td>\n",
       "      <td>36.450000</td>\n",
       "      <td>95.180900</td>\n",
       "      <td>73.347843</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>89.836735</td>\n",
       "      <td>34.693878</td>\n",
       "      <td>89.857864</td>\n",
       "      <td>69.902829</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>100.261905</td>\n",
       "      <td>34.904762</td>\n",
       "      <td>101.989852</td>\n",
       "      <td>70.156432</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>29.615385</td>\n",
       "      <td>52.115385</td>\n",
       "      <td>31.516654</td>\n",
       "      <td>104.525198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>15.868421</td>\n",
       "      <td>24.921053</td>\n",
       "      <td>16.086932</td>\n",
       "      <td>50.563301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>12.475000</td>\n",
       "      <td>25.450000</td>\n",
       "      <td>11.529924</td>\n",
       "      <td>51.381594</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>87.875000</td>\n",
       "      <td>28.050000</td>\n",
       "      <td>89.745522</td>\n",
       "      <td>56.543107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>58.800000</td>\n",
       "      <td>22.600000</td>\n",
       "      <td>59.646229</td>\n",
       "      <td>45.215405</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>91.061224</td>\n",
       "      <td>40.367347</td>\n",
       "      <td>89.935893</td>\n",
       "      <td>81.326111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>23.500000</td>\n",
       "      <td>117.333333</td>\n",
       "      <td>21.676993</td>\n",
       "      <td>235.067654</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>82.566667</td>\n",
       "      <td>34.566667</td>\n",
       "      <td>84.097735</td>\n",
       "      <td>69.820702</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>36.120000</td>\n",
       "      <td>29.600000</td>\n",
       "      <td>37.688676</td>\n",
       "      <td>59.870177</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>70.687500</td>\n",
       "      <td>33.843750</td>\n",
       "      <td>72.569112</td>\n",
       "      <td>68.493363</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>102.021277</td>\n",
       "      <td>33.297872</td>\n",
       "      <td>100.419746</td>\n",
       "      <td>67.379506</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>72.318182</td>\n",
       "      <td>103.909091</td>\n",
       "      <td>70.843134</td>\n",
       "      <td>207.956510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>18.100000</td>\n",
       "      <td>29.166667</td>\n",
       "      <td>17.865201</td>\n",
       "      <td>58.361239</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>5.217391</td>\n",
       "      <td>36.347826</td>\n",
       "      <td>6.961346</td>\n",
       "      <td>73.286439</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>19.925926</td>\n",
       "      <td>72.814815</td>\n",
       "      <td>18.607102</td>\n",
       "      <td>145.900739</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>26.673077</td>\n",
       "      <td>57.403846</td>\n",
       "      <td>27.611368</td>\n",
       "      <td>115.347217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>13.340000</td>\n",
       "      <td>30.400000</td>\n",
       "      <td>14.160543</td>\n",
       "      <td>61.225962</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>15.028571</td>\n",
       "      <td>38.400000</td>\n",
       "      <td>14.529963</td>\n",
       "      <td>77.490249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Before                   After            \n",
       "     channel_1   channel_2   channel_1   channel_2\n",
       "0    13.250000   21.000000   15.137984   42.022776\n",
       "1    44.954545   24.318182   43.328836   48.661610\n",
       "2    13.590909   18.772727   11.685995   37.926184\n",
       "3    85.032258   19.741935   86.031461   40.396353\n",
       "4    10.731707   25.268293   10.075421   51.471865\n",
       "5    94.625000   36.450000   95.180900   73.347843\n",
       "6    89.836735   34.693878   89.857864   69.902829\n",
       "7   100.261905   34.904762  101.989852   70.156432\n",
       "8    29.615385   52.115385   31.516654  104.525198\n",
       "9    15.868421   24.921053   16.086932   50.563301\n",
       "10   12.475000   25.450000   11.529924   51.381594\n",
       "11   87.875000   28.050000   89.745522   56.543107\n",
       "12   58.800000   22.600000   59.646229   45.215405\n",
       "13   91.061224   40.367347   89.935893   81.326111\n",
       "14   23.500000  117.333333   21.676993  235.067654\n",
       "15   82.566667   34.566667   84.097735   69.820702\n",
       "16   36.120000   29.600000   37.688676   59.870177\n",
       "17   70.687500   33.843750   72.569112   68.493363\n",
       "18  102.021277   33.297872  100.419746   67.379506\n",
       "19   72.318182  103.909091   70.843134  207.956510\n",
       "20   18.100000   29.166667   17.865201   58.361239\n",
       "21    5.217391   36.347826    6.961346   73.286439\n",
       "22   19.925926   72.814815   18.607102  145.900739\n",
       "23   26.673077   57.403846   27.611368  115.347217\n",
       "24   13.340000   30.400000   14.160543   61.225962\n",
       "25   15.028571   38.400000   14.529963   77.490249"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The file is semicolon-separated and has two header rows (intervention, channel)\n",
    "df = pd.read_csv('../../data/Multi_analysis.csv', header=[0, 1], sep=';')\n",
    "df"
   ]
  },
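  {
   "cell_type": "markdown",
   "id": "sketch-columns-md",
   "metadata": {},
   "source": [
    "One way to check is to look at the column labels. The sketch below rebuilds the first three rows of the table by hand (values rounded to two decimals, `df_small` is a made-up name) so that it runs without the CSV file. Each label is a pair of intervention and channel, so two variables live in the header rather than in columns of their own, which suggests the table is not tidy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-columns-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature of the table above: first three rows, rounded\n",
    "columns = pd.MultiIndex.from_product([['Before', 'After'],\n",
    "                                      ['channel_1', 'channel_2']])\n",
    "df_small = pd.DataFrame([[13.25, 21.00, 15.14, 42.02],\n",
    "                         [44.95, 24.32, 43.33, 48.66],\n",
    "                         [13.59, 18.77, 11.69, 37.93]],\n",
    "                        columns=columns)\n",
    "\n",
    "# Number of header levels, then the labels themselves\n",
    "print(df_small.columns.nlevels)\n",
    "list(df_small.columns)"
   ]
  },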
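  {
   "cell_type": "markdown",
   "id": "e497f862-4f03-4d58-9b92-0dea8d750f16",
   "metadata": {},
   "source": [
    "The most useful function for tidying data is [pd.melt](https://pandas.pydata.org/docs/reference/api/pandas.melt.html), also available as the method `df.melt()`. It moves the column headers into ordinary columns, so that every cell of the wide table becomes one row of the long table."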
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0022b484-fd53-4c82-be11-9a3fe2261497",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>variable_0</th>\n",
       "      <th>variable_1</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>13.250000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>44.954545</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>13.590909</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>85.032258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>10.731707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>73.286439</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>145.900739</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>115.347217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>61.225962</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>77.490249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>104 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    variable_0 variable_1       value\n",
       "0       Before  channel_1   13.250000\n",
       "1       Before  channel_1   44.954545\n",
       "2       Before  channel_1   13.590909\n",
       "3       Before  channel_1   85.032258\n",
       "4       Before  channel_1   10.731707\n",
       "..         ...        ...         ...\n",
       "99       After  channel_2   73.286439\n",
       "100      After  channel_2  145.900739\n",
       "101      After  channel_2  115.347217\n",
       "102      After  channel_2   61.225962\n",
       "103      After  channel_2   77.490249\n",
       "\n",
       "[104 rows x 3 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.melt()"
   ]
  },
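  {
   "cell_type": "markdown",
   "id": "sketch-melt-md",
   "metadata": {},
   "source": [
    "To see what `melt` does on a table of this shape, here is a sketch on a hand-made miniature (the first two rows of the table above, rounded; `df_mini` and `melted_mini` are made-up names). Every cell of the wide table ends up as one row, so 2 rows × 4 columns give 8 rows, in the same way that 26 rows × 4 columns give the 104 rows shown above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-melt-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature of the table above (first two rows, values rounded)\n",
    "header = pd.MultiIndex.from_product([['Before', 'After'],\n",
    "                                     ['channel_1', 'channel_2']])\n",
    "df_mini = pd.DataFrame([[13.25, 21.00, 15.14, 42.02],\n",
    "                        [44.95, 24.32, 43.33, 48.66]],\n",
    "                       columns=header)\n",
    "\n",
    "# Without arguments, melt turns both header levels into columns\n",
    "melted_mini = df_mini.melt()\n",
    "\n",
    "# 2 rows x 4 columns in the wide table -> 8 rows in the long table\n",
    "print(melted_mini.shape)\n",
    "melted_mini"
   ]
  },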
  {
   "cell_type": "markdown",
   "id": "dd24af67-9f6e-4f8c-a432-37eb146e2bc9",
   "metadata": {},
   "source": [
    "We can give names to the value column (`value_name`) and to the variable columns (`var_name`). Here the measured values are intensities, and the two variables are the intervention (before or after) and the channel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9ffaf527-78d8-4a9c-89e0-19858b877c2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Intervention</th>\n",
       "      <th>Channel</th>\n",
       "      <th>intensity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>13.250000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>44.954545</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>13.590909</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>85.032258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Before</td>\n",
       "      <td>channel_1</td>\n",
       "      <td>10.731707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>73.286439</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>145.900739</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>115.347217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>61.225962</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>After</td>\n",
       "      <td>channel_2</td>\n",
       "      <td>77.490249</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>104 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    Intervention    Channel   intensity\n",
       "0         Before  channel_1   13.250000\n",
       "1         Before  channel_1   44.954545\n",
       "2         Before  channel_1   13.590909\n",
       "3         Before  channel_1   85.032258\n",
       "4         Before  channel_1   10.731707\n",
       "..           ...        ...         ...\n",
       "99         After  channel_2   73.286439\n",
       "100        After  channel_2  145.900739\n",
       "101        After  channel_2  115.347217\n",
       "102        After  channel_2   61.225962\n",
       "103        After  channel_2   77.490249\n",
       "\n",
       "[104 rows x 3 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_tidy = df.melt(value_name='intensity', var_name=['Intervention', 'Channel'])\n",
    "df_tidy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d95d7c92-95fe-48a5-8554-4b366830f351",
   "metadata": {},
   "source": [
    "The tidy table is not easier to read, but it is easier to manipulate, because we can now build row masks from its columns. Here we select the intensity measurements with `Channel` equal to \"channel_2\" and `Intervention` equal to \"After\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "512f335d-61f7-4dc9-9ff1-239802b2f9da",
   "metadata": {},
   "outputs": [],
   "source": [
    "row_mask = (df_tidy['Channel'] == 'channel_2') & (df_tidy['Intervention'] == 'After')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "88c4b6bb-81b7-48e8-ada8-5f2acf2e4888",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "78      42.022776\n",
       "79      48.661610\n",
       "80      37.926184\n",
       "81      40.396353\n",
       "82      51.471865\n",
       "83      73.347843\n",
       "84      69.902829\n",
       "85      70.156432\n",
       "86     104.525198\n",
       "87      50.563301\n",
       "88      51.381594\n",
       "89      56.543107\n",
       "90      45.215405\n",
       "91      81.326111\n",
       "92     235.067654\n",
       "93      69.820702\n",
       "94      59.870177\n",
       "95      68.493363\n",
       "96      67.379506\n",
       "97     207.956510\n",
       "98      58.361239\n",
       "99      73.286439\n",
       "100    145.900739\n",
       "101    115.347217\n",
       "102     61.225962\n",
       "103     77.490249\n",
       "Name: intensity, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_tidy.loc[row_mask, 'intensity']"
   ]
  },
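  {
   "cell_type": "markdown",
   "id": "sketch-groupby-md",
   "metadata": {},
   "source": [
    "Masks are not the only operation that gets simpler. As a hedged sketch, grouping also works directly on the tidy columns. The miniature table below is typed in by hand from the first two rows of the data (values rounded), and `tidy_mini` and `mean_intensity` are names made up for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-groupby-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical miniature tidy table (first two rows of the data, rounded)\n",
    "tidy_mini = pd.DataFrame({\n",
    "    'Intervention': ['Before'] * 4 + ['After'] * 4,\n",
    "    'Channel': ['channel_1', 'channel_1', 'channel_2', 'channel_2'] * 2,\n",
    "    'intensity': [13.25, 44.95, 21.00, 24.32, 15.14, 43.33, 42.02, 48.66],\n",
    "})\n",
    "\n",
    "# Mean intensity for each combination of the two variables\n",
    "mean_intensity = tidy_mini.groupby(['Intervention', 'Channel'])['intensity'].mean()\n",
    "mean_intensity"
   ]
  },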
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22c0f001-a3ae-4d4a-902a-8fe55cbd79b9",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}