{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "78a139a0",
   "metadata": {},
   "source": [
    "# Handling NaN values\n",
    "When analysing tabular data, some table cells may not contain any data. In pandas, such missing entries are typically represented as _Not a Number_ ([NaN](https://en.wikipedia.org/wiki/NaN)). We cannot assume these values are `0` or `-1` or any other value, because that would distort descriptive statistics, for example. We need to handle these NaN entries differently, and this notebook introduces how.\n",
    "\n",
    "To get a first impression of where NaNs play a role, we load an example table again and sort it."
   ]
  },
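  {
   "cell_type": "markdown",
   "id": "nan-fill-sketch",
   "metadata": {},
   "source": [
    "Before loading the table, a brief aside on why missing values should not simply be replaced by a number. The following is a minimal sketch with made-up values (the Series `measurements` is an assumption for illustration, not part of the example data). It compares the mean that pandas computes when it skips the NaN with the mean obtained after filling the gap with `0`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up measurements; the third value is missing\n",
    "measurements = pd.Series([10.0, 12.0, np.nan, 14.0])\n",
    "\n",
    "# pandas skips NaN by default (skipna=True): mean of 10, 12 and 14\n",
    "mean_skipping_nan = measurements.mean()\n",
    "\n",
    "# Replacing NaN with 0 changes the result: (10 + 12 + 0 + 14) / 4\n",
    "mean_after_fill = measurements.fillna(0).mean()\n",
    "\n",
    "print(mean_skipping_nan, mean_after_fill)\n",
    "```\n",
    "\n",
    "Filling the gap with `0` lowers the mean from 12.0 to 9.0, although no measurement of `0` was ever taken."
   ]
  },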
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "189e76b0-0cc2-4baa-8290-e5a06ab2d70b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4e617db1-ac10-4f69-9ba9-97913d517a15",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Area</th>\n",
       "      <th>Mean</th>\n",
       "      <th>StdDev</th>\n",
       "      <th>Min</th>\n",
       "      <th>Max</th>\n",
       "      <th>X</th>\n",
       "      <th>Y</th>\n",
       "      <th>XM</th>\n",
       "      <th>YM</th>\n",
       "      <th>Major</th>\n",
       "      <th>Minor</th>\n",
       "      <th>Angle</th>\n",
       "      <th>%Area</th>\n",
       "      <th>Type</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>2755.0</td>\n",
       "      <td>859.928</td>\n",
       "      <td>235.458</td>\n",
       "      <td>539.0</td>\n",
       "      <td>3880.0</td>\n",
       "      <td>108.710</td>\n",
       "      <td>302.158</td>\n",
       "      <td>110.999</td>\n",
       "      <td>300.247</td>\n",
       "      <td>144.475</td>\n",
       "      <td>24.280</td>\n",
       "      <td>39.318</td>\n",
       "      <td>100</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>2295.0</td>\n",
       "      <td>765.239</td>\n",
       "      <td>96.545</td>\n",
       "      <td>558.0</td>\n",
       "      <td>1431.0</td>\n",
       "      <td>375.003</td>\n",
       "      <td>134.888</td>\n",
       "      <td>374.982</td>\n",
       "      <td>135.359</td>\n",
       "      <td>65.769</td>\n",
       "      <td>44.429</td>\n",
       "      <td>127.247</td>\n",
       "      <td>100</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>209</th>\n",
       "      <td>1821.0</td>\n",
       "      <td>847.761</td>\n",
       "      <td>122.074</td>\n",
       "      <td>600.0</td>\n",
       "      <td>1510.0</td>\n",
       "      <td>287.795</td>\n",
       "      <td>321.115</td>\n",
       "      <td>288.074</td>\n",
       "      <td>321.824</td>\n",
       "      <td>55.879</td>\n",
       "      <td>41.492</td>\n",
       "      <td>112.124</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>252</th>\n",
       "      <td>1528.0</td>\n",
       "      <td>763.777</td>\n",
       "      <td>83.183</td>\n",
       "      <td>572.0</td>\n",
       "      <td>1172.0</td>\n",
       "      <td>191.969</td>\n",
       "      <td>385.944</td>\n",
       "      <td>192.487</td>\n",
       "      <td>385.697</td>\n",
       "      <td>63.150</td>\n",
       "      <td>30.808</td>\n",
       "      <td>34.424</td>\n",
       "      <td>100</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>265</th>\n",
       "      <td>1252.0</td>\n",
       "      <td>793.371</td>\n",
       "      <td>117.139</td>\n",
       "      <td>579.0</td>\n",
       "      <td>1668.0</td>\n",
       "      <td>262.071</td>\n",
       "      <td>394.497</td>\n",
       "      <td>262.268</td>\n",
       "      <td>394.326</td>\n",
       "      <td>60.154</td>\n",
       "      <td>26.500</td>\n",
       "      <td>50.147</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>1.0</td>\n",
       "      <td>587.000</td>\n",
       "      <td>0.000</td>\n",
       "      <td>587.0</td>\n",
       "      <td>587.0</td>\n",
       "      <td>399.500</td>\n",
       "      <td>117.500</td>\n",
       "      <td>399.500</td>\n",
       "      <td>117.500</td>\n",
       "      <td>1.128</td>\n",
       "      <td>1.128</td>\n",
       "      <td>0.000</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>310</th>\n",
       "      <td>1.0</td>\n",
       "      <td>866.000</td>\n",
       "      <td>0.000</td>\n",
       "      <td>866.0</td>\n",
       "      <td>866.0</td>\n",
       "      <td>343.500</td>\n",
       "      <td>408.500</td>\n",
       "      <td>343.500</td>\n",
       "      <td>408.500</td>\n",
       "      <td>1.128</td>\n",
       "      <td>1.128</td>\n",
       "      <td>0.000</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>219</th>\n",
       "      <td>1.0</td>\n",
       "      <td>763.000</td>\n",
       "      <td>0.000</td>\n",
       "      <td>763.0</td>\n",
       "      <td>763.0</td>\n",
       "      <td>411.500</td>\n",
       "      <td>296.500</td>\n",
       "      <td>411.500</td>\n",
       "      <td>296.500</td>\n",
       "      <td>1.128</td>\n",
       "      <td>1.128</td>\n",
       "      <td>0.000</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>608.0</td>\n",
       "      <td>964.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.665</td>\n",
       "      <td>7.359</td>\n",
       "      <td>NaN</td>\n",
       "      <td>101.121</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>69.438</td>\n",
       "      <td>566.0</td>\n",
       "      <td>792.0</td>\n",
       "      <td>348.500</td>\n",
       "      <td>7.500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.508</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.088</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>391 rows × 14 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Area     Mean   StdDev    Min     Max        X        Y       XM  \\\n",
       "                                                                          \n",
       "190  2755.0  859.928  235.458  539.0  3880.0  108.710  302.158  110.999   \n",
       "81   2295.0  765.239   96.545  558.0  1431.0  375.003  134.888  374.982   \n",
       "209  1821.0  847.761  122.074  600.0  1510.0  287.795  321.115  288.074   \n",
       "252  1528.0  763.777   83.183  572.0  1172.0  191.969  385.944  192.487   \n",
       "265  1252.0  793.371  117.139  579.0  1668.0  262.071  394.497  262.268   \n",
       "..      ...      ...      ...    ...     ...      ...      ...      ...   \n",
       "113     1.0  587.000    0.000  587.0   587.0  399.500  117.500  399.500   \n",
       "310     1.0  866.000    0.000  866.0   866.0  343.500  408.500  343.500   \n",
       "219     1.0  763.000    0.000  763.0   763.0  411.500  296.500  411.500   \n",
       "3       NaN      NaN      NaN  608.0   964.0      NaN      NaN      NaN   \n",
       "5       NaN      NaN   69.438  566.0   792.0  348.500    7.500      NaN   \n",
       "\n",
       "          YM    Major   Minor    Angle  %Area Type  \n",
       "                                                    \n",
       "190  300.247  144.475  24.280   39.318    100    C  \n",
       "81   135.359   65.769  44.429  127.247    100    B  \n",
       "209  321.824   55.879  41.492  112.124    100    A  \n",
       "252  385.697   63.150  30.808   34.424    100    B  \n",
       "265  394.326   60.154  26.500   50.147    100    A  \n",
       "..       ...      ...     ...      ...    ...  ...  \n",
       "113  117.500    1.128   1.128    0.000    100    A  \n",
       "310  408.500    1.128   1.128    0.000    100    A  \n",
       "219  296.500    1.128   1.128    0.000    100    A  \n",
       "3      7.665    7.359     NaN  101.121    100    A  \n",
       "5      7.508      NaN   3.088      NaN    100    A  \n",
       "\n",
       "[391 rows x 14 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')\n",
    "data.sort_values(by = \"Area\", ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4532812",
   "metadata": {},
   "source": [
    "As you can see, there are rows at the bottom containing NaNs. They end up at the bottom of the table because `sort_values` places NaN values last by default (`na_position='last'`), whether the sort is ascending or descending."
   ]
  },
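  {
   "cell_type": "markdown",
   "id": "nan-position-sketch",
   "metadata": {},
   "source": [
    "Where the NaN rows appear after sorting can be controlled. The following is a small sketch on a made-up table (`toy` and its values are assumptions for illustration) that uses the `na_position` parameter of `sort_values` to move NaN rows to the top, which can make them easier to spot.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up table with one missing Area value\n",
    "toy = pd.DataFrame({'Area': [5.0, np.nan, 20.0]}, index=['a', 'b', 'c'])\n",
    "\n",
    "# Default behaviour: the NaN row is placed last\n",
    "nan_last = toy.sort_values(by='Area', ascending=False)\n",
    "\n",
    "# na_position='first' places the NaN row first instead\n",
    "nan_first = toy.sort_values(by='Area', ascending=False, na_position='first')\n",
    "\n",
    "print(list(nan_last.index), list(nan_first.index))\n",
    "```\n",
    "\n",
    "In this sketch, row `b` holds the NaN: it comes last in `nan_last` and first in `nan_first`."
   ]
  },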
  {
   "cell_type": "markdown",
   "id": "2bc0c85e",
   "metadata": {},
   "source": [
    "Quickly checking whether a DataFrame contains NaNs anywhere is an important quality check and good scientific practice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5c152771",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull().values.any()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87f14e62-b3e5-45f6-9c02-9820b21bd929",
   "metadata": {},
   "source": [
    "We can also get deeper insight into which columns contain these NaN values by counting them per column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7f6b5eb4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Area      2\n",
       "Mean      5\n",
       "StdDev    3\n",
       "Min       3\n",
       "Max       3\n",
       "X         2\n",
       "Y         3\n",
       "XM        3\n",
       "YM        5\n",
       "Major     8\n",
       "Minor     3\n",
       "Angle     1\n",
       "%Area     0\n",
       "Type      0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b27eeccd",
   "metadata": {},
   "source": [
    "To get a sense of whether we can process the table further, we may want to know the percentage of NaNs in each column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a9297b56",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Major     2.046036\n",
       "Mean      1.278772\n",
       "YM        1.278772\n",
       "StdDev    0.767263\n",
       "Min       0.767263\n",
       "Max       0.767263\n",
       "Y         0.767263\n",
       "XM        0.767263\n",
       "Minor     0.767263\n",
       "Area      0.511509\n",
       "X         0.511509\n",
       "Angle     0.255754\n",
       "%Area     0.000000\n",
       "Type      0.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull().mean().sort_values(ascending=False) *100"
   ]
  },
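  {
   "cell_type": "markdown",
   "id": "nan-summary-sketch",
   "metadata": {},
   "source": [
    "Counts and percentages can also be collected in a single summary table. The following is a minimal sketch on a made-up table (`toy`, `nan_count` and `nan_percent` are names chosen for illustration); the same pattern should apply to the table loaded above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up table for illustration only\n",
    "toy = pd.DataFrame({\n",
    "    'Area': [1.0, np.nan, 3.0, 4.0],\n",
    "    'Mean': [np.nan, np.nan, 7.0, 8.0],\n",
    "    'Type': ['A', 'B', 'A', 'B'],\n",
    "})\n",
    "\n",
    "# Boolean table: True wherever a value is missing\n",
    "missing = toy.isnull()\n",
    "\n",
    "# One row per column of `toy`, sorted by the share of missing values\n",
    "summary = pd.DataFrame({\n",
    "    'nan_count': missing.sum(),\n",
    "    'nan_percent': missing.mean() * 100,\n",
    "}).sort_values(by='nan_percent', ascending=False)\n",
    "\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "Taking the mean of a Boolean table works because `True` counts as 1 and `False` as 0, so the mean is the fraction of missing entries."
   ]
  },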
  {
   "cell_type": "markdown",
   "id": "f605facf",
   "metadata": {},
   "source": [
    "# Dropping rows that contain NaNs\n",
    "Depending on what kind of data analysis should be performed, it might make sense to just ignore columns that contain NaN values. Alternatively, it is possible to delete rows that contain NaNs.\n",
    "\n",
    "Which option is appropriate depends on your project and on what matters for the analysis; there is no easy answer. Here, we delete every row that contains at least one NaN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "25bac0a9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Area</th>\n",
       "      <th>Mean</th>\n",
       "      <th>StdDev</th>\n",
       "      <th>Min</th>\n",
       "      <th>Max</th>\n",
       "      <th>X</th>\n",
       "      <th>Y</th>\n",
       "      <th>XM</th>\n",
       "      <th>YM</th>\n",
       "      <th>Major</th>\n",
       "      <th>Minor</th>\n",
       "      <th>Angle</th>\n",
       "      <th>%Area</th>\n",
       "      <th>Type</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>18.0</td>\n",
       "      <td>730.389</td>\n",
       "      <td>103.354</td>\n",
       "      <td>592.0</td>\n",
       "      <td>948.0</td>\n",
       "      <td>435.000</td>\n",
       "      <td>4.722</td>\n",
       "      <td>434.962</td>\n",
       "      <td>4.697</td>\n",
       "      <td>5.987</td>\n",
       "      <td>3.828</td>\n",
       "      <td>168.425</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>126.0</td>\n",
       "      <td>718.333</td>\n",
       "      <td>90.367</td>\n",
       "      <td>556.0</td>\n",
       "      <td>1046.0</td>\n",
       "      <td>388.087</td>\n",
       "      <td>8.683</td>\n",
       "      <td>388.183</td>\n",
       "      <td>8.687</td>\n",
       "      <td>16.559</td>\n",
       "      <td>9.688</td>\n",
       "      <td>175.471</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>68.0</td>\n",
       "      <td>686.985</td>\n",
       "      <td>61.169</td>\n",
       "      <td>571.0</td>\n",
       "      <td>880.0</td>\n",
       "      <td>126.147</td>\n",
       "      <td>8.809</td>\n",
       "      <td>126.192</td>\n",
       "      <td>8.811</td>\n",
       "      <td>15.136</td>\n",
       "      <td>5.720</td>\n",
       "      <td>168.133</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>669.0</td>\n",
       "      <td>697.164</td>\n",
       "      <td>72.863</td>\n",
       "      <td>539.0</td>\n",
       "      <td>957.0</td>\n",
       "      <td>471.696</td>\n",
       "      <td>26.253</td>\n",
       "      <td>471.694</td>\n",
       "      <td>26.197</td>\n",
       "      <td>36.656</td>\n",
       "      <td>23.237</td>\n",
       "      <td>124.340</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5.0</td>\n",
       "      <td>658.600</td>\n",
       "      <td>49.161</td>\n",
       "      <td>607.0</td>\n",
       "      <td>710.0</td>\n",
       "      <td>28.300</td>\n",
       "      <td>8.100</td>\n",
       "      <td>28.284</td>\n",
       "      <td>8.103</td>\n",
       "      <td>3.144</td>\n",
       "      <td>2.025</td>\n",
       "      <td>161.565</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>383</th>\n",
       "      <td>94.0</td>\n",
       "      <td>746.617</td>\n",
       "      <td>85.198</td>\n",
       "      <td>550.0</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>194.032</td>\n",
       "      <td>498.223</td>\n",
       "      <td>194.014</td>\n",
       "      <td>498.239</td>\n",
       "      <td>17.295</td>\n",
       "      <td>6.920</td>\n",
       "      <td>52.720</td>\n",
       "      <td>100</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>387</th>\n",
       "      <td>152.0</td>\n",
       "      <td>801.599</td>\n",
       "      <td>111.328</td>\n",
       "      <td>582.0</td>\n",
       "      <td>1263.0</td>\n",
       "      <td>348.487</td>\n",
       "      <td>497.632</td>\n",
       "      <td>348.451</td>\n",
       "      <td>497.675</td>\n",
       "      <td>17.773</td>\n",
       "      <td>10.889</td>\n",
       "      <td>11.829</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>389</th>\n",
       "      <td>60.0</td>\n",
       "      <td>758.033</td>\n",
       "      <td>77.309</td>\n",
       "      <td>601.0</td>\n",
       "      <td>947.0</td>\n",
       "      <td>259.000</td>\n",
       "      <td>499.300</td>\n",
       "      <td>258.990</td>\n",
       "      <td>499.289</td>\n",
       "      <td>9.476</td>\n",
       "      <td>8.062</td>\n",
       "      <td>90.000</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>390</th>\n",
       "      <td>12.0</td>\n",
       "      <td>714.833</td>\n",
       "      <td>67.294</td>\n",
       "      <td>551.0</td>\n",
       "      <td>785.0</td>\n",
       "      <td>240.167</td>\n",
       "      <td>498.167</td>\n",
       "      <td>240.179</td>\n",
       "      <td>498.148</td>\n",
       "      <td>4.606</td>\n",
       "      <td>3.317</td>\n",
       "      <td>168.690</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>391</th>\n",
       "      <td>23.0</td>\n",
       "      <td>695.043</td>\n",
       "      <td>67.356</td>\n",
       "      <td>611.0</td>\n",
       "      <td>846.0</td>\n",
       "      <td>49.891</td>\n",
       "      <td>503.022</td>\n",
       "      <td>49.882</td>\n",
       "      <td>502.979</td>\n",
       "      <td>6.454</td>\n",
       "      <td>4.537</td>\n",
       "      <td>73.243</td>\n",
       "      <td>100</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>374 rows × 14 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      Area     Mean   StdDev    Min     Max        X        Y       XM  \\\n",
       "                                                                         \n",
       "1     18.0  730.389  103.354  592.0   948.0  435.000    4.722  434.962   \n",
       "2    126.0  718.333   90.367  556.0  1046.0  388.087    8.683  388.183   \n",
       "4     68.0  686.985   61.169  571.0   880.0  126.147    8.809  126.192   \n",
       "6    669.0  697.164   72.863  539.0   957.0  471.696   26.253  471.694   \n",
       "7      5.0  658.600   49.161  607.0   710.0   28.300    8.100   28.284   \n",
       "..     ...      ...      ...    ...     ...      ...      ...      ...   \n",
       "383   94.0  746.617   85.198  550.0  1021.0  194.032  498.223  194.014   \n",
       "387  152.0  801.599  111.328  582.0  1263.0  348.487  497.632  348.451   \n",
       "389   60.0  758.033   77.309  601.0   947.0  259.000  499.300  258.990   \n",
       "390   12.0  714.833   67.294  551.0   785.0  240.167  498.167  240.179   \n",
       "391   23.0  695.043   67.356  611.0   846.0   49.891  503.022   49.882   \n",
       "\n",
       "          YM   Major   Minor    Angle  %Area Type  \n",
       "                                                   \n",
       "1      4.697   5.987   3.828  168.425    100    A  \n",
       "2      8.687  16.559   9.688  175.471    100    A  \n",
       "4      8.811  15.136   5.720  168.133    100    A  \n",
       "6     26.197  36.656  23.237  124.340    100    A  \n",
       "7      8.103   3.144   2.025  161.565    100    A  \n",
       "..       ...     ...     ...      ...    ...  ...  \n",
       "383  498.239  17.295   6.920   52.720    100    B  \n",
       "387  497.675  17.773  10.889   11.829    100    A  \n",
       "389  499.289   9.476   8.062   90.000    100    A  \n",
       "390  498.148   4.606   3.317  168.690    100    A  \n",
       "391  502.979   6.454   4.537   73.243    100    A  \n",
       "\n",
       "[374 rows x 14 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_no_nan = data.dropna(how=\"any\")\n",
    "data_no_nan "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d857531-4c88-450b-ae62-9008388088ba",
   "metadata": {},
   "source": [
    "At the bottom of that table, you can see that it still contains 374 of the original 391 rows. If you remove rows, you should document in your later scientific publication how many of the original rows were analysed.\n",
    "\n",
    "We can now check again whether NaNs are present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f09a2106",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_no_nan.isnull().values.any()"
   ]
  },
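  {
   "cell_type": "markdown",
   "id": "dropna-options-sketch",
   "metadata": {},
   "source": [
    "Dropping every row that contains any NaN is only one option. As a hedged sketch on a made-up table (`toy` and its values are assumptions for illustration), `dropna` can also be restricted to selected columns via `subset`, or applied to columns instead of rows via `axis=1`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up table for illustration only\n",
    "toy = pd.DataFrame({\n",
    "    'Area': [1.0, np.nan, 3.0],\n",
    "    'Mean': [4.0, 5.0, np.nan],\n",
    "    'Type': ['A', 'B', 'A'],\n",
    "})\n",
    "\n",
    "# Drop a row only if its 'Area' value is missing\n",
    "rows_with_area = toy.dropna(subset=['Area'])\n",
    "\n",
    "# Drop every column that contains at least one NaN\n",
    "complete_columns = toy.dropna(axis=1)\n",
    "\n",
    "# Report how many rows remain, which is worth documenting\n",
    "print(f'{len(rows_with_area)} of {len(toy)} rows kept')\n",
    "```\n",
    "\n",
    "Here, `rows_with_area` keeps the row whose `Mean` is missing, because only `Area` was checked, while `complete_columns` retains only the `Type` column."
   ]
  },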
  {
   "cell_type": "markdown",
   "id": "ab941145-75fe-4d5a-80fb-ab85672d0a86",
   "metadata": {},
   "source": [
    "## Determining rows that contain NaNs\n",
    "In some use cases, it is useful to know which rows contain NaN values. The example below builds a small table and computes a Boolean mask that is `True` for every row containing at least one NaN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "84addf53-beb6-4955-bad9-fe52517f64d7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22</td>\n",
       "      <td>NaN</td>\n",
       "      <td>44.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>21</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>12.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>23</td>\n",
       "      <td>22.0</td>\n",
       "      <td>52.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A     B     C\n",
       "0   0   2.0   2.0\n",
       "1   1   3.0   3.0\n",
       "2  22   NaN  44.0\n",
       "3  21   2.0   2.0\n",
       "4  12  12.0   NaN\n",
       "5  23  22.0  52.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use a separate name so that the `data` table loaded above is not overwritten\n",
    "values = {\n",
    "    'A': [0, 1, 22, 21, 12, 23],\n",
    "    'B': [2, 3, np.nan,  2,  12, 22],\n",
    "    'C': [2, 3, 44,  2,  np.nan, 52],\n",
    "}\n",
    "\n",
    "table = pd.DataFrame(values)\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7b3b3292-0864-484e-b2d4-dab5c1a0b6ed",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False,  True, False,  True, False])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.max(table.isnull().values, axis=1)"
   ]
  },
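  {
   "cell_type": "markdown",
   "id": "nan-row-index-sketch",
   "metadata": {},
   "source": [
    "The Boolean mask can be turned into an explicit list of row indices. The following is a minimal sketch that rebuilds the small table under the assumed name `small_table` and uses the pandas method `any(axis=1)` as an alternative to `np.max` for building the mask.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Same values as the small example table, under a separate name\n",
    "small_table = pd.DataFrame({\n",
    "    'A': [0, 1, 22, 21, 12, 23],\n",
    "    'B': [2, 3, np.nan, 2, 12, 22],\n",
    "    'C': [2, 3, 44, 2, np.nan, 52],\n",
    "})\n",
    "\n",
    "# True for every row that contains at least one NaN\n",
    "has_nan = small_table.isnull().any(axis=1)\n",
    "\n",
    "# Index labels of the affected rows as a plain Python list\n",
    "nan_row_indices = small_table.index[has_nan.to_numpy()].tolist()\n",
    "\n",
    "# The affected rows themselves, for inspection\n",
    "rows_with_nan = small_table[has_nan]\n",
    "\n",
    "print(nan_row_indices)\n",
    "```\n",
    "\n",
    "For this table, the list contains the indices 2 and 4, matching the two `True` entries in the mask."
   ]
  },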
  {
   "cell_type": "markdown",
   "id": "780acb1c-e8f0-4678-aa2c-ca497fe26872",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "Take the original `data` table and select the columns `Area` and `Mean`. Remove all rows that contain NaNs and count the remaining rows.\n",
    "Afterwards, take the original `data` table again and select the columns `Major` and `Minor`. Remove NaNs and count the remaining rows again. What do you conclude?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12c2e25b-8d38-4c21-8390-ea94742a6070",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}