{ "cells": [ { "cell_type": "markdown", "id": "78a139a0", "metadata": {}, "source": [ "# Handling NaN values\n", "When analysing tabular data, some table cells may not contain data. In Python, such missing values are typically represented as _Not a Number_ ([NaN](https://en.wikipedia.org/wiki/NaN)). We cannot assume these values are `0`, `-1`, or any other value, because that would distort descriptive statistics, for example. We need to handle these NaN entries differently, and this notebook introduces how.\n", "\n", "To get a first look at where NaNs play a role, we load an example table again and sort it." ] }, { "cell_type": "code", "execution_count": 1, "id": "189e76b0-0cc2-4baa-8290-e5a06ab2d70b", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "4e617db1-ac10-4f69-9ba9-97913d517a15", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 190 | 2755.0 | 859.928 | 235.458 | 539.0 | 3880.0 | 108.710 | 302.158 | 110.999 | 300.247 | 144.475 | 24.280 | 39.318 | 100 | C |
| 81 | 2295.0 | 765.239 | 96.545 | 558.0 | 1431.0 | 375.003 | 134.888 | 374.982 | 135.359 | 65.769 | 44.429 | 127.247 | 100 | B |
| 209 | 1821.0 | 847.761 | 122.074 | 600.0 | 1510.0 | 287.795 | 321.115 | 288.074 | 321.824 | 55.879 | 41.492 | 112.124 | 100 | A |
| 252 | 1528.0 | 763.777 | 83.183 | 572.0 | 1172.0 | 191.969 | 385.944 | 192.487 | 385.697 | 63.150 | 30.808 | 34.424 | 100 | B |
| 265 | 1252.0 | 793.371 | 117.139 | 579.0 | 1668.0 | 262.071 | 394.497 | 262.268 | 394.326 | 60.154 | 26.500 | 50.147 | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 113 | 1.0 | 587.000 | 0.000 | 587.0 | 587.0 | 399.500 | 117.500 | 399.500 | 117.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 310 | 1.0 | 866.000 | 0.000 | 866.0 | 866.0 | 343.500 | 408.500 | 343.500 | 408.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 219 | 1.0 | 763.000 | 0.000 | 763.0 | 763.0 | 411.500 | 296.500 | 411.500 | 296.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |

391 rows × 14 columns
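
In the sorted table above, the rows with `NaN` in the `Area` column appear at the very end, even though the sort is descending. The following minimal sketch reproduces this behaviour on a small, made-up table (the values are hypothetical and not taken from the example file). It assumes the table was sorted with `DataFrame.sort_values`, which places NaN last by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; the values are made up for illustration.
df = pd.DataFrame({"Area": [18.0, np.nan, 126.0, np.nan, 68.0]})

# sort_values puts NaN rows last by default (na_position="last"),
# no matter whether the order is ascending or descending.
sorted_df = df.sort_values(by="Area", ascending=False)
print(sorted_df)

# If desired, NaN rows can be moved to the top instead.
nan_first = df.sort_values(by="Area", ascending=False, na_position="first")
print(nan_first)
```

Sorting is therefore a quick way to spot missing values, but it only reveals NaNs in the column that was sorted by.
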

| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18.0 | 730.389 | 103.354 | 592.0 | 948.0 | 435.000 | 4.722 | 434.962 | 4.697 | 5.987 | 3.828 | 168.425 | 100 | A |
| 2 | 126.0 | 718.333 | 90.367 | 556.0 | 1046.0 | 388.087 | 8.683 | 388.183 | 8.687 | 16.559 | 9.688 | 175.471 | 100 | A |
| 4 | 68.0 | 686.985 | 61.169 | 571.0 | 880.0 | 126.147 | 8.809 | 126.192 | 8.811 | 15.136 | 5.720 | 168.133 | 100 | A |
| 6 | 669.0 | 697.164 | 72.863 | 539.0 | 957.0 | 471.696 | 26.253 | 471.694 | 26.197 | 36.656 | 23.237 | 124.340 | 100 | A |
| 7 | 5.0 | 658.600 | 49.161 | 607.0 | 710.0 | 28.300 | 8.100 | 28.284 | 8.103 | 3.144 | 2.025 | 161.565 | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 383 | 94.0 | 746.617 | 85.198 | 550.0 | 1021.0 | 194.032 | 498.223 | 194.014 | 498.239 | 17.295 | 6.920 | 52.720 | 100 | B |
| 387 | 152.0 | 801.599 | 111.328 | 582.0 | 1263.0 | 348.487 | 497.632 | 348.451 | 497.675 | 17.773 | 10.889 | 11.829 | 100 | A |
| 389 | 60.0 | 758.033 | 77.309 | 601.0 | 947.0 | 259.000 | 499.300 | 258.990 | 499.289 | 9.476 | 8.062 | 90.000 | 100 | A |
| 390 | 12.0 | 714.833 | 67.294 | 551.0 | 785.0 | 240.167 | 498.167 | 240.179 | 498.148 | 4.606 | 3.317 | 168.690 | 100 | A |
| 391 | 23.0 | 695.043 | 67.356 | 611.0 | 846.0 | 49.891 | 503.022 | 49.882 | 502.979 | 6.454 | 4.537 | 73.243 | 100 | A |

374 rows × 14 columns
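
The table above has 374 rows instead of 391, and rows such as 3 and 5, which contained NaNs, are gone. A result like this is presumably obtained by dropping every row that contains at least one NaN. The sketch below shows one way to do that with `DataFrame.dropna`, again on a small, hypothetical table rather than the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; row 1 has missing measurements.
df = pd.DataFrame({
    "Area": [18.0, np.nan, 126.0],
    "Mean": [730.389, np.nan, 718.333],
    "Type": ["A", "A", "A"],
})

# Drop every row that contains at least one NaN.
# dropna returns a new DataFrame; df itself is not modified.
cleaned = df.dropna(how="any")

# Check whether any NaN is present before and after cleaning.
has_nan_before = df.isnull().values.any()
has_nan_after = cleaned.isnull().values.any()
print(len(df), len(cleaned), has_nan_before, has_nan_after)
```

Note that the original row labels are kept, so the index of the cleaned table has gaps where rows were removed.
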

| | A | B | C |
|---|---|---|---|
| 0 | 0 | 2.0 | 2.0 |
| 1 | 1 | 3.0 | 3.0 |
| 2 | 22 | NaN | 44.0 |
| 3 | 21 | 2.0 | 2.0 |
| 4 | 12 | 12.0 | NaN |
| 5 | 23 | 22.0 | 52.0 |
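
To illustrate why NaNs should not simply be treated as `0`, the small table above can be rebuilt by hand and inspected. This is only a sketch: the variable names are illustrative, and how the table was originally created is an assumption. It counts the NaNs per column and compares a mean that skips NaNs with a mean computed after replacing NaNs by zero.

```python
import numpy as np
import pandas as pd

# Rebuild the small table shown above (values copied from the table).
df = pd.DataFrame({
    "A": [0, 1, 22, 21, 12, 23],
    "B": [2, 3, np.nan, 2, 12, 22],
    "C": [2, 3, 44, 2, np.nan, 52],
})

# Count missing values per column.
nan_counts = df.isnull().sum()
print(nan_counts)

# pandas skips NaN by default when computing descriptive statistics.
mean_skipping_nan = df["B"].mean()

# Replacing NaN with 0 adds an artificial measurement and lowers the mean.
mean_zero_filled = df["B"].fillna(0).mean()
print(mean_skipping_nan, mean_zero_filled)
```

The zero-filled mean is lower because the missing entry is counted as a real measurement of `0`, which is the kind of distortion the introduction warns about.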