{ "cells": [ { "cell_type": "markdown", "id": "99c606f8-037f-4258-81e7-a9f4ac511242", "metadata": {}, "source": [ "# Introduction to working with DataFrames\n", "In basic Python, we often use dictionaries containing our measurements as vectors. While these basic structures are handy for collecting data, they are suboptimal for further data processing. For that, we introduce [pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), which are more convenient for the next steps. In Python, scientists often call tables \"DataFrames\". " ] }, { "cell_type": "code", "execution_count": 1, "id": "0cfceb6c-1acc-4632-b084-8b0871a7c50a", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "8b77888b-c9a8-4a67-a4eb-f7df46eda970", "metadata": {}, "source": [ "## Creating DataFrames from a dictionary of lists\n", "Assume we did some image processing and have the results available in a dictionary that contains lists of numbers:" ] }, { "cell_type": "code", "execution_count": 2, "id": "ff80484f-657b-4231-8d8f-cdc26577542b", "metadata": {}, "outputs": [], "source": [ "measurements = {\n", " \"labels\": [1, 2, 3],\n", " \"area\": [45, 23, 68],\n", " \"minor_axis\": [2, 4, 4],\n", " \"major_axis\": [3, 4, 5],\n", "}" ] }, { "cell_type": "markdown", "id": "b2afa6a9-e15c-4147-bdd4-ec4d4f87fb36", "metadata": {}, "source": [ "This data structure can be nicely visualized using a DataFrame. Each key of the dictionary becomes a column name, and each list holds the values of that column:" ] }, { "cell_type": "code", "execution_count": 3, "id": "8bf4e4b5-ef72-4f63-84d2-48cc3a77c297", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " labels area minor_axis major_axis\n", "0 1 45 2 3\n", "1 2 23 4 4\n", "2 3 68 4 5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(measurements)\n", "df" ] }, { "cell_type": "markdown", "id": "930c082b-8f16-4711-b3e0-e56a7ec6d272", "metadata": {}, "source": [ "Using DataFrames, modifying data is straightforward. For example, one can append a new column and compute its values from existing columns:" ] }, { "cell_type": "code", "execution_count": 4, "id": "a34866ff-a2cb-4a7c-a4e8-4544559b634c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " labels area minor_axis major_axis aspect_ratio\n", "0 1 45 2 3 1.50\n", "1 2 23 4 4 1.00\n", "2 3 68 4 5 1.25" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"aspect_ratio\"] = df[\"major_axis\"] / df[\"minor_axis\"]\n", "df" ] }, { "cell_type": "markdown", "id": "201a2142-22c7-4607-bc2d-f1dfce4c7e26", "metadata": {}, "source": [ "## Saving data frames\n", "We can also save this table to a CSV file to continue working with it later. By default, `to_csv` also writes the row index as the first column of the file." ] }, { "cell_type": "code", "execution_count": 5, "id": "fb01d2d9-4d8b-4b6a-b158-9516a581e000", "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"../../data/short_table.csv\")" ] }, { "cell_type": "markdown", "id": "0240857d-292f-4ac3-ba87-8878aa941cde", "metadata": {}, "source": [ "## Creating DataFrames from lists of lists\n", "Sometimes, we are confronted with data in the form of lists of lists. To make pandas interpret this data correctly, we also need to provide the headers in the same order as the lists:" ] }, { "cell_type": "code", "execution_count": 6, "id": "c72a82b1-4da6-468d-afa6-149cb00f7d37", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2\n", "labels 1 2 3\n", "area 45 23 68\n", "minor_axis 2 4 4\n", "major_axis 3 4 5" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "header = ['labels', 'area', 'minor_axis', 'major_axis']\n", "\n", "data = [\n", " [1, 2, 3],\n", " [45, 23, 68],\n", " [2, 4, 4],\n", " [3, 4, 5],\n", "]\n", "\n", "# convert the data into a pandas DataFrame: each inner list becomes a row, and the header is used as row labels (index)\n", "data_frame = pd.DataFrame(data, index=header)\n", "\n", "# show it\n", "data_frame" ] }, { "cell_type": "markdown", "id": "a8b1b6b0-027c-4536-8710-e3f87aca1896", "metadata": {}, "source": [ "As you can see, this table is _rotated_: the headers became row labels, and each inner list became a row. We can bring it into the usual form by transposing it, i.e. swapping rows and columns:" ] }, { "cell_type": "code", "execution_count": 7, "id": "40669e82-4264-4883-9c4e-8a366b061610", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " labels area minor_axis major_axis\n", "0 1 45 2 3\n", "1 2 23 4 4\n", "2 3 68 4 5" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# transpose it, i.e. swap rows and columns\n", "data_frame = data_frame.transpose()\n", "\n", "# show it\n", "data_frame" ] }, { "cell_type": "markdown", "id": "ccf08662-fccf-4dc1-91c2-3365fa85a96b", "metadata": {}, "source": [ "## Loading data frames\n", "Tables can also be read from CSV files using `pd.read_csv`." ] }, { "cell_type": "code", "execution_count": 8, "id": "aa7c74db-68ab-4004-aa5e-01ba1ad88c79", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n", "0 0 422 192.379147 16.488550 34.566789 \n", "1 1 182 180.131868 11.736074 20.802697 \n", "2 2 661 205.216339 28.409502 30.208433 \n", "3 3 437 216.585812 23.143996 24.606130 \n", "4 4 476 212.302521 19.852882 31.075106 \n", ".. ... ... ... ... ... \n", "56 56 211 185.061611 14.522762 18.489138 \n", "57 57 78 185.230769 6.028638 17.579799 \n", "58 58 86 183.720930 5.426871 21.261427 \n", "59 59 51 190.431373 5.032414 13.742079 \n", "60 60 46 175.304348 3.803982 15.948714 \n", "\n", " eccentricity extent feret_diameter_max equivalent_diameter_area \\\n", "0 0.878900 0.586111 35.227830 23.179885 \n", "1 0.825665 0.787879 21.377558 15.222667 \n", "2 0.339934 0.874339 32.756679 29.010538 \n", "3 0.339576 0.826087 26.925824 23.588253 \n", "4 0.769317 0.863884 31.384710 24.618327 \n", ".. ... ... ... ... \n", "56 0.618893 0.781481 18.973666 16.390654 \n", "57 0.939361 0.722222 18.027756 9.965575 \n", "58 0.966876 0.781818 22.000000 10.464158 \n", "59 0.930534 0.728571 14.035669 8.058239 \n", "60 0.971139 0.766667 15.033296 7.653040 \n", "\n", " bbox-0 bbox-1 bbox-2 bbox-3 \n", "0 0 11 30 35 \n", "1 0 53 11 74 \n", "2 0 95 28 122 \n", "3 0 144 23 167 \n", "4 0 237 29 256 \n", ".. ... ... ... ... \n", "56 232 39 250 54 \n", "57 248 170 254 188 \n", "58 249 117 254 139 \n", "59 249 228 254 242 \n", "60 250 67 254 82 \n", "\n", "[61 rows x 13 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv = pd.read_csv('../../data/blobs_statistics.csv')\n", "df_csv" ] }, { "cell_type": "markdown", "id": "01732b57-35d9-4b25-9c1b-d322487d2757", "metadata": {}, "source": [ "Typically, we don't need all the information in these tables, so it makes sense to reduce the table. To do so, we first print out the column names."
] }, { "cell_type": "code", "execution_count": 9, "id": "cc7d6cbe-6487-49a6-84b2-e837f7070f25", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Unnamed: 0', 'area', 'mean_intensity', 'minor_axis_length',\n", " 'major_axis_length', 'eccentricity', 'extent', 'feret_diameter_max',\n", " 'equivalent_diameter_area', 'bbox-0', 'bbox-1', 'bbox-2', 'bbox-3'],\n", " dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_csv.keys()" ] }, { "cell_type": "markdown", "id": "ff187a52-9fc0-4f6f-b143-f872dfe620c2", "metadata": {}, "source": [ "## Selecting columns\n", "We can then copy and paste the column names we're interested in and create a new data frame that contains only these columns. This is especially recommended when tables are very large." ] }, { "cell_type": "code", "execution_count": 18, "id": "b1f03533-e9d0-4880-af3f-c9766df56f29", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " area mean_intensity\n", "0 422 192.379147\n", "1 182 180.131868\n", "2 661 205.216339\n", "3 437 216.585812\n", "4 476 212.302521\n", ".. ... ...\n", "56 211 185.061611\n", "57 78 185.230769\n", "58 86 183.720930\n", "59 51 190.431373\n", "60 46 175.304348\n", "\n", "[61 rows x 2 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_analysis = df_csv[['area', 'mean_intensity']]\n", "df_analysis" ] }, { "cell_type": "markdown", "id": "6fe65e75-8003-4175-96c6-21cb26eb5d31", "metadata": {}, "source": [ "## Selecting rows\n", "Suppose we want to focus our further analysis on cells that have a certain minimum area, for example an area larger than 50. We can do this by selecting only the rows that fulfill this condition: the comparison `df_analysis[\"area\"] > 50` yields a column of `True`/`False` values, and indexing the table with it keeps only the rows marked `True`. This process is sometimes also called masking." ] }, { "cell_type": "code", "execution_count": 16, "id": "a4eadd9b-e287-4ca8-b1ff-d1278c24151c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " area mean_intensity total_intensity\n", "0 422 192.379147 81184.0\n", "1 182 180.131868 32784.0\n", "2 661 205.216339 135648.0\n", "3 437 216.585812 94648.0\n", "4 476 212.302521 101056.0\n", "5 277 206.469314 57192.0\n", "6 259 178.007722 46104.0\n", "7 219 191.598174 41960.0\n", "8 67 167.522388 11224.0\n", "10 486 190.946502 92800.0\n", "11 630 173.600000 109368.0\n", "12 221 197.936652 43744.0\n", "13 78 173.128205 13504.0\n", "14 449 208.766147 93736.0\n", "15 516 194.403101 100312.0\n", "16 390 180.779487 70504.0\n", "17 419 196.849642 82480.0\n", "18 267 200.958801 53656.0\n", "19 353 189.779037 66992.0\n", "20 151 186.225166 28120.0\n", "21 400 187.960000 75184.0\n", "22 426 201.577465 85872.0\n", "23 246 182.113821 44800.0\n", "24 503 198.648111 99920.0\n", "25 278 190.187050 52872.0\n", "26 681 198.308370 135048.0\n", "27 176 195.272727 34368.0\n", "28 358 197.787709 70808.0\n", "29 544 198.455882 107960.0\n", "30 597 190.954774 114000.0\n", "31 181 184.883978 33464.0\n", "32 629 193.666137 121816.0\n", "33 596 210.067114 125200.0\n", "35 263 190.022814 49976.0\n", "36 899 198.291435 178264.0\n", "37 476 204.924370 97544.0\n", "38 233 193.167382 45008.0\n", "39 164 184.634146 30280.0\n", "40 394 181.401015 71472.0\n", "41 411 200.253041 82304.0\n", "42 235 189.140426 44448.0\n", "43 375 195.498667 73312.0\n", "44 654 199.706422 130608.0\n", "45 376 208.638298 78448.0\n", "46 579 200.649396 116176.0\n", "47 64 190.250000 12176.0\n", "48 161 183.950311 29616.0\n", "49 457 168.210066 76872.0\n", "50 625 217.894400 136184.0\n", "51 535 189.936449 101616.0\n", "52 205 199.180488 40832.0\n", "53 562 215.928826 121352.0\n", "54 845 198.295858 167560.0\n", "55 280 189.800000 53144.0\n", "56 211 185.061611 39048.0\n", "57 78 185.230769 14448.0\n", "58 86 183.720930 15800.0\n", "59 51 190.431373 9712.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selected_data = df_analysis[ 
df_analysis[\"area\"] > 50]\n", "selected_data" ] }, { "cell_type": "markdown", "id": "64eb1086-ebc8-4905-afc2-ed0dc01620b9", "metadata": {}, "source": [ "## Adding new columns\n", "We can also access existing columns and compute new columns from them, for example the total intensity of each object as the product of its area and its mean intensity." ] }, { "cell_type": "code", "execution_count": 15, "id": "402892eb-b1ea-4f11-b272-9c44207f7991", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " area mean_intensity total_intensity\n", "0 422 192.379147 81184.0\n", "1 182 180.131868 32784.0\n", "2 661 205.216339 135648.0\n", "3 437 216.585812 94648.0\n", "4 476 212.302521 101056.0\n", ".. ... ... ...\n", "56 211 185.061611 39048.0\n", "57 78 185.230769 14448.0\n", "58 86 183.720930 15800.0\n", "59 51 190.431373 9712.0\n", "60 46 175.304348 8064.0\n", "\n", "[61 rows x 3 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# work on an independent copy, so pandas does not warn about setting values on a slice of df_csv\n", "df_analysis = df_analysis.copy()\n", "df_analysis['total_intensity'] = df_analysis['area'] * df_analysis['mean_intensity']\n", "df_analysis" ] }, { "cell_type": "markdown", "id": "9db24255-2290-4e83-ac74-93d780378175", "metadata": {}, "source": [ "## Exercise\n", "From the loaded CSV file, create a table that only contains these columns:\n", "* `minor_axis_length`\n", "* `major_axis_length`\n", "* `aspect_ratio`\n", "\n", "Note that `aspect_ratio` is not stored in the file, so it has to be computed from the axis lengths first." ] }, { "cell_type": "code", "execution_count": null, "id": "87f226cd-721b-43e3-a31a-faed5e8a6733", "metadata": {}, "outputs": [], "source": [ "df_shape = pd.read_csv('../../data/blobs_statistics.csv')\n", "df_shape" ] }, { "cell_type": "code", "execution_count": null, "id": "f0254fc9-b321-4a4a-be35-7d17216bb517", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }