{
"cells": [
{
"cell_type": "markdown",
"id": "99c606f8-037f-4258-81e7-a9f4ac511242",
"metadata": {},
"source": [
"# Introduction to working with DataFrames\n",
"In basic python, we often use dictionaries containing our measurements as vectors. While these basic structures are handy for collecting data, they are suboptimal for further data processing. For that, we introduce [panda DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), the primary tool in the Python ecosystem for handling data. Its primary object, the \"DataFrame\" is extremely useful in wrangling data. which are more handy in the next steps."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0cfceb6c-1acc-4632-b084-8b0871a7c50a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "8b77888b-c9a8-4a67-a4eb-f7df46eda970",
"metadata": {},
"source": [
"## Creating DataFrames from a dictionary of lists\n",
"Assume we did some image processing and have some results available in a dictionary that contains lists of numbers:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ff80484f-657b-4231-8d8f-cdc26577542b",
"metadata": {},
"outputs": [],
"source": [
"measurements = {\n",
" \"labels\": [1, 2, 3],\n",
" \"area\": [45, 23, 68],\n",
" \"minor_axis\": [2, 4, 4],\n",
" \"major_axis\": [3, 4, 5],\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "b2afa6a9-e15c-4147-bdd4-ec4d4f87fb36",
"metadata": {},
"source": [
"This data structure can be nicely visualized using a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8bf4e4b5-ef72-4f63-84d2-48cc3a77c297",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" labels | \n",
" area | \n",
" minor_axis | \n",
" major_axis | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 45 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 23 | \n",
" 4 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 68 | \n",
" 4 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" labels area minor_axis major_axis\n",
"0 1 45 2 3\n",
"1 2 23 4 4\n",
"2 3 68 4 5"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(measurements)\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "930c082b-8f16-4711-b3e0-e56a7ec6d272",
"metadata": {},
"source": [
"Using these DataFrames, data modification is straighforward. For example one can append a new column and compute its values from existing columns. This is done elementwise."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a34866ff-a2cb-4a7c-a4e8-4544559b634c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" labels | \n",
" area | \n",
" minor_axis | \n",
" major_axis | \n",
" aspect_ratio | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 45 | \n",
" 2 | \n",
" 3 | \n",
" 1.50 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 23 | \n",
" 4 | \n",
" 4 | \n",
" 1.00 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 68 | \n",
" 4 | \n",
" 5 | \n",
" 1.25 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" labels area minor_axis major_axis aspect_ratio\n",
"0 1 45 2 3 1.50\n",
"1 2 23 4 4 1.00\n",
"2 3 68 4 5 1.25"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"aspect_ratio\"] = df[\"major_axis\"] / df[\"minor_axis\"]\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "201a2142-22c7-4607-bc2d-f1dfce4c7e26",
"metadata": {},
"source": [
"## Saving data frames\n",
"We can also save this table for continuing to work with it. We chose to save it as a CSV file, where CSV stands for comma-separated value. This is a text file that is easily read into data structures in many programming languages."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fb01d2d9-4d8b-4b6a-b158-9516a581e000",
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"../../data/short_table.csv\")"
]
},
{
"cell_type": "markdown",
"id": "2677e5a9-8d1b-4454-b009-4ac26e549b2d",
"metadata": {},
"source": [
"You should generally always store your data in such a format, not necessarily CSV, but a format that is open, has a well-defined specification, and is readable in many contexts. Excel files do not meet these criteria. Neither do .mat files."
]
},
{
"cell_type": "markdown",
"id": "0240857d-292f-4ac3-ba87-8878aa941cde",
"metadata": {},
"source": [
"## Creating DataFrames from lists of lists\n",
"Sometimes, we are confronted to data in form of lists of lists. To make pandas understand that form of data correctly, we also need to provide the headers in the same order as the lists"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c72a82b1-4da6-468d-afa6-149cb00f7d37",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" labels | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" area | \n",
" 45 | \n",
" 23 | \n",
" 68 | \n",
"
\n",
" \n",
" minor_axis | \n",
" 2 | \n",
" 4 | \n",
" 4 | \n",
"
\n",
" \n",
" major_axis | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"labels 1 2 3\n",
"area 45 23 68\n",
"minor_axis 2 4 4\n",
"major_axis 3 4 5"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"header = ['labels', 'area', 'minor_axis', 'major_axis']\n",
"\n",
"data = [\n",
" [1, 2, 3],\n",
" [45, 23, 68],\n",
" [2, 4, 4],\n",
" [3, 4, 5],\n",
"]\n",
" \n",
"# convert the data and header arrays in a pandas data frame\n",
"data_frame = pd.DataFrame(data, header)\n",
"\n",
"# show it\n",
"data_frame"
]
},
{
"cell_type": "markdown",
"id": "a8b1b6b0-027c-4536-8710-e3f87aca1896",
"metadata": {},
"source": [
"As you can see, this tabls is _rotated_. We can bring it in the usual form like this:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "40669e82-4264-4883-9c4e-8a366b061610",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" labels | \n",
" area | \n",
" minor_axis | \n",
" major_axis | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 45 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 23 | \n",
" 4 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 68 | \n",
" 4 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" labels area minor_axis major_axis\n",
"0 1 45 2 3\n",
"1 2 23 4 4\n",
"2 3 68 4 5"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# rotate/flip it\n",
"data_frame = data_frame.transpose()\n",
"\n",
"# show it\n",
"data_frame"
]
},
{
"cell_type": "markdown",
"id": "ccf08662-fccf-4dc1-91c2-3365fa85a96b",
"metadata": {},
"source": [
"## Loading data frames\n",
"Tables can be read from CSV files with [pd.read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aa7c74db-68ab-4004-aa5e-01ba1ad88c79",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" area | \n",
" mean_intensity | \n",
" minor_axis_length | \n",
" major_axis_length | \n",
" eccentricity | \n",
" extent | \n",
" feret_diameter_max | \n",
" equivalent_diameter_area | \n",
" bbox-0 | \n",
" bbox-1 | \n",
" bbox-2 | \n",
" bbox-3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 422 | \n",
" 192.379147 | \n",
" 16.488550 | \n",
" 34.566789 | \n",
" 0.878900 | \n",
" 0.586111 | \n",
" 35.227830 | \n",
" 23.179885 | \n",
" 0 | \n",
" 11 | \n",
" 30 | \n",
" 35 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 182 | \n",
" 180.131868 | \n",
" 11.736074 | \n",
" 20.802697 | \n",
" 0.825665 | \n",
" 0.787879 | \n",
" 21.377558 | \n",
" 15.222667 | \n",
" 0 | \n",
" 53 | \n",
" 11 | \n",
" 74 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 661 | \n",
" 205.216339 | \n",
" 28.409502 | \n",
" 30.208433 | \n",
" 0.339934 | \n",
" 0.874339 | \n",
" 32.756679 | \n",
" 29.010538 | \n",
" 0 | \n",
" 95 | \n",
" 28 | \n",
" 122 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 437 | \n",
" 216.585812 | \n",
" 23.143996 | \n",
" 24.606130 | \n",
" 0.339576 | \n",
" 0.826087 | \n",
" 26.925824 | \n",
" 23.588253 | \n",
" 0 | \n",
" 144 | \n",
" 23 | \n",
" 167 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 476 | \n",
" 212.302521 | \n",
" 19.852882 | \n",
" 31.075106 | \n",
" 0.769317 | \n",
" 0.863884 | \n",
" 31.384710 | \n",
" 24.618327 | \n",
" 0 | \n",
" 237 | \n",
" 29 | \n",
" 256 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 56 | \n",
" 56 | \n",
" 211 | \n",
" 185.061611 | \n",
" 14.522762 | \n",
" 18.489138 | \n",
" 0.618893 | \n",
" 0.781481 | \n",
" 18.973666 | \n",
" 16.390654 | \n",
" 232 | \n",
" 39 | \n",
" 250 | \n",
" 54 | \n",
"
\n",
" \n",
" 57 | \n",
" 57 | \n",
" 78 | \n",
" 185.230769 | \n",
" 6.028638 | \n",
" 17.579799 | \n",
" 0.939361 | \n",
" 0.722222 | \n",
" 18.027756 | \n",
" 9.965575 | \n",
" 248 | \n",
" 170 | \n",
" 254 | \n",
" 188 | \n",
"
\n",
" \n",
" 58 | \n",
" 58 | \n",
" 86 | \n",
" 183.720930 | \n",
" 5.426871 | \n",
" 21.261427 | \n",
" 0.966876 | \n",
" 0.781818 | \n",
" 22.000000 | \n",
" 10.464158 | \n",
" 249 | \n",
" 117 | \n",
" 254 | \n",
" 139 | \n",
"
\n",
" \n",
" 59 | \n",
" 59 | \n",
" 51 | \n",
" 190.431373 | \n",
" 5.032414 | \n",
" 13.742079 | \n",
" 0.930534 | \n",
" 0.728571 | \n",
" 14.035669 | \n",
" 8.058239 | \n",
" 249 | \n",
" 228 | \n",
" 254 | \n",
" 242 | \n",
"
\n",
" \n",
" 60 | \n",
" 60 | \n",
" 46 | \n",
" 175.304348 | \n",
" 3.803982 | \n",
" 15.948714 | \n",
" 0.971139 | \n",
" 0.766667 | \n",
" 15.033296 | \n",
" 7.653040 | \n",
" 250 | \n",
" 67 | \n",
" 254 | \n",
" 82 | \n",
"
\n",
" \n",
"
\n",
"
61 rows × 13 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n",
"0 0 422 192.379147 16.488550 34.566789 \n",
"1 1 182 180.131868 11.736074 20.802697 \n",
"2 2 661 205.216339 28.409502 30.208433 \n",
"3 3 437 216.585812 23.143996 24.606130 \n",
"4 4 476 212.302521 19.852882 31.075106 \n",
".. ... ... ... ... ... \n",
"56 56 211 185.061611 14.522762 18.489138 \n",
"57 57 78 185.230769 6.028638 17.579799 \n",
"58 58 86 183.720930 5.426871 21.261427 \n",
"59 59 51 190.431373 5.032414 13.742079 \n",
"60 60 46 175.304348 3.803982 15.948714 \n",
"\n",
" eccentricity extent feret_diameter_max equivalent_diameter_area \\\n",
"0 0.878900 0.586111 35.227830 23.179885 \n",
"1 0.825665 0.787879 21.377558 15.222667 \n",
"2 0.339934 0.874339 32.756679 29.010538 \n",
"3 0.339576 0.826087 26.925824 23.588253 \n",
"4 0.769317 0.863884 31.384710 24.618327 \n",
".. ... ... ... ... \n",
"56 0.618893 0.781481 18.973666 16.390654 \n",
"57 0.939361 0.722222 18.027756 9.965575 \n",
"58 0.966876 0.781818 22.000000 10.464158 \n",
"59 0.930534 0.728571 14.035669 8.058239 \n",
"60 0.971139 0.766667 15.033296 7.653040 \n",
"\n",
" bbox-0 bbox-1 bbox-2 bbox-3 \n",
"0 0 11 30 35 \n",
"1 0 53 11 74 \n",
"2 0 95 28 122 \n",
"3 0 144 23 167 \n",
"4 0 237 29 256 \n",
".. ... ... ... ... \n",
"56 232 39 250 54 \n",
"57 248 170 254 188 \n",
"58 249 117 254 139 \n",
"59 249 228 254 242 \n",
"60 250 67 254 82 \n",
"\n",
"[61 rows x 13 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv = pd.read_csv('../../data/blobs_statistics.csv')\n",
"df_csv"
]
},
{
"cell_type": "markdown",
"id": "da74dce2-609a-4e52-bc16-77ddf45efc98",
"metadata": {},
"source": [
"That's a bit too much information. We can use the `.head()` method of data frames to look at the first few rows (or the `.tail()` to check the last rows)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "08d22b50-f860-428f-9edc-41c749ba5ae7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" area | \n",
" mean_intensity | \n",
" minor_axis_length | \n",
" major_axis_length | \n",
" eccentricity | \n",
" extent | \n",
" feret_diameter_max | \n",
" equivalent_diameter_area | \n",
" bbox-0 | \n",
" bbox-1 | \n",
" bbox-2 | \n",
" bbox-3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 422 | \n",
" 192.379147 | \n",
" 16.488550 | \n",
" 34.566789 | \n",
" 0.878900 | \n",
" 0.586111 | \n",
" 35.227830 | \n",
" 23.179885 | \n",
" 0 | \n",
" 11 | \n",
" 30 | \n",
" 35 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 182 | \n",
" 180.131868 | \n",
" 11.736074 | \n",
" 20.802697 | \n",
" 0.825665 | \n",
" 0.787879 | \n",
" 21.377558 | \n",
" 15.222667 | \n",
" 0 | \n",
" 53 | \n",
" 11 | \n",
" 74 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" 661 | \n",
" 205.216339 | \n",
" 28.409502 | \n",
" 30.208433 | \n",
" 0.339934 | \n",
" 0.874339 | \n",
" 32.756679 | \n",
" 29.010538 | \n",
" 0 | \n",
" 95 | \n",
" 28 | \n",
" 122 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" 437 | \n",
" 216.585812 | \n",
" 23.143996 | \n",
" 24.606130 | \n",
" 0.339576 | \n",
" 0.826087 | \n",
" 26.925824 | \n",
" 23.588253 | \n",
" 0 | \n",
" 144 | \n",
" 23 | \n",
" 167 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" 476 | \n",
" 212.302521 | \n",
" 19.852882 | \n",
" 31.075106 | \n",
" 0.769317 | \n",
" 0.863884 | \n",
" 31.384710 | \n",
" 24.618327 | \n",
" 0 | \n",
" 237 | \n",
" 29 | \n",
" 256 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n",
"0 0 422 192.379147 16.488550 34.566789 \n",
"1 1 182 180.131868 11.736074 20.802697 \n",
"2 2 661 205.216339 28.409502 30.208433 \n",
"3 3 437 216.585812 23.143996 24.606130 \n",
"4 4 476 212.302521 19.852882 31.075106 \n",
"\n",
" eccentricity extent feret_diameter_max equivalent_diameter_area \\\n",
"0 0.878900 0.586111 35.227830 23.179885 \n",
"1 0.825665 0.787879 21.377558 15.222667 \n",
"2 0.339934 0.874339 32.756679 29.010538 \n",
"3 0.339576 0.826087 26.925824 23.588253 \n",
"4 0.769317 0.863884 31.384710 24.618327 \n",
"\n",
" bbox-0 bbox-1 bbox-2 bbox-3 \n",
"0 0 11 30 35 \n",
"1 0 53 11 74 \n",
"2 0 95 28 122 \n",
"3 0 144 23 167 \n",
"4 0 237 29 256 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c3c35f55-146f-4cad-bda6-3d83b049a8f9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" area | \n",
" mean_intensity | \n",
" minor_axis_length | \n",
" major_axis_length | \n",
" eccentricity | \n",
" extent | \n",
" feret_diameter_max | \n",
" equivalent_diameter_area | \n",
" bbox-0 | \n",
" bbox-1 | \n",
" bbox-2 | \n",
" bbox-3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 56 | \n",
" 56 | \n",
" 211 | \n",
" 185.061611 | \n",
" 14.522762 | \n",
" 18.489138 | \n",
" 0.618893 | \n",
" 0.781481 | \n",
" 18.973666 | \n",
" 16.390654 | \n",
" 232 | \n",
" 39 | \n",
" 250 | \n",
" 54 | \n",
"
\n",
" \n",
" 57 | \n",
" 57 | \n",
" 78 | \n",
" 185.230769 | \n",
" 6.028638 | \n",
" 17.579799 | \n",
" 0.939361 | \n",
" 0.722222 | \n",
" 18.027756 | \n",
" 9.965575 | \n",
" 248 | \n",
" 170 | \n",
" 254 | \n",
" 188 | \n",
"
\n",
" \n",
" 58 | \n",
" 58 | \n",
" 86 | \n",
" 183.720930 | \n",
" 5.426871 | \n",
" 21.261427 | \n",
" 0.966876 | \n",
" 0.781818 | \n",
" 22.000000 | \n",
" 10.464158 | \n",
" 249 | \n",
" 117 | \n",
" 254 | \n",
" 139 | \n",
"
\n",
" \n",
" 59 | \n",
" 59 | \n",
" 51 | \n",
" 190.431373 | \n",
" 5.032414 | \n",
" 13.742079 | \n",
" 0.930534 | \n",
" 0.728571 | \n",
" 14.035669 | \n",
" 8.058239 | \n",
" 249 | \n",
" 228 | \n",
" 254 | \n",
" 242 | \n",
"
\n",
" \n",
" 60 | \n",
" 60 | \n",
" 46 | \n",
" 175.304348 | \n",
" 3.803982 | \n",
" 15.948714 | \n",
" 0.971139 | \n",
" 0.766667 | \n",
" 15.033296 | \n",
" 7.653040 | \n",
" 250 | \n",
" 67 | \n",
" 254 | \n",
" 82 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 area mean_intensity minor_axis_length major_axis_length \\\n",
"56 56 211 185.061611 14.522762 18.489138 \n",
"57 57 78 185.230769 6.028638 17.579799 \n",
"58 58 86 183.720930 5.426871 21.261427 \n",
"59 59 51 190.431373 5.032414 13.742079 \n",
"60 60 46 175.304348 3.803982 15.948714 \n",
"\n",
" eccentricity extent feret_diameter_max equivalent_diameter_area \\\n",
"56 0.618893 0.781481 18.973666 16.390654 \n",
"57 0.939361 0.722222 18.027756 9.965575 \n",
"58 0.966876 0.781818 22.000000 10.464158 \n",
"59 0.930534 0.728571 14.035669 8.058239 \n",
"60 0.971139 0.766667 15.033296 7.653040 \n",
"\n",
" bbox-0 bbox-1 bbox-2 bbox-3 \n",
"56 232 39 250 54 \n",
"57 248 170 254 188 \n",
"58 249 117 254 139 \n",
"59 249 228 254 242 \n",
"60 250 67 254 82 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv.tail()"
]
},
{
"cell_type": "markdown",
"id": "01732b57-35d9-4b25-9c1b-d322487d2757",
"metadata": {},
"source": [
"We can also get column names with the DataFrame attribute `columns`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cc7d6cbe-6487-49a6-84b2-e837f7070f25",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Unnamed: 0', 'area', 'mean_intensity', 'minor_axis_length',\n",
" 'major_axis_length', 'eccentricity', 'extent', 'feret_diameter_max',\n",
" 'equivalent_diameter_area', 'bbox-0', 'bbox-1', 'bbox-2', 'bbox-3'],\n",
" dtype='object')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv.columns"
]
},
{
"cell_type": "markdown",
"id": "ff187a52-9fc0-4f6f-b143-f872dfe620c2",
"metadata": {},
"source": [
"## Selecting rows and columns"
]
},
{
"cell_type": "markdown",
"id": "19fe76e8-5ea4-40c5-8bcd-1f9e12c9d85f",
"metadata": {},
"source": [
"Ok, let's get the dataframe first row:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "932789a8-b9f8-4306-87c4-37a5f12bcb2b",
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "0",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)",
"File \u001b[1;32m~\\anaconda3\\envs\\devbio-napari-env\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3800\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m 3799\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m-> 3800\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_engine\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcasted_key\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 3801\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"File \u001b[1;32m~\\anaconda3\\envs\\devbio-napari-env\\lib\\site-packages\\pandas\\_libs\\index.pyx:138\u001b[0m, in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32m~\\anaconda3\\envs\\devbio-napari-env\\lib\\site-packages\\pandas\\_libs\\index.pyx:165\u001b[0m, in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi:5745\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi:5753\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"\u001b[1;31mKeyError\u001b[0m: 0",
"\nThe above exception was the direct cause of the following exception:\n",
"\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn [12], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m df_csv[\u001b[38;5;241m0\u001b[39m]\n",
"File \u001b[1;32m~\\anaconda3\\envs\\devbio-napari-env\\lib\\site-packages\\pandas\\core\\frame.py:3805\u001b[0m, in \u001b[0;36mDataFrame.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 3803\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns\u001b[38;5;241m.\u001b[39mnlevels \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m 3804\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_getitem_multilevel(key)\n\u001b[1;32m-> 3805\u001b[0m indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 3806\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_integer(indexer):\n\u001b[0;32m 3807\u001b[0m indexer \u001b[38;5;241m=\u001b[39m [indexer]\n",
"File \u001b[1;32m~\\anaconda3\\envs\\devbio-napari-env\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3802\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m 3800\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_engine\u001b[38;5;241m.\u001b[39mget_loc(casted_key)\n\u001b[0;32m 3801\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n\u001b[1;32m-> 3802\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[0;32m 3803\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:\n\u001b[0;32m 3804\u001b[0m \u001b[38;5;66;03m# If we have a listlike key, _check_indexing_error will raise\u001b[39;00m\n\u001b[0;32m 3805\u001b[0m \u001b[38;5;66;03m# InvalidIndexError. Otherwise we fall through and re-raise\u001b[39;00m\n\u001b[0;32m 3806\u001b[0m \u001b[38;5;66;03m# the TypeError.\u001b[39;00m\n\u001b[0;32m 3807\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_indexing_error(key)\n",
"\u001b[1;31mKeyError\u001b[0m: 0"
]
}
],
"source": [
"df_csv[0]"
]
},
{
"cell_type": "markdown",
"id": "dff3ff1f-50a0-475a-9bec-891f555aa7e8",
"metadata": {},
"source": [
"Ooops... We got a big error. That's because **we index dataframes by columns**.\n",
"\n",
"We can then copy&paste the colum names we're interested in and create a new data frame. This is recommended especially when tables are overwhelmingly large."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "bb65a716-5e56-466b-9a19-e388e6900d8e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 422\n",
"1 182\n",
"2 661\n",
"3 437\n",
"4 476\n",
" ... \n",
"56 211\n",
"57 78\n",
"58 86\n",
"59 51\n",
"60 46\n",
"Name: area, Length: 61, dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv['area']"
]
},
{
"cell_type": "markdown",
"id": "62ba3625-5206-44c6-b79f-0554fc55bdb7",
"metadata": {},
"source": [
"Notice that when it was printed, the index of the rows came along with it. That's beacuse a Pandas DataFrame with one column is a [Pandas Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html?highlight=series#pandas.Series)."
]
},
{
"cell_type": "markdown",
"id": "3cb46414-d625-4f6a-85ba-8e05e521b8a5",
"metadata": {},
"source": [
"We can get more columns by passing their names as a list. Furthermore, we can store this \"sub-dataframe\" in a new variable."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "ffb0453b-a0d5-4ae9-88ae-da79e332bb0c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" mean_intensity | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 422 | \n",
" 192.379147 | \n",
"
\n",
" \n",
" 1 | \n",
" 182 | \n",
" 180.131868 | \n",
"
\n",
" \n",
" 2 | \n",
" 661 | \n",
" 205.216339 | \n",
"
\n",
" \n",
" 3 | \n",
" 437 | \n",
" 216.585812 | \n",
"
\n",
" \n",
" 4 | \n",
" 476 | \n",
" 212.302521 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 56 | \n",
" 211 | \n",
" 185.061611 | \n",
"
\n",
" \n",
" 57 | \n",
" 78 | \n",
" 185.230769 | \n",
"
\n",
" \n",
" 58 | \n",
" 86 | \n",
" 183.720930 | \n",
"
\n",
" \n",
" 59 | \n",
" 51 | \n",
" 190.431373 | \n",
"
\n",
" \n",
" 60 | \n",
" 46 | \n",
" 175.304348 | \n",
"
\n",
" \n",
"
\n",
"
61 rows × 2 columns
\n",
"
"
],
"text/plain": [
" area mean_intensity\n",
"0 422 192.379147\n",
"1 182 180.131868\n",
"2 661 205.216339\n",
"3 437 216.585812\n",
"4 476 212.302521\n",
".. ... ...\n",
"56 211 185.061611\n",
"57 78 185.230769\n",
"58 86 183.720930\n",
"59 51 190.431373\n",
"60 46 175.304348\n",
"\n",
"[61 rows x 2 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_analysis = df_csv[ ['area', 'mean_intensity'] ]\n",
"df_analysis"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "ecada278-aa8a-42f1-95aa-1ae5e157d15f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" mean_intensity | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 422 | \n",
" 192.379147 | \n",
"
\n",
" \n",
" 1 | \n",
" 182 | \n",
" 180.131868 | \n",
"
\n",
" \n",
" 2 | \n",
" 661 | \n",
" 205.216339 | \n",
"
\n",
" \n",
" 3 | \n",
" 437 | \n",
" 216.585812 | \n",
"
\n",
" \n",
" 4 | \n",
" 476 | \n",
" 212.302521 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 56 | \n",
" 211 | \n",
" 185.061611 | \n",
"
\n",
" \n",
" 57 | \n",
" 78 | \n",
" 185.230769 | \n",
"
\n",
" \n",
" 58 | \n",
" 86 | \n",
" 183.720930 | \n",
"
\n",
" \n",
" 59 | \n",
" 51 | \n",
" 190.431373 | \n",
"
\n",
" \n",
" 60 | \n",
" 46 | \n",
" 175.304348 | \n",
"
\n",
" \n",
"
\n",
"
61 rows × 2 columns
\n",
"
"
],
"text/plain": [
" area mean_intensity\n",
"0 422 192.379147\n",
"1 182 180.131868\n",
"2 661 205.216339\n",
"3 437 216.585812\n",
"4 476 212.302521\n",
".. ... ...\n",
"56 211 185.061611\n",
"57 78 185.230769\n",
"58 86 183.720930\n",
"59 51 190.431373\n",
"60 46 175.304348\n",
"\n",
"[61 rows x 2 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_analysis"
]
},
{
"cell_type": "markdown",
"id": "361726eb-c73d-4c3a-a5ed-530e9517abb5",
"metadata": {},
"source": [
"This gave us the area measurements we were after. \n",
"\n",
"If we want to get a single row, the proper way of doing that is to use the `.loc` method:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6df270e8-5f3c-4f23-a934-3c554575dd09",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"area 661.000000\n",
"mean_intensity 205.216339\n",
"Name: 2, dtype: float64"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv.loc[ 2, ['area', 'mean_intensity']]"
]
},
{
"cell_type": "markdown",
"id": "95b4a73b-f03b-484a-8ed9-82a06ae38bc5",
"metadata": {},
"source": [
"Note that following `.loc`, we have the index by row then column, separated by a comma, in brackets. It is also important to note that row indices need not be integers. And you should not count on them being integers.\n",
"\n",
"In case you really want to access elements in a dataframe by integer indices, like in a numpy array, you can use the `.iloc` method. "
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "5a6cf822-dd51-4955-94ab-ece56bc7ac45",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" mean_intensity | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 661 | \n",
" 205.216339 | \n",
"
\n",
" \n",
" 3 | \n",
" 437 | \n",
" 216.585812 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" area mean_intensity\n",
"2 661 205.216339\n",
"3 437 216.585812"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_csv.iloc[2:4, 1:3]"
]
},
{
"cell_type": "markdown",
"id": "d576e7e3-db2b-4a73-9237-0deec1f4116d",
"metadata": {},
"source": [
"The downside is that the code becomes less explicit when column names are absent.\n",
"\n",
"In practice you will almost never use row indices, but rather use Boolean indexing or Masking."
]
},
{
"cell_type": "markdown",
"id": "6fe65e75-8003-4175-96c6-21cb26eb5d31",
"metadata": {},
"source": [
"## Boolean indexing of data frames\n",
"In case we want to focus our further analysis on cells that have a certain minimum area, we can do this by passing boolean indices to the rows. This process is also sometimes call masking.\n",
"\n",
"Suppose we want the rows for which `df_analysis[\"area\"] > 50`. We can essentially plug this syntax directly when using `.loc`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a4eadd9b-e287-4ca8-b1ff-d1278c24151c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" mean_intensity | \n",
"
\n",
" \n",
" \n",
" \n",
" 55 | \n",
" 280 | \n",
" 189.800000 | \n",
"
\n",
" \n",
" 56 | \n",
" 211 | \n",
" 185.061611 | \n",
"
\n",
" \n",
" 57 | \n",
" 78 | \n",
" 185.230769 | \n",
"
\n",
" \n",
" 58 | \n",
" 86 | \n",
" 183.720930 | \n",
"
\n",
" \n",
" 59 | \n",
" 51 | \n",
" 190.431373 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" area mean_intensity\n",
"55 280 189.800000\n",
"56 211 185.061611\n",
"57 78 185.230769\n",
"58 86 183.720930\n",
"59 51 190.431373"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_analysis_filtered = df_analysis[ df_analysis[\"area\"] > 50]\n",
"df_analysis_filtered.tail()"
]
},
{
"cell_type": "markdown",
"id": "64eb1086-ebc8-4905-afc2-ed0dc01620b9",
"metadata": {},
"source": [
"## Adding new columns\n",
"In Pandas, it is very easy to generate new columns from existing columns. We just pass some operation between other columns to a new column name."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "402892eb-b1ea-4f11-b272-9c44207f7991",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Marcelo_Researcher\\AppData\\Local\\Temp\\ipykernel_19228\\206920941.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" df_analysis['total_intensity'] = df_analysis['area'] * df_analysis['mean_intensity']\n"
]
},
{
"data": {
"text/plain": [
"    area  mean_intensity  total_intensity\n",
"0    422      192.379147          81184.0\n",
"1    182      180.131868          32784.0\n",
"2    661      205.216339         135648.0\n",
"3    437      216.585812          94648.0\n",
"4    476      212.302521         101056.0\n",
"..   ...             ...              ...\n",
"56   211      185.061611          39048.0\n",
"57    78      185.230769          14448.0\n",
"58    86      183.720930          15800.0\n",
"59    51      190.431373           9712.0\n",
"60    46      175.304348           8064.0\n",
"\n",
"[61 rows x 3 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
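],
"source": [
"# Copy first: adding a column to a slice of another DataFrame raises a SettingWithCopyWarning\n",
"df_analysis = df_analysis.copy()\n",
"df_analysis['total_intensity'] = df_analysis['area'] * df_analysis['mean_intensity']\n",
"df_analysis"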
]
},
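{
"cell_type": "markdown",
"id": "9db24255-2290-4e83-ac74-93d780378175",
"metadata": {},
"source": [
"## Exercise\n",
"From the CSV file loaded below, create a table that contains only these columns:\n",
"* `minor_axis_length`\n",
"* `major_axis_length`\n",
"* `aspect_ratio`\n",
"\n",
"Note that `aspect_ratio` is not stored in the file; derive it as a new column from the two axis lengths, for example as `major_axis_length` divided by `minor_axis_length`."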
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "87f226cd-721b-43e3-a31a-faed5e8a6733",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"    Unnamed: 0  area  mean_intensity  minor_axis_length  major_axis_length  \\\n",
"0            0   422      192.379147          16.488550          34.566789  \n",
"1            1   182      180.131868          11.736074          20.802697  \n",
"2            2   661      205.216339          28.409502          30.208433  \n",
"3            3   437      216.585812          23.143996          24.606130  \n",
"4            4   476      212.302521          19.852882          31.075106  \n",
"..         ...   ...             ...                ...                ...  \n",
"56          56   211      185.061611          14.522762          18.489138  \n",
"57          57    78      185.230769           6.028638          17.579799  \n",
"58          58    86      183.720930           5.426871          21.261427  \n",
"59          59    51      190.431373           5.032414          13.742079  \n",
"60          60    46      175.304348           3.803982          15.948714  \n",
"\n",
"    eccentricity    extent  feret_diameter_max  equivalent_diameter_area  \\\n",
"0       0.878900  0.586111           35.227830                 23.179885  \n",
"1       0.825665  0.787879           21.377558                 15.222667  \n",
"2       0.339934  0.874339           32.756679                 29.010538  \n",
"3       0.339576  0.826087           26.925824                 23.588253  \n",
"4       0.769317  0.863884           31.384710                 24.618327  \n",
"..           ...       ...                 ...                       ...  \n",
"56      0.618893  0.781481           18.973666                 16.390654  \n",
"57      0.939361  0.722222           18.027756                  9.965575  \n",
"58      0.966876  0.781818           22.000000                 10.464158  \n",
"59      0.930534  0.728571           14.035669                  8.058239  \n",
"60      0.971139  0.766667           15.033296                  7.653040  \n",
"\n",
"    bbox-0  bbox-1  bbox-2  bbox-3  \n",
"0        0      11      30      35  \n",
"1        0      53      11      74  \n",
"2        0      95      28     122  \n",
"3        0     144      23     167  \n",
"4        0     237      29     256  \n",
"..     ...     ...     ...     ...  \n",
"56     232      39     250      54  \n",
"57     248     170     254     188  \n",
"58     249     117     254     139  \n",
"59     249     228     254     242  \n",
"60     250      67     254      82  \n",
"\n",
"[61 rows x 13 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_shape = pd.read_csv('../../data/blobs_statistics.csv')\n",
"df_shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59916b36-cdd1-4af6-852e-25b0806ac11c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}