Folder Structures

In the previous notebook, we saw how to access images stored in a single folder. In this notebook, we show two more organized folder structures: an Omero-like structure and a structure suited to machine learning.

from skimage.io import imread, imsave
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np

Omero-like Structure

Omero is an image data management platform that handles image data and metadata, allowing users to remotely view, organize, analyze and share images. Below is a screenshot of an Omero server.

On the left side, the folder structure can be observed. Omero operates with two levels of hierarchy: images can be put inside directories called Datasets, and Datasets can be put inside directories called Projects. Files are further differentiated via metadata, by means of image tags and image key-value pairs.

Mimicking that structure, our local Project directory called “Project2_Omero_like” contains two folders (Datasets): Control and Group1, each containing images and other files.

Project2_Omero_like
|
├─ Control
|    ├─ Readme.txt
|    ├─ A9 p5d.tif
|    ⁞
|
├─ Group1
|    ├─ Readme.txt
|    ├─ 17P1_POS0006_D_1UL.tif
|    ⁞ 
|     
└─ Readme.txt

Opening multiple images from folders

We start by providing the path to the highest level folder.

# Raw string (r'...') so the backslashes are not interpreted as escape sequences
data_folder2 = r'..\..\data\Folder_Structures\Project2_Omero_like'
data_path = Path(data_folder2)

Since we have a two-level hierarchy of directories here, we need two for loops, one per level. The first for loop iterates over the top level, i.e., it gives us the paths to the folders and files inside the “Project” folder.

for path in data_path.iterdir():
    print(path)
..\..\data\Folder_Structures\Project2_Omero_like\Control
..\..\data\Folder_Structures\Project2_Omero_like\Group1
..\..\data\Folder_Structures\Project2_Omero_like\Readme.txt

To access the lower levels, we need two things:

1. Check if the path leads to another folder
2. If yes, iterate over this folder

We can do that by putting an if condition inside the first for loop and a second for loop to be run if the condition is met.

# First for loop: iterates over Project folder
for path in data_path.iterdir():
    print('Project folder path: \n', path)
    # Check if path leads to another folder
    if path.is_dir():
        # In case the condition is met, iterate over the new path
        for file_path in sorted(path.iterdir()):
            print('Dataset folder path: ', file_path)
Project folder path: 
 ..\..\data\Folder_Structures\Project2_Omero_like\Control
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p10d.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p5d.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p7d.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p9d.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Control\Readme.txt
Project folder path: 
 ..\..\data\Folder_Structures\Project2_Omero_like\Group1
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0006_D_1UL.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0007_D_1UL.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0011_D_1UL.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0013_D_1UL.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0014_D_1UL.tif
Dataset folder path:  ..\..\data\Folder_Structures\Project2_Omero_like\Group1\Readme.txt
Project folder path: 
 ..\..\data\Folder_Structures\Project2_Omero_like\Readme.txt

As usual, to filter out files that are not images, we can add extra conditions. We also store paths from different folders in separate lists by checking the path stem (the final path component, without its suffix) of the folder being iterated.

image_path_list_control = []
image_path_list_group1 = []
for path in data_path.iterdir():
    if path.is_dir():
        for file_path in sorted(path.iterdir()):
            # Check if file is image (ends with .tif) and if current folder name is 'Control'
            if (file_path.suffix == '.tif') and (path.stem == 'Control'):
                image_path_list_control.append(file_path)
            # Check if file is image (ends with .tif) and if current folder name is 'Group1'
            elif (file_path.suffix == '.tif') and (path.stem == 'Group1'):
                image_path_list_group1.append(file_path)
image_path_list_control
[WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Control/A9 p10d.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Control/A9 p5d.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Control/A9 p7d.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Control/A9 p9d.tif')]
image_path_list_group1
[WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0006_D_1UL.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0007_D_1UL.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0011_D_1UL.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0013_D_1UL.tif'),
 WindowsPath('../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0014_D_1UL.tif')]
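
The two hard-coded lists work well for two Datasets, but they need a new branch for every additional folder. As a possible alternative, the sketch below collects the image paths into a dictionary keyed by the Dataset folder name. The function name `group_images_by_dataset` is hypothetical, and the demo runs on a temporary directory with empty placeholder files that only imitate the layout above, so it does not depend on the real data folder.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def group_images_by_dataset(project_path, suffix='.tif'):
    """Map each Dataset folder name to a sorted list of its image paths."""
    groups = defaultdict(list)
    # The pattern '*/*' matches entries exactly one level below the Project folder
    for file_path in sorted(Path(project_path).glob('*/*')):
        if file_path.is_file() and file_path.suffix.lower() == suffix:
            groups[file_path.parent.name].append(file_path)
    return dict(groups)


# Demo on a throwaway directory (placeholder files, not real images)
layout = {
    'Control': ['A9 p5d.tif', 'A9 p10d.tif', 'Readme.txt'],
    'Group1': ['17P1_POS0006_D_1UL.tif', 'Readme.txt'],
}
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / 'Project2_Omero_like'
    for dataset, names in layout.items():
        (project / dataset).mkdir(parents=True)
        for name in names:
            (project / dataset / name).touch()
    (project / 'Readme.txt').touch()

    demo_groups = group_images_by_dataset(project)
    demo_summary = {key: [p.name for p in paths] for key, paths in demo_groups.items()}

print(demo_summary)
```

With the real data, `group_images_by_dataset(data_path)['Control']` should hold the same paths as `image_path_list_control`, assuming the folder contents shown above.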

Machine Learning Style Folder Structure

In machine learning or deep learning workflows, we typically have one folder with intensity images and another folder with label images or masks that share the same file names. Images in both folders must be read in pairs, so it is important to store their paths in lists that follow the same order.
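
To illustrate why the ordering matters, here is a minimal sketch that pairs paths by file name instead of relying on list position alone, and reports files that lack a partner. The function name `pair_images_with_labels` is hypothetical, and the demo uses a temporary directory with empty placeholder files rather than real images.

```python
from pathlib import Path
import tempfile


def pair_images_with_labels(raw_dir, label_dir, suffix='.tif'):
    """Pair raw and label image paths that share the same file name."""
    raw = {p.name: p for p in Path(raw_dir).iterdir() if p.suffix == suffix}
    labels = {p.name: p for p in Path(label_dir).iterdir() if p.suffix == suffix}
    # A file present in only one folder would shift a purely positional pairing
    unmatched = sorted(raw.keys() ^ labels.keys())
    if unmatched:
        print('Files without a partner:', unmatched)
    return [(raw[name], labels[name]) for name in sorted(raw.keys() & labels.keys())]


# Demo on a throwaway directory (placeholder files, not real images)
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / 'Raw_Images'
    label_dir = Path(tmp) / 'Label_Images'
    raw_dir.mkdir()
    label_dir.mkdir()
    for name in ('image_02.tif', 'image_01.tif', 'image_03.tif'):
        (raw_dir / name).touch()
    for name in ('image_01.tif', 'image_02.tif'):
        (label_dir / name).touch()

    pairs = pair_images_with_labels(raw_dir, label_dir)
    pair_names = [(raw.name, label.name) for raw, label in pairs]

print(pair_names)
```

Matching on the file name is one possible design choice; sorting both lists and zipping them works equally well as long as both folders contain exactly the same file names.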

There are a few variations to this initial structure. One of them could be having another level on top separating these images into train, test and validation groups.
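
For the variation with an extra top level, a third nested loop would work, but the path components can also be read directly. The sketch below assumes a hypothetical `root/<split>/<kind>/<file>` layout with split folders named `train` and `test`; these names and the function `index_split_dataset` are illustrative only and are not part of the data used in this notebook.

```python
from pathlib import Path
import tempfile


def index_split_dataset(root, suffix='.tif'):
    """Return {(split, kind): [paths]} for a root/split/kind/file layout."""
    root = Path(root)
    index = {}
    for file_path in sorted(root.rglob('*' + suffix)):
        parts = file_path.relative_to(root).parts
        if len(parts) != 3:
            # Skip files that are not located exactly at split/kind/file
            continue
        split, kind, _ = parts
        index.setdefault((split, kind), []).append(file_path)
    return index


# Demo on a throwaway directory (placeholder files, not real images)
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for split in ('train', 'test'):
        for kind in ('Raw_Images', 'Label_Images'):
            (demo_root / split / kind).mkdir(parents=True)
            (demo_root / split / kind / 'image_01.tif').touch()

    split_index = index_split_dataset(demo_root)
    split_keys = sorted(split_index)

print(split_keys)
```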

Another possibility is having a third folder with manual annotations for the labeled objects. This last structure is the one we replicated below, in our local folder called “Project3_Machine_Learning_style”.

Project3_Machine_Learning_style
|
├─ Annotations
|    ├─ image_01.tif
|    ├─ image_02.tif
|    ⁞
|
├─ Label_Images
|    ├─ image_01.tif
|    ├─ image_02.tif
|    ⁞ 
|     
├─ Raw_Images
|    ├─ image_01.tif
|    ├─ image_02.tif
|    ⁞ 
└─ Readme.md

# Raw string (r'...') so the backslashes are not interpreted as escape sequences
data_folder3 = r'..\..\data\Folder_Structures\Project3_Machine_Learning_style'
data_path = Path(data_folder3)
data_path
WindowsPath('../../data/Folder_Structures/Project3_Machine_Learning_style')

Exercise

Iterate over this image repository and store each type of image path (raw, label and annotation) in a different list.

In another cell, display the first image from each list.