# Developer Guide for Cytotoxicity Pipeline

The Cytotoxicity pipeline is a modular, CLI-based workflow designed for flexibility and reusability. The workflow consists of the following main steps:

  1. Preprocessing: Transforms raw images into processed images.
    Input: image → Output: image

  2. Segmentation: Converts images to segmentation labels, or refines existing labels.
    Input: image → label / label → label

  3. Tabulation: Converts images or labels into a sparse tabular format for downstream analysis.
    Input: image → dataframe / label → dataframe

  4. Tracking: Tracks objects over time using either dense (label-based) or sparse (feature point-based) methods.
    Input: label → dataframe / dataframe → dataframe

  5. Analysis: Performs analysis and visualization using any combination of images, labels, or tabular data.
    Input: image / label / dataframe → arbitrary outputs

Each step is implemented as an independent module, allowing for easy customization and extension of the pipeline.

Each workflow step accepts only a specific type of input-output pair. To enhance module reusability, data is passed between steps as a Python dictionary; e.g. preprocessing receives a dict input with the key `"image"` and returns its result in the same format.
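The sketch below illustrates this dict-based hand-off between steps. The classes here are hypothetical stand-ins for real pipeline modules, included only to show the data flow:

```python
import numpy as np

# Hypothetical stand-ins for real pipeline modules, just to show the dict flow.
class MyPreprocessing:
    def __call__(self, data):
        return {"image": data["image"] / data["image"].max()}  # e.g. normalization

class MySegmentation:
    def __call__(self, data):
        return {"image": data["image"], "label": (data["image"] > 0.5).astype(int)}

data = {"image": np.random.rand(64, 64)}  # each step consumes and returns a dict
data = MyPreprocessing()(data)            # "image" -> "image"
data = MySegmentation()(data)             # "image" -> "image" + "label"
print(data.keys())                        # dict_keys(['image', 'label'])
```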

For each step, the expected input and output dictionary keys are listed in the table below:

| Step          | Input Key                             | Output Key    |
|---------------|---------------------------------------|---------------|
| Preprocessing | `"image"`                             | `"image"`     |
| Segmentation  | `"image"` / `"label"`                 | `"label"`     |
| Tabulation    | `"image"` / `"label"`                 | `"dataframe"` |
| Tracking      | `"label"` / `"dataframe"`             | `"dataframe"` |
| Analysis      | `"image"` / `"label"` / `"dataframe"` | arbitrary     |

In short, new modules should follow the template class below:

```python
from typing import Any

from tqdm import tqdm


class MySegmentationClass(object):
    def __init__(self, arg1=(0, 1, 2), arg2="foo", verbose=True) -> None:
        """
        Class documentation goes here.

        Args:
            arg1 (tuple or int): Tuple or integer argument input
            arg2 (str): String input
            verbose (bool): Turn on or off the processing printout
        """
        self.name = "MySegmentationClass"
        self.arg1 = arg1
        self.arg2 = arg2
        self.verbose = verbose

    def __call__(self, data) -> Any:
        image = data["image"]

        if self.verbose:
            tqdm.write("Class args: [{},{}]".format(self.arg1, self.arg2))

        # some processing here ...
        label = awesome_segmentation(image)  # placeholder for the actual routine

        return {"image": image, "label": label}
```
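Once defined, the module is instantiated and called like any other step. A hypothetical usage, assuming an `image` array has already been loaded upstream:

```python
seg = MySegmentationClass(arg1=(0, 1), arg2="bar", verbose=False)
result = seg({"image": image})  # image: a previously loaded array
label = result["label"]
```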

A full example of such a class can be found in `../preprocessing/normalization.py`.

⚠️ Important: Add your package dependencies to `requirements.txt` ⚠️

⚠️ Add notes to `README.md` and/or `./doc/setup.md` when necessary, particularly for conda/mamba-specific dependencies ⚠️

To load the class back into the main function, you only need to add the corresponding import and edit the pipeline YAML file.
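For illustration, a pipeline YAML entry might look like the following. This is a hypothetical sketch: the actual schema, section names, and argument layout depend on the pipeline's YAML format:

```yaml
# Hypothetical excerpt of the pipeline YAML; the real schema may differ.
segmentation:
  class: MySegmentationClass
  args:
    arg1: [0, 1, 2]
    arg2: "foo"
    verbose: true
```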

## Custom Intermediate Output

TODO

## Dask Support

For better handling of large datasets we recommend Dask arrays over plain NumPy arrays, though in some cases CuPy or NumPy may do the job.
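As a minimal sketch, a NumPy image stack can be wrapped in a chunked Dask array so that per-frame operations are evaluated lazily (the array shape and chunking here are arbitrary examples):

```python
import dask.array as da
import numpy as np

# A dummy time-lapse stack for illustration: (frames, height, width)
stack = np.random.rand(100, 512, 512).astype(np.float32)
dstack = da.from_array(stack, chunks=(1, 512, 512))  # one frame per chunk

# Operations are lazy; nothing is computed until .compute()
normalized = (dstack - dstack.min()) / (dstack.max() - dstack.min())
result = normalized.compute()
```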