# Developer Guide for Cytotoxicity Pipeline

The Cytotoxicity pipeline is a modular, CLI-based workflow designed for flexibility and reusability. The workflow consists of the following main steps:

1. **Preprocessing**: Transforms raw images into processed images.
   *Input*: image → *Output*: image
2. **Segmentation**: Converts images to segmentation labels, or refines existing labels.
   *Input*: image → label / label → label
3. **Tabulation**: Converts images or labels into a sparse tabular format for downstream analysis.
   *Input*: image → dataframe / label → dataframe
4. **Tracking**: Tracks objects over time using either dense (label-based) or sparse (feature point-based) methods.
   *Input*: label → dataframe / dataframe → dataframe
5. **Analysis**: Performs analysis and visualization using any combination of images, labels, or tabular data.
   *Input*: dataframe → dataframe / arbitrary outputs

Each step is implemented as an independent module, allowing easy customization and extension of the pipeline. Each workflow step accepts only specific input-output type pairs. To enhance module reusability, data is passed between modules as a Python dictionary, e.g. preprocessing receives a dict input with the key "image" and returns its output in the same format. For the individual steps, check the table below for the input-output dictionary key pairs:

| Step          | Input Key                   | Output Key  |
|---------------|-----------------------------|-------------|
| Preprocessing | "image"                     | "image"     |
| Segmentation  | "image"/"label"             | "label"     |
| Tabulation    | "image"/"label"             | "dataframe" |
| Tracking      | "label"/"dataframe"         | "dataframe" |
| Analysis      | "image"/"label"/"dataframe" | arbitrary   |

In short, follow this template class for a modular workflow step:

```python
from typing import Any

from tqdm import tqdm


class MySegmentationClass(object):
    def __init__(self, arg1=(0, 1, 2), arg2="foo", verbose=True) -> None:
        """Function documentation comes here.

        Args:
            arg1 (tuple or int): Tuple or integer argument input
            arg2 (str): String input
            verbose (bool): Turn on or off the processing printout
        """
        self.name = "MySegmentationClass"
        self.arg1 = arg1
        self.arg2 = arg2
        self.verbose = verbose

    def __call__(self, data) -> Any:
        image = data["image"]
        if self.verbose:
            tqdm.write("Class args: [{},{}]".format(self.arg1, self.arg2))
        # some processing here ...
        label = awesome_segmentation(image)  # placeholder for the actual segmentation
        return {"image": image, "label": label}
```

A full example of such a class can be found in [../preprocessing/normalization.py](../preprocessing/normalization.py).

⚠️ **Important**: Add your package dependencies to [requirements.txt](../requirements.txt) ⚠️

⚠️ Add notes to [README.md](../README.md) and/or [./doc/setup.md](./setup.md) when necessary, particularly for conda/mamba-specific dependencies ⚠️

To load the class back into the main function, you only need to add the corresponding import at the header and edit the pipeline YAML file.

## Custom Intermediate Output

TODO

## Dask Support

For better big-data management, we recommend using [Dask arrays](https://docs.dask.org/en/stable/array.html) rather than NumPy, though in some cases CuPy or plain NumPy may do the job.
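Below is a minimal sketch of how a module might hand its `"image"` payload over as a Dask array; the array shape, the chunk sizes, and the reuse of the `MySegmentationClass` template above are illustrative assumptions, not part of the pipeline API.

```python
# Minimal sketch: wrapping a step's "image" payload in a Dask array.
# Shapes, chunk sizes, and the commented class usage are assumptions.
import numpy as np
import dask.array as da

# A raw (time, y, x) image stack as a NumPy array.
image_np = np.random.rand(16, 1024, 1024)

# Wrap it as a Dask array, chunked per time point, so downstream
# modules can process it lazily one frame at a time.
image = da.from_array(image_np, chunks=(1, 1024, 1024))

# The dict interface stays the same; only the array type changes.
data = {"image": image}
# data = MySegmentationClass(verbose=False)(data)  # hypothetical usage

# Operations only build a task graph; nothing runs until .compute().
normalized = (image - image.mean()) / image.std()
print(normalized.compute().shape)  # (16, 1024, 1024)
```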
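Chunking along the time axis keeps per-frame memory bounded, which is usually the practical bottleneck for long time-lapse stacks; in general, pick chunk sizes that match how your module iterates over the data.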