Architecture

pyCyto is designed as a layered suite — a set of tools that work together at different levels of abstraction, from interactive exploration to fully automated HPC execution.


Layered Suite Model

graph TD
    L1["Layer 1: Jupyter Notebooks (interactive exploration)"]
    L2["Layer 2: Core Python API (algorithm logic)"]
    L3["Layer 3: YAML + CLI (declarative automation)"]
    L4["Layer 4: SLURM Distributed (parallel HPC scale)"]

    L1 --> L2
    L2 --> L3
    L3 --> L4

    classDef layer fill:#0d7377,color:#fff,stroke:#0a5c60
    class L1,L2,L3,L4 layer
    

All layers share one implementation, the Layer 2 Python classes: notebooks instantiate those classes directly; the CLI reads YAML and instantiates the same classes; the SLURM wrapper submits the CLI call across many nodes. Adding a new algorithm therefore only requires touching Layer 2; once the module is registered (see Module Registration), the other layers pick it up through YAML dispatch.
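The relationship between the notebook path and the YAML path can be sketched as follows. This is a minimal illustration, not pyCyto's actual dispatcher: the Scale and Offset classes, the REGISTRY dict, and run_pipeline are hypothetical names, the callable-class convention is an assumption, and a plain Python list stands in for the image array. The steps list mimics what a parsed pipeline YAML might look like.

```python
from typing import Any, Dict, List


class Scale:
    """Hypothetical Layer 2 stage: multiplies every value under "image"."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "image": [v * self.factor for v in data["image"]]}


class Offset:
    """Hypothetical Layer 2 stage: adds a constant to every value under "image"."""

    def __init__(self, delta: float = 0.0) -> None:
        self.delta = delta

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "image": [v + self.delta for v in data["image"]]}


# Hypothetical name-to-class table; the real lookup mechanism is not shown here.
REGISTRY = {"Scale": Scale, "Offset": Offset}


def run_pipeline(steps: List[Dict[str, Any]], data: Dict[str, Any]) -> Dict[str, Any]:
    """Layer 3 style: build each stage from its declared name and args, then chain."""
    for step in steps:
        stage = REGISTRY[step["name"]](**step.get("args", {}))
        data = stage(data)
    return data


# Declarative path: the same structure a parsed YAML file could yield.
steps = [
    {"name": "Scale", "args": {"factor": 2.0}},
    {"name": "Offset", "args": {"delta": 1.0}},
]
via_yaml = run_pipeline(steps, {"image": [1.0, 2.0]})

# Notebook path: instantiate the same classes by hand.
via_notebook = Offset(delta=1.0)(Scale(factor=2.0)({"image": [1.0, 2.0]}))
```

Both paths run identical stage code, which is why a class added at Layer 2 needs no changes in the layers that drive it.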


Compute Node Model

Every pipeline stage is a pure function:

output = f(input, params)
  • input — a Python dict with standardized keys (see Data Types below)

  • params — algorithm hyperparameters set in __init__ (channel names, thresholds, model types)

  • output — a Python dict with standardized keys
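As a rough sketch of this contract, a stage can be written as a class whose hyperparameters are fixed in __init__ and whose call takes and returns a dict. The class name ThresholdSegmentation, the use of __call__ as the entry point, and the thresholding logic are assumptions made for illustration; the actual pyCyto base class and method names may differ. The sketch produces a binary foreground mask rather than per-cell instance labels.

```python
from typing import Any, Dict

import numpy as np


class ThresholdSegmentation:
    """Hypothetical stage: marks pixels above a threshold as foreground."""

    def __init__(self, channel: str = "image", threshold: float = 0.5) -> None:
        # Only algorithm hyperparameters are stored here; nothing about
        # memory limits, partitions, or GPUs.
        self.channel = channel
        self.threshold = threshold

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        image = np.asarray(data[self.channel])
        # Background stays 0, matching the Label convention.
        label = (image > self.threshold).astype(np.uint32)
        # Return a new dict instead of mutating the input.
        return {**data, "label": label}


frame = np.array([[[0.1, 0.9], [0.7, 0.2]]], dtype=np.float32)  # shape (T, Y, X)
inputs = {"image": frame}
out = ThresholdSegmentation(threshold=0.5)(inputs)
```

Because the call neither mutates its input nor reads external state, running it twice with the same input and params gives the same output.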

Compute specs are sideloaded: the algorithm class has no knowledge of memory limits, partition names, or GPU count, all of which are declared in the companion pipeline-resources.yaml. As a result, the same algorithm class can run:

  • locally in a notebook (no SLURM)

  • via pixi run cyto --pipeline my.yaml (single node)

  • distributed across 64 SLURM array jobs

without changing any algorithm code.

configs/pipelines/my.yaml        ← WHAT: algorithm class + params
configs/distributed/res.yaml     ← HOW: SLURM partition, GPU, memory, batch size
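One way the distributed case can stay outside the algorithm code is to let each array task select its own share of the inputs before calling the unchanged stage. The sketch below is an assumption about how such a split could work, not a description of submit_batch_jobs.py: the function names are hypothetical, and it assumes the array was submitted with task IDs 0 to n_tasks - 1. SLURM_ARRAY_TASK_ID is the environment variable SLURM sets for each array task.

```python
import os
from typing import List, Sequence


def shard_for_task(items: Sequence[str], task_id: int, n_tasks: int) -> List[str]:
    """Return the items assigned to one array task, distributed round-robin."""
    if not 0 <= task_id < n_tasks:
        raise ValueError(f"task_id {task_id} outside 0..{n_tasks - 1}")
    return list(items[task_id::n_tasks])


def current_shard(items: Sequence[str], n_tasks: int) -> List[str]:
    """Pick this process's shard; outside SLURM the task ID defaults to 0."""
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return shard_for_task(items, task_id, n_tasks)


# Hypothetical field-of-view names, split across 4 array tasks.
fovs = [f"fov_{i}" for i in range(10)]
shards = [shard_for_task(fovs, t, 4) for t in range(4)]
```

With n_tasks set to 1, shard_for_task returns every item, so the same driver code covers the local and single-node cases.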

See Dev Guide → Compute Node Model for the full API contract.


The Five Data Types

All data flowing between stages is one of five canonical types:

| Type | Dict Key | Format | Produced by | Consumed by |
|---|---|---|---|---|
| Image | "image" | Dask/NumPy array (T, Y, X) or (T, Z, Y, X), float32 | File I/O, Preprocessing | Preprocessing, Segmentation, Postprocessing |
| Label | "label" | uint32 integer mask; background = 0 | Segmentation | Tabulation, Postprocessing |
| Table | "dataframe" | pandas/Dask DataFrame; one row per cell per frame | Tabulation | Tracking, Postprocessing |
| Network | "network" | NetworkX Graph; nodes = cell IDs | Postprocessing | Postprocessing, export |
| Arbitrary | any key | Any pickleable object | any stage | any stage |

Note

The Table type is the pivot of the pipeline. Once cells are tabulated into a sparse DataFrame, all downstream analytics (tracking, contact tracing, kinematics) operate on this lightweight representation rather than on the full image arrays.
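To make the Label-to-Table step concrete, the sketch below reduces a (T, Y, X) label mask to one row per cell per frame. It is an illustration only: the function name tabulate_centroids and the column names (frame, label, y, x, area) are assumptions, not pyCyto's actual tabulation schema.

```python
import numpy as np
import pandas as pd


def tabulate_centroids(label: np.ndarray) -> pd.DataFrame:
    """Build one row per cell per frame from a (T, Y, X) integer label mask."""
    rows = []
    for t, frame in enumerate(label):
        for cell_id in np.unique(frame):
            if cell_id == 0:  # 0 is background by convention
                continue
            ys, xs = np.nonzero(frame == cell_id)
            rows.append(
                {
                    "frame": t,
                    "label": int(cell_id),
                    "y": float(ys.mean()),  # centroid row
                    "x": float(xs.mean()),  # centroid column
                    "area": int(ys.size),   # pixel count
                }
            )
    return pd.DataFrame(rows, columns=["frame", "label", "y", "x", "area"])


# Two frames: cell 1 moves by one pixel, cell 2 appears only in frame 0.
mask = np.zeros((2, 4, 4), dtype=np.uint32)
mask[0, 0:2, 0:2] = 1
mask[0, 3, 3] = 2
mask[1, 1:3, 1:3] = 1

table = tabulate_centroids(mask)
data = {"label": mask, "dataframe": table}
```

Here 32 mask pixels collapse to three rows, which illustrates why downstream analytics on the table are much lighter than on the image arrays.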

A future URI-based I/O abstraction will allow these types to be addressed by URI (file://, db://, ceph://, s3://) across distributed compute nodes.


Six Analysis Categories

pyCyto’s modules cover six categories of cytotoxicity analysis:

| Category | Stage | Representative module | Output type |
|---|---|---|---|
| File I/O | Input loading | aicsimageio, OME-TIFF reader | Image |
| Preprocessing | preprocessing | RegisterDenoise, PercentileNormalization | Image |
| Segmentation | segmentation | CellPose, StarDist | Label |
| Tracking | tracking | TrackMate, Ultrack | Table |
| Kinematics & Contact Analysis | postprocessing | CrossCellContactMeasures | Table, Network |
| Cell Network Triangulation | postprocessing | CellTriangulation | Network |


Three Interfaces

The same pipeline code is accessible through three user-facing interfaces. Choose based on the level of control you need:

| Interface | Use when | Entry point |
|---|---|---|
| Jupyter notebooks | Exploring data, developing algorithms, producing figures | pixi run jupyter |
| YAML + CLI | Automating repeatable runs on a single node | pixi run cyto --pipeline my.yaml |
| SLURM distributed | Scaling across many nodes or large FOV datasets | distributed/submit_batch_jobs.py |

See Interfaces for details on each.


Module Registration

The YAML dispatcher finds module classes by the name field in the pipeline YAML:

- name: CellPose          # resolved to cyto.segmentation.cellpose.CellPose
  tag: Cellpose_TCell
  args:
    model_type: cpsam

The lookup follows the convention:

cyto.<stage>.<module_file>.<ClassName>

where <stage> is one of preprocessing, segmentation, tracking, postprocessing.
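A dotted path of this shape can be turned into a class with the standard library's importlib. The helper below is a generic sketch, not the pyCyto dispatcher: resolve_class is a hypothetical name, and how the dispatcher maps a bare name such as CellPose to its stage and module file is not covered here. The example resolves a standard-library class so that it runs without pyCyto installed.

```python
import importlib
from typing import Any


def resolve_class(dotted_path: str) -> Any:
    """Import the module part of a dotted path and return the named attribute."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'package.module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    # Raises AttributeError if the module has no such class.
    return getattr(module, class_name)


# Stand-in for a path like cyto.segmentation.cellpose.CellPose.
decoder_cls = resolve_class("json.decoder.JSONDecoder")
decoder = decoder_cls()
```

A misspelled module raises ModuleNotFoundError and a misspelled class raises AttributeError, so a name mismatch between the YAML and the code fails early with a specific error.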

When adding a new module, register it in main.py and ensure the class name matches the name field in user YAML files. See Plugin Integration for the full registration pattern.