# Architecture
pyCyto is designed as a layered suite — a set of tools that work together at different levels of abstraction, from interactive exploration to fully automated HPC execution.
## Layered Suite Model

```mermaid
graph TD
    L1["Layer 1 — Jupyter Notebooks (interactive exploration)"]
    L2["Layer 2 — Core Python API (algorithm logic)"]
    L3["Layer 3 — YAML + CLI (declarative automation)"]
    L4["Layer 4 — SLURM Distributed (parallel HPC scale)"]
    L1 --> L2
    L2 --> L3
    L3 --> L4
    classDef layer fill:#0d7377,color:#fff,stroke:#0a5c60
    class L1,L2,L3,L4 layer
```
Each layer calls the one below — notebooks instantiate Python classes; the CLI reads YAML and calls the same classes; the SLURM wrapper submits the CLI call across many nodes. Adding a new algorithm only requires touching Layer 2 — the other layers pick it up automatically through YAML dispatch.
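As an illustrative sketch of this convergence (the class, registry, and key names below are made up for the example, not pyCyto's actual API), the notebook path and the YAML dispatch path end up calling the same algorithm class:

```python
# Hypothetical stage class standing in for a real Layer 2 algorithm.
class ExampleStage:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def run(self, data):
        # Toy "segmentation": threshold each pixel value.
        return {"mask": [v > self.threshold for v in data["image"]]}

# Layer 1: a notebook instantiates the class directly.
stage = ExampleStage(threshold=0.3)
out_notebook = stage.run({"image": [0.1, 0.4, 0.9]})

# Layer 3: the CLI builds the same class from a YAML-derived config dict.
config = {"name": "ExampleStage", "args": {"threshold": 0.3}}
registry = {"ExampleStage": ExampleStage}  # stand-in for YAML dispatch
out_cli = registry[config["name"]](**config["args"]).run({"image": [0.1, 0.4, 0.9]})

assert out_notebook == out_cli  # both layers hit identical algorithm code
```

Because dispatch is by class name, a new Layer 2 class becomes reachable from every higher layer without touching them.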
## Compute Node Model
Every pipeline stage is a pure function:

```text
output = f(input, params)
```
- `input` — a Python dict with standardized keys (see Data Types below)
- `params` — algorithm hyperparameters set in `__init__` (channel names, thresholds, model types)
- `output` — a Python dict with standardized keys
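A minimal sketch of this contract, assuming a hypothetical `ThresholdStage` (the class, channel key, and parameter names are illustrative, not a real pyCyto module):

```python
class ThresholdStage:
    """Illustrative stage obeying output = f(input, params)."""

    def __init__(self, channel="raw", threshold=128):
        # params: hyperparameters only -- no memory limits, partitions, or GPUs
        self.channel = channel
        self.threshold = threshold

    def run(self, input):
        # Pure: the output depends only on the input dict and the params.
        image = input[self.channel]
        return {"label": [1 if px >= self.threshold else 0 for px in image]}

out = ThresholdStage(threshold=100).run({"raw": [50, 200, 130]})
```

Because `run` carries no compute-resource state, the same instance works unchanged in a notebook, under the CLI, or inside a SLURM array task.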
Compute specs are sideloaded separately. The algorithm class has no knowledge of memory limits, partition names, or GPU count — those are declared in the companion pipeline-resources.yaml. This means you can run the same algorithm class:
- locally in a notebook (no SLURM)
- via `pixi run cyto --pipeline my.yaml` (single node)
- distributed across 64 SLURM array jobs

without changing any algorithm code.
```text
configs/pipelines/my.yaml    ← WHAT: algorithm class + params
configs/distributed/res.yaml ← HOW: SLURM partition, GPU, memory, batch size
```
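As a hedged sketch of the HOW side of this split, a resources file might look like the following; every key name here is an illustrative assumption, not pyCyto's actual schema:

```yaml
# configs/distributed/res.yaml — HOW to run (hypothetical keys)
partition: gpu        # SLURM partition name
gpus_per_task: 1
mem: 64G
array_size: 64        # number of SLURM array jobs
```

The algorithm class never reads this file; only the SLURM wrapper layer does.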
See Dev Guide → Compute Node Model for the full API contract.
## The Five Data Types
All data flowing between stages is one of five canonical types:
| Type | Dict Key | Format | Produced by | Consumed by |
|---|---|---|---|---|
| Image | | Dask/NumPy array | File I/O, Preprocessing | Preprocessing, Segmentation, Postprocessing |
| Label | | `uint32` integer mask; background = 0 | Segmentation | Tabulation, Postprocessing |
| Table | | pandas/Dask DataFrame; one row per cell per frame | Tabulation | Tracking, Postprocessing |
| Network | | NetworkX | Postprocessing | Postprocessing, export |
| Arbitrary | any key | Any pickleable object | any stage | any stage |
> **Note**
> The Table type is the pivot of the pipeline. Once cells are tabulated into a sparse DataFrame, all downstream analytics (tracking, contact tracing, kinematics) operate on this lightweight representation rather than on the full image arrays.
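To make the pivot concrete, here is a minimal sketch of a downstream kinematics step on such a Table; the column names (`frame`, `cell_id`, `x`, `y`) are assumptions for illustration, not pyCyto's documented schema:

```python
import pandas as pd

# Toy Table: one row per cell per frame (column names are illustrative).
table = pd.DataFrame({
    "frame":   [0, 0, 1, 1],
    "cell_id": [1, 2, 1, 2],
    "x":       [10.0, 40.0, 12.0, 41.0],
    "y":       [5.0, 8.0, 6.0, 8.5],
})

# Per-cell displacement between consecutive frames -- a kinematics
# building block that never touches the full image arrays.
table = table.sort_values(["cell_id", "frame"])
table["dx"] = table.groupby("cell_id")["x"].diff()
table["dy"] = table.groupby("cell_id")["y"].diff()
```

The same groupby pattern extends to speeds, contact counts, and other per-cell statistics on the lightweight representation.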
A future URI-based I/O abstraction will allow these types to be addressed by URI (`file://`, `db://`, `ceph://`, `s3://`) across distributed compute nodes.
## Six Analysis Categories
pyCyto’s modules cover six categories of cytotoxicity analysis:
| Category | Stage | Representative module | Output type |
|---|---|---|---|
| File I/O | Input loading | | Image |
| Preprocessing | | | Image |
| Segmentation | | | Label |
| Tracking | | | Table |
| Kinematics & Contact Analysis | | | Table, Network |
| Cell Network Triangulation | | | Network |
## Three Interfaces
The same pipeline code is accessible through three user-facing interfaces. Choose based on the level of control you need:
| Interface | Use when | Entry point |
|---|---|---|
| Jupyter notebooks | Exploring data, developing algorithms, producing figures | |
| YAML + CLI | Automating repeatable runs on a single node | |
| SLURM distributed | Scaling across many nodes or large FOV datasets | |
See Interfaces for details on each.
## Module Registration

The YAML dispatcher finds module classes by the `name` field in the pipeline YAML:
```yaml
- name: CellPose      # resolved to cyto.segmentation.cellpose.CellPose
  tag: Cellpose_TCell
  args:
    model_type: cpsam
```
The lookup follows the convention:

```text
cyto.<stage>.<module_file>.<ClassName>
```
where `<stage>` is one of `preprocessing`, `segmentation`, `tracking`, `postprocessing`.
When adding a new module, register it in `main.py` and ensure the class name matches the `name` field in user YAML files. See Plugin Integration for the full registration pattern.
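The convention above can be sketched with a small resolver; this is an illustrative reimplementation of the lookup, not the actual dispatcher in `main.py`:

```python
import importlib

def resolve_class(stage: str, module_file: str, class_name: str,
                  package: str = "cyto"):
    """Resolve <package>.<stage>.<module_file>.<ClassName> to a class object."""
    module = importlib.import_module(f"{package}.{stage}.{module_file}")
    return getattr(module, class_name)

# The same mechanism demonstrated against the stdlib, since cyto itself
# may not be installed where this snippet runs:
cls = getattr(importlib.import_module("collections"), "OrderedDict")
```

A call like `resolve_class("segmentation", "cellpose", "CellPose")` would then yield `cyto.segmentation.cellpose.CellPose`, matching the comment in the YAML example above.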