# Architecture
pyCyto is designed as a layered suite — a set of tools that work together at different levels of abstraction, from interactive exploration to fully automated HPC execution.
## Layered Suite Model

```mermaid
graph TD
    L1["Layer 1 — Jupyter Notebooks (interactive exploration)"]
    L2["Layer 2 — Core Python API (algorithm logic)"]
    L3["Layer 3 — YAML + CLI (declarative automation)"]
    L4["Layer 4 — SLURM Distributed (parallel HPC scale)"]
    L1 --> L2
    L2 --> L3
    L3 --> L4
    classDef layer fill:#0d7377,color:#fff,stroke:#0a5c60
    class L1,L2,L3,L4 layer
```
Each layer calls the one below — notebooks instantiate Python classes; the CLI reads YAML and calls the same classes; the SLURM wrapper submits the CLI call across many nodes. Adding a new algorithm only requires touching Layer 2 — the other layers pick it up automatically through YAML dispatch.
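As an illustrative sketch of this convergence (the class, registry, and key names below are made up for the example, not pyCyto's actual API), the notebook path and the YAML dispatch path end up calling the same algorithm class:

```python
# Hypothetical stage class standing in for a real Layer 2 algorithm.
class ExampleStage:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def run(self, data):
        # Toy "segmentation": threshold each pixel value.
        return {"mask": [v > self.threshold for v in data["image"]]}

# Layer 1: a notebook instantiates the class directly.
stage = ExampleStage(threshold=0.3)
out_notebook = stage.run({"image": [0.1, 0.4, 0.9]})

# Layer 3: the CLI builds the same class from a YAML-derived config dict.
config = {"name": "ExampleStage", "args": {"threshold": 0.3}}
registry = {"ExampleStage": ExampleStage}  # stand-in for YAML dispatch
out_cli = registry[config["name"]](**config["args"]).run({"image": [0.1, 0.4, 0.9]})

assert out_notebook == out_cli  # both layers hit identical algorithm code
```

Because dispatch is by class name, a new Layer 2 class becomes reachable from every higher layer without touching them.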
## Compute Node Model
Every pipeline stage is a pure function:

```text
output = f(input, params)
```
- `input` — a Python dict with standardized keys (see Data Types below)
- `params` — algorithm hyperparameters set in `__init__` (channel names, thresholds, model types)
- `output` — a Python dict with standardized keys
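A minimal sketch of this contract, assuming a hypothetical `ThresholdStage` (the class, channel key, and parameter names are illustrative, not a real pyCyto module):

```python
class ThresholdStage:
    """Illustrative stage obeying output = f(input, params)."""

    def __init__(self, channel="raw", threshold=128):
        # params: hyperparameters only -- no memory limits, partitions, or GPUs
        self.channel = channel
        self.threshold = threshold

    def run(self, input):
        # Pure: the output depends only on the input dict and the params.
        image = input[self.channel]
        return {"label": [1 if px >= self.threshold else 0 for px in image]}

out = ThresholdStage(threshold=100).run({"raw": [50, 200, 130]})
```

Because `run` carries no compute-resource state, the same instance works unchanged in a notebook, under the CLI, or inside a SLURM array task.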
Compute specs are sideloaded separately. The algorithm class has no knowledge of memory limits, partition names, or GPU count — those are declared in the companion pipeline-resources.yaml. This means you can run the same algorithm class:
- locally in a notebook (no SLURM)
- via `pixi run cyto --pipeline my.yaml` (single node)
- distributed across 64 SLURM array jobs

without changing any algorithm code.
```text
configs/pipelines/my.yaml    ← WHAT: algorithm class + params
configs/distributed/res.yaml ← HOW: SLURM partition, GPU, memory, batch size
```
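As a hedged sketch of the HOW side of this split, a resources file might look like the following; every key name here is an illustrative assumption, not pyCyto's actual schema:

```yaml
# configs/distributed/res.yaml — HOW to run (hypothetical keys)
partition: gpu        # SLURM partition name
gpus_per_task: 1
mem: 64G
array_size: 64        # number of SLURM array jobs
```

The algorithm class never reads this file; only the SLURM wrapper layer does.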
See Dev Guide → Compute Node Model for the full API contract.
## The Five Data Types
All data flowing between stages is one of five canonical types:
| Type | Dict Key | Format | Produced by | Consumed by |
|---|---|---|---|---|
| Image | | Dask/NumPy array | File I/O, Preprocessing | Preprocessing, Segmentation, Postprocessing |
| Label | | `uint32` integer mask; background = 0 | Segmentation | Tabulation, Postprocessing |
| Table | | pandas/Dask DataFrame; one row per cell per frame | Tabulation | Tracking, Postprocessing |
| Network | | NetworkX | Postprocessing | Postprocessing, export |
| Arbitrary | any key | Any pickleable object | any stage | any stage |
> **Note**
> The Table type is the pivot of the pipeline. Once cells are tabulated into a sparse DataFrame, all downstream analytics (tracking, contact tracing, kinematics) operate on this lightweight representation rather than on the full image arrays.
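To make the pivot concrete, here is a minimal sketch of a downstream kinematics step on such a Table; the column names (`frame`, `cell_id`, `x`, `y`) are assumptions for illustration, not pyCyto's documented schema:

```python
import pandas as pd

# Toy Table: one row per cell per frame (column names are illustrative).
table = pd.DataFrame({
    "frame":   [0, 0, 1, 1],
    "cell_id": [1, 2, 1, 2],
    "x":       [10.0, 40.0, 12.0, 41.0],
    "y":       [5.0, 8.0, 6.0, 8.5],
})

# Per-cell displacement between consecutive frames -- a kinematics
# building block that never touches the full image arrays.
table = table.sort_values(["cell_id", "frame"])
table["dx"] = table.groupby("cell_id")["x"].diff()
table["dy"] = table.groupby("cell_id")["y"].diff()
```

The same groupby pattern extends to speeds, contact counts, and other per-cell statistics on the lightweight representation.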
A future URI-based I/O abstraction will allow these types to be addressed by URI (`file://`, `db://`, `ceph://`, `s3://`) across distributed compute nodes.
## Six Analysis Categories
pyCyto’s modules cover six categories of cytotoxicity analysis:
| Category | Stage | Representative module | Output type |
|---|---|---|---|
| File I/O | Input loading | | Image |
| Preprocessing | | | Image |
| Segmentation | | | Label |
| Tracking | | | Table |
| Kinematics & Contact Analysis | | | Table, Network |
| Cell Network Triangulation | | | Network |
## Three Interfaces
The same pipeline code is accessible through three user-facing interfaces. Choose based on the level of control you need:
| Interface | Use when | Entry point |
|---|---|---|
| Jupyter notebooks | Exploring data, developing algorithms, producing figures | |
| YAML + CLI | Automating repeatable runs on a single node | |
| SLURM distributed | Scaling across many nodes or large FOV datasets | |
See Interfaces for details on each.
## Module Registration

The YAML dispatcher finds module classes by the `name` field in the pipeline YAML:
```yaml
- name: CellPose      # resolved to cyto.segmentation.cellpose.CellPose
  tag: Cellpose_TCell
  args:
    model_type: cpsam
```
The lookup follows the convention:

```text
cyto.<stage>.<module_file>.<ClassName>
```
where `<stage>` is one of `preprocessing`, `segmentation`, `tracking`, `postprocessing`.
When adding a new module, register it in `main.py` and ensure the class name matches the `name` field in user YAML files. See Plugin Integration for the full registration pattern.
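The convention above can be sketched with a small resolver; this is an illustrative reimplementation of the lookup, not the actual dispatcher in `main.py`:

```python
import importlib

def resolve_class(stage: str, module_file: str, class_name: str,
                  package: str = "cyto"):
    """Resolve <package>.<stage>.<module_file>.<ClassName> to a class object."""
    module = importlib.import_module(f"{package}.{stage}.{module_file}")
    return getattr(module, class_name)

# The same mechanism demonstrated against the stdlib, since cyto itself
# may not be installed where this snippet runs:
cls = getattr(importlib.import_module("collections"), "OrderedDict")
```

A call like `resolve_class("segmentation", "cellpose", "CellPose")` would then yield `cyto.segmentation.cellpose.CellPose`, matching the comment in the YAML example above.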