Developer Guide¶
pyCyto is built around a single architectural principle: every pipeline stage is a stateless Python callable that maps a dictionary to a dictionary. This page explains how to extend the pipeline, how the compute node model works, and how to package and containerize new modules.
The Compute Node Model¶
Each pipeline stage is a self-contained unit of computation:
input_dict ──► Module(params) ──► output_dict
The module receives all inputs through a dictionary, performs computation, and returns outputs through a dictionary. No global state, no file I/O inside the module class itself (I/O is handled by the pipeline orchestrator). This design means the same module runs identically whether it is called:
- interactively in a notebook
- via `cyto --pipeline my_pipeline.yaml` inside a SLURM sbatch job
- inside a container (Docker or Apptainer)
Dictionary key contract¶
| Stage | Input key(s) | Output key(s) |
|---|---|---|
| Preprocessing | | |
| Segmentation | | |
| Tabulation | | |
| Tracking | | |
| Contact | | |
| Postprocessing | any | arbitrary |
Always use these exact string keys. The pipeline YAML router uses them to wire stages together.
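To illustrate the contract, two stages can be chained simply by passing the output dictionary of one as the input of the next. This is a minimal sketch; the `image` key follows the module template below, but the doubling and thresholding stages are hypothetical stand-ins, not real pyCyto modules:

```python
import numpy as np

class Doubler:
    """Hypothetical stage: doubles intensities."""
    def __call__(self, data: dict) -> dict:
        return {"image": data["image"] * 2}

class Thresholder:
    """Hypothetical stage: binarizes at a fixed cutoff."""
    def __call__(self, data: dict) -> dict:
        return {"image": data["image"] > 1}

# The orchestrator simply threads the dict through the stages.
data = {"image": np.array([0.0, 1.0, 2.0])}
for stage in (Doubler(), Thresholder()):
    data = stage(data)

print(data["image"])  # [False  True  True]
```

Because every stage reads and writes the same dictionary shape, reordering or inserting stages requires no interface changes, only matching keys.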
Module Template¶
All pipeline modules follow this class structure:
```python
from tqdm import tqdm


class MyModule(object):
    def __init__(self, param1=default_value, verbose=True) -> None:
        """
        Short description.

        Args:
            param1: Description.
            verbose (bool): Enable progress logging.
        """
        self.name = "MyModule"
        self.param1 = param1
        self.verbose = verbose

    def __call__(self, data: dict) -> dict:
        """
        Process data.

        Args:
            data (dict): Input with standardized keys.

        Returns:
            dict: Output with standardized keys.
        """
        image = data["image"]
        if self.verbose:
            tqdm.write(f"[{self.name}] processing ...")
        result = _my_algorithm(image, self.param1)
        return {"image": result}
```
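Because modules are plain callables, they compose with standard Python tools. For example, a linear pipeline is just a fold over the stage list (a sketch of the idea, not the actual pyCyto orchestrator; `Identity` is a placeholder stage):

```python
from functools import reduce

def run_pipeline(stages, data: dict) -> dict:
    """Fold a dict through a sequence of dict-to-dict callables."""
    return reduce(lambda d, stage: stage(d), stages, data)

# Hypothetical no-op stage for demonstration.
class Identity:
    def __call__(self, data: dict) -> dict:
        return dict(data)

out = run_pipeline([Identity(), Identity()], {"image": 42})
print(out)  # {'image': 42}
```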
Rules:

- Use `tqdm.write()` for logging, not `print()`, so progress bars are not broken.
- Prefer Dask arrays over NumPy for inputs/outputs so lazy evaluation propagates.
- Do not open files inside `__call__`. File paths belong in `__init__` or in the orchestrator.
- Raise `ValueError` (not `AssertionError`) for invalid inputs.
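The input-validation rule can be applied at the top of `__call__`. A sketch (the `REQUIRED_KEYS` attribute and the `SafeModule` name are illustrative, not part of the pyCyto API):

```python
class SafeModule:
    """Hypothetical module that validates its input dict up front."""
    REQUIRED_KEYS = ("image",)

    def __call__(self, data: dict) -> dict:
        missing = [k for k in self.REQUIRED_KEYS if k not in data]
        if missing:
            # ValueError, not AssertionError: asserts are stripped under python -O.
            raise ValueError(f"missing required input keys: {missing}")
        return {"image": data["image"]}
```

Using `ValueError` keeps the check active even when Python runs with `-O`, which disables `assert` statements.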
Adding to the pipeline¶
1. Place the module in the appropriate `cyto/<stage>/` subpackage.
2. Add an import in `cyto/<stage>/__init__.py`.
3. Add dependencies to `pixi.toml` (`[dependencies]` for universal, `[feature.<env>.dependencies]` for optional).
4. Add a YAML block in an example pipeline under `pipelines/`.
5. For distributed execution, add a corresponding sbatch template in `distributed/<stage>/`.
Plugin Integration Guide¶
Some analysis steps rely on external tools (Fiji/TrackMate, pyclesperanto, ANTs). The integration pattern is the same in each case: wrap the external tool in a module class that satisfies the dictionary contract.
Baremetal integration¶
The external tool is called directly via Python bindings or subprocess:
```python
# TrackMate via PyImageJ
import imagej

ij = imagej.init(fiji_dir or 'sc.fiji:fiji', headless=True)
```
Container integration¶
For tools that have conflicting dependencies, run them inside a container and pass data via temporary files or shared memory:
```python
import subprocess, tempfile, numpy as np

with tempfile.NamedTemporaryFile(suffix=".npy") as f:
    np.save(f.name, data["image"])
    subprocess.run(["apptainer", "exec", "--nv", "tool.sif",
                    "python", "run_tool.py", f.name], check=True)
```
SLURM job integration¶
For tools that need their own SLURM job (e.g. multi-GPU), write a batch script template and submit via the distributed/ orchestrator:
```bash
#!/bin/bash
# distributed/tracking/batch_trackmate.sbatch
#SBATCH --partition=short
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

pixi run -e imagej python distributed/tracking/batch_trackmate.py "$@"
```
Docker → Apptainer Migration¶
HPC clusters typically prohibit Docker (requires root). Use Apptainer (formerly Singularity) instead.
Converting a Docker image to Apptainer SIF¶
```bash
# Build Docker image locally first
docker build -f containers/Dockerfile -t cyto-gpu:latest .

# Convert to SIF (can also pull directly from Docker Hub)
apptainer build containers/images/cyto-gpu.sif docker-daemon://cyto-gpu:latest

# Or build directly from a definition file (no Docker required)
apptainer build containers/images/cyto-gpu.sif containers/apptainer/cyto-gpu.def
```
Running with Apptainer¶
```bash
# GPU-enabled run
apptainer exec --nv containers/images/cyto-gpu.sif \
    pixi run -e gpu python scripts/benchmark/run_benchmark.py

# Interactive shell
apptainer shell --nv containers/images/cyto-gpu.sif
```
Build script¶
```bash
# Automated build (from repo root)
bash containers/apptainer/build.sh
```
The SIF path is configured via `gpu_sif` in `scripts/benchmark/config/benchmark.def.toml`. Set it in your `benchmark.user.toml` once built.
Dependency Management (pixi.toml)¶
⚠️ Add all new dependencies to `pixi.toml`, not to a requirements file or env YAML.
```toml
# Always-required (in default env)
[dependencies]
scipy = ">=1.10"

# PyPI-only dependency
[pypi-dependencies]
my-package = ">=1.0"

# Optional: only in the cellpose feature env
[feature.cellpose.pypi-dependencies]
cellpose = ">=3.0"
```
After editing `pixi.toml`, run `pixi install` (or `pixi install -e <env>`) to rebuild. Run `pixi run pytest tests/` to verify.
Dask Support¶
Prefer Dask arrays over NumPy for large datasets. Dask enables lazy evaluation — data is only read/computed when explicitly requested via .compute().
```python
import dask.array as da

# Lazy array: no chunk is materialized yet
arr = da.zeros((10000, 2048, 2048), chunks=(1, 2048, 2048))

# Only compute what you need
result = arr[0:10].compute()
```

Note that `da.from_array(np.zeros(...))` would defeat the purpose: the NumPy array is allocated eagerly before Dask ever sees it. Use the `da.*` constructors (or lazy readers) so no memory is touched up front.
Avoid calling .compute() inside module __call__ unless strictly necessary — let the orchestrator decide when to materialize.
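A stage that honors this rule builds a lazy graph, e.g. with `dask.array.map_blocks`, and returns it uncomputed. A sketch (the `LazyOffset` stage and its constant-offset operation are illustrative placeholders):

```python
import dask.array as da

class LazyOffset:
    """Hypothetical stage: adds a constant, lazily, block by block."""
    def __init__(self, offset=1.0):
        self.offset = offset

    def __call__(self, data: dict) -> dict:
        image = data["image"]
        # map_blocks only records the operation in the task graph;
        # nothing runs until someone calls .compute().
        result = image.map_blocks(lambda block: block + self.offset)
        return {"image": result}

arr = da.zeros((4, 8, 8), chunks=(1, 8, 8))
out = LazyOffset(offset=2.0)({"image": arr})
# out["image"] is still a dask array; materialize a single voxel:
print(float(out["image"][0, 0, 0].compute()))  # 2.0
```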
Testing¶
All new modules should have unit tests in tests/:
```bash
# Run all tests
pixi run pytest tests/

# Run a specific test file
pixi run pytest tests/test_preprocessing.py -v
```
Tests should not require GPU access unless decorated with a skip marker:
```python
import pytest

pytest.importorskip("torch")  # skip if PyTorch not available
```
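For GPU-dependent tests, a reusable skip marker can be defined once and applied per test. A sketch; `has_gpu()` is a placeholder for whatever capability check the test suite actually uses (here, just "is PyTorch importable"):

```python
import importlib.util
import pytest

def has_gpu() -> bool:
    """Placeholder capability check: is the GPU stack importable?"""
    return importlib.util.find_spec("torch") is not None

# Tests tagged with this marker are skipped on CPU-only runners.
requires_gpu = pytest.mark.skipif(not has_gpu(), reason="GPU stack unavailable")

@requires_gpu
def test_gpu_segmentation():
    ...
```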
Documentation¶
Build the Sphinx docs locally:
```bash
cd doc/
pixi run make html
# Open: doc/_build/html/index.html
```
All new public classes and functions must have NumPy-style docstrings. Sphinx autodoc picks them up automatically.
Documentation Impact Classification¶
When opening a pull request or committing changes, classify the docs impact so reviewers know what to update:
| Classification | Meaning | Required docs action |
|---|---|---|
| none | Internal implementation only; no user-visible behavior changes | No docs update required |
| minor | Adds a parameter, changes a default, or fixes a bug | Update the relevant docstring and YAML example |
| major | New module, new stage type, new interface, or breaking change | Update API reference, pipeline.md stage section, and cross-links |
Minimum required updates for major docs-impact changes¶
1. Canonical page update: add or update the relevant section in `doc/source/` (pipeline stage, setup step, etc.)
2. API docstring: NumPy-style docstring on the class and all public methods
3. YAML example: add an annotated YAML snippet to `configs/pipelines/pipeline.template.yaml` with a comment
4. Cross-link check: verify all pages that reference the changed module or path still resolve correctly (run `make html` and check for warnings)
5. Changelog entry: add a bullet to the relevant OpenSpec change tasks file if the change is tracked there
Cross-change alignment¶
Before merging, check with team members working on overlapping areas. If two people are editing the same doc page or API module at the same time, coordinate to avoid contradictory guidance or overwritten work.
Pre-merge checklist¶
Before opening a pull request for any major docs-impact change, verify:
- [ ] Docs impact classified: is this none / minor / major?
- [ ] Docstrings added or updated (NumPy style) on any new or changed public API
- [ ] YAML example updated in `configs/pipelines/pipeline.template.yaml` if a new module was added
- [ ] `cd doc && pixi run make html` passes with 0 new warnings
- [ ] Cross-links work: no 404s in the rendered HTML for pages you touched