# Interface Guide
pyCyto exposes the same pipeline through three interfaces. Choose the one that matches your workflow:
| Interface | Best for | Entry point |
|---|---|---|
| Jupyter notebooks | Exploration, parameter tuning, interactive QC | `notebooks/` |
| YAML + CLI | Automated reproducible runs, CI, scripting | `pixi run cyto` / `main.py` |
| SLURM distributed | Production scale, HPC clusters, large datasets | `distributed/submit_batch_jobs.py` |
All three interfaces share the same underlying cyto modules and the same dictionary-based data contract.
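The shared dictionary contract can be illustrated with a minimal sketch. The keys used here (`channel`, `frame`, `data`, `spacing`) and the `normalize` helper are illustrative assumptions, not pyCyto's documented schema:

```python
import numpy as np

# Hypothetical illustration of a dictionary-based data contract:
# each module receives and returns a plain dict, so notebooks, the CLI,
# and SLURM jobs can exchange intermediate results interchangeably.
record = {
    "channel": "TCell",          # channel name from the pipeline YAML
    "frame": 0,                  # time index within image_range
    "data": np.zeros((512, 512), dtype=np.float32),  # one image plane
    "spacing": (0.83, 0.83),     # microns/pixel, as in the YAML example
}

def normalize(rec, lp=5, up=95):
    """Percentile normalization on the 'data' array, returning a new dict."""
    img = rec["data"]
    lo, hi = np.percentile(img, [lp, up])
    scaled = (img - lo) / max(hi - lo, 1e-9)
    return {**rec, "data": scaled.clip(0.0, 1.0)}

out = normalize(record)
```

Because every stage speaks this dict shape, a step tuned in a notebook behaves identically when driven by the CLI or a SLURM job.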
## Interface 1: Jupyter Notebooks
Notebooks are the primary way to explore data, tune parameters, and inspect intermediate results interactively.
### Location

```text
notebooks/
├── 01_preprocessing/   # normalization, registration
├── 02_segmentation/    # Cellpose, StarDist examples
├── 03_tracking/        # TrackMate, trackpy
├── 04_postprocessing/  # contact tracing, kinematics
├── 05_analysis/        # statistics, survival curves
└── 06_spatiomics/      # cell network triangulation
```
### Starting notebooks

```bash
# Local (CPU)
pixi run jupyter

# On an HPC compute node
srun -p gpu_short --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=04:00:00 --pty bash
pixi run -e gpu jupyter
```
See Cluster Jupyter Setup for the full SSH tunnel workflow.
### Configuration

All notebooks load configuration via:

```python
from cyto.utils import load_notebook_config

cfg = load_notebook_config()  # loads config.def.toml + config.user.toml
```
Override paths locally in notebooks/config.user.toml (gitignored) — never hardcode paths in notebooks.
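For example, a minimal `notebooks/config.user.toml` might look like the following. The section and key names here are illustrative assumptions; check `config.def.toml` for the actual schema:

```toml
# notebooks/config.user.toml — local, gitignored overrides (hypothetical keys)
[paths]
data_dir = "/scratch/my_user/cyto_data"
output_dir = "/scratch/my_user/cyto_output"
```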
### Handoff to CLI

Once parameters are validated interactively, export them to a pipeline YAML and run non-interactively:

```bash
pixi run cyto --pipeline pipelines/my_validated_pipeline.yaml -v
```
## Interface 2: YAML + CLI

The CLI interface runs a complete pipeline end-to-end from a declarative YAML configuration file. It suits reproducible automated runs, CI validation, and batch reprocessing.
### Running

```bash
# From the repo root
pixi run cyto --pipeline pipelines/pipeline.yaml -v

# Or directly
pixi run python main.py --pipeline pipelines/pipeline.yaml -v
```
### Pipeline YAML structure

```yaml
description: My cytotoxicity analysis

channels:
  TCell: /data/tcell_ch0.tif
  Cancer: /data/cancer_ch1.tif

image_range:
  t: [0, 100, 1]        # frames 0–99

spacing: [0.83, 0.83]   # microns/pixel

pipeline:
  preprocessing:
    - name: PercentileNormalization
      channels: all
      args: {lp: 5, up: 95}
  segmentation:
    - name: CellPose
      channels: [TCell, Cancer]
      args: {model_type: cyto2, diameter: 25}
  tracking:
    - name: TrackMate
      channels: [TCell]
```
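Because the pipeline file is plain YAML, it can be loaded and sanity-checked programmatically before a run. A small sketch, assuming PyYAML is available; these validation checks are illustrative, not pyCyto's own:

```python
import yaml  # PyYAML

def check_pipeline(path):
    """Lightweight sanity checks on a pipeline YAML (illustrative only)."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    # Every stage step should name a module and reference declared channels.
    declared = set(cfg.get("channels", {}))
    for stage, steps in cfg.get("pipeline", {}).items():
        for step in steps:
            assert "name" in step, f"{stage}: step missing 'name'"
            chans = step.get("channels", "all")
            if chans != "all":
                unknown = set(chans) - declared
                assert not unknown, f"{stage}/{step['name']}: unknown channels {unknown}"
    return cfg
```

Running such a check locally catches typos in channel names before a long batch job is submitted.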
### Dask dashboard

During execution, monitor task progress at http://localhost:8787/status.
### Handoff to SLURM

For large datasets (hundreds of frames, multiple patches), hand off to distributed execution:

```bash
pixi run python distributed/submit_batch_jobs.py \
    --pipeline pipelines/pipeline.yaml \
    --resources pipelines/pipeline-resources.yaml -v
```
## Interface 3: SLURM Distributed Execution
The distributed interface parallelizes the pipeline across SLURM jobs, splitting frames into temporal batches and the field of view (FOV) into spatial patches.
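The batch/patch decomposition can be sketched in a few lines of Python. This is a simplified illustration; the actual splitter in `distributed/` may differ:

```python
def frame_batches(n_frames, batch_size):
    """Split frame indices [0, n_frames) into contiguous batches."""
    return [list(range(i, min(i + batch_size, n_frames)))
            for i in range(0, n_frames, batch_size)]

def fov_patches(height, width, patch):
    """Tile the field of view into ((row0, row1), (col0, col1)) patch bounds."""
    return [((r, min(r + patch, height)), (c, min(c + patch, width)))
            for r in range(0, height, patch)
            for c in range(0, width, patch)]

# One SLURM array task per (batch, patch) pair in the segmentation stage.
batches = frame_batches(100, 10)         # 10 batches of 10 frames each
patches = fov_patches(2048, 2048, 1024)  # 4 spatial patches
tasks = [(b, p) for b in batches for p in patches]  # 40 array tasks
```

Each element of `tasks` maps to one SLURM array index, which is why segmentation arrays scale as batches × patches.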
### Architecture

```text
submit_batch_jobs.py
├── Preprocessing jobs   (array: one job per batch of frames)
├── Segmentation jobs    (array: one job per batch × patch)
├── Merge job            (collects patch outputs)
├── Tracking job         (array: one job per patch)
└── Postprocessing job   (array: one job per patch)
```
### Resources YAML

Each stage specifies its own SLURM resources:

```yaml
pipeline:
  preprocessing:
    normalization:
      partition: short
      mem: 16G
      time: 01:00:00
      batch_size: 50
  segmentation:
    cellpose:
      partition: gpu_short
      gres: gpu:a100-pcie-40gb:1
      mem: 32G
      time: 02:00:00
      batch_size: 10
      dependency: [normalization]
```
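The `dependency` keys form a small DAG between stages, which the submitter has to resolve into a submission order. Conceptually this is a topological sort; the sketch below is illustrative (the stage names beyond those in the YAML above are assumptions), not the actual submitter code:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Stage -> stages it depends on, mirroring the resources YAML above
# plus assumed downstream stages (merge, trackmate) for illustration.
deps = {
    "normalization": [],
    "cellpose": ["normalization"],
    "merge": ["cellpose"],
    "trackmate": ["merge"],
}
order = list(TopologicalSorter(deps).static_order())
# Each stage would then be submitted with
# sbatch --dependency=afterok:<upstream job ids>.
```

For a linear chain like this the order is unique: upstream stages always come first.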
### Submitting

```bash
pixi run python distributed/submit_batch_jobs.py \
    --pipeline my_pipeline.yaml \
    --resources my_pipeline-resources.yaml \
    -v
```
### Monitoring

```bash
squeue --me
tail -f output/log/<stage>/<tag>/patch_XX/<jobid>.out
```
### Handoff back to notebooks

After distributed runs complete, load the merged results in a notebook for QC and analysis:

```python
import pandas as pd

tracks = pd.read_csv("output/tracking/merged_tracks.csv")
```
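From there, typical QC is ordinary pandas work, for instance checking how many frames each track spans. The `track_id` and `frame` column names are assumptions about the merged CSV, not a documented schema:

```python
import pandas as pd

def track_lengths(tracks: pd.DataFrame) -> pd.Series:
    """Frames spanned per track (assumes track_id/frame columns exist)."""
    return tracks.groupby("track_id")["frame"].nunique()

# Toy table shaped like what merged_tracks.csv might contain.
toy = pd.DataFrame({
    "track_id": [1, 1, 1, 2, 2],
    "frame":    [0, 1, 2, 0, 1],
})
lengths = track_lengths(toy)
```

Short tracks (one or two frames) are a common sign of over-fragmented tracking and a reason to revisit parameters in the notebook before rerunning.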
## Choosing an Interface

- New dataset / parameter exploration → Notebook
- Reproducible single-node run → YAML + CLI
- Large dataset (>100 frames / FOV) → SLURM distributed
- Benchmark / throughput profiling → `scripts/benchmark/run_benchmark.py`
The three interfaces are composable: develop in notebooks, automate with CLI, scale with SLURM — all using the same pipeline YAML and cyto module classes.