Interface Guide

pyCyto exposes the same pipeline through three interfaces. Choose the one that matches your workflow:

| Interface | Best for | Entry point |
|---|---|---|
| Jupyter notebooks | Exploration, parameter tuning, interactive QC | notebooks/ |
| YAML + CLI | Automated reproducible runs, CI, scripting | cyto --pipeline |
| SLURM distributed | Production scale, HPC clusters, large datasets | distributed/submit_batch_jobs.py |

All three interfaces share the same underlying cyto modules and the same dictionary-based data contract.
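The shared dictionary-based contract is easiest to see as a sketch. The key names below (image, spacing, mask) are illustrative assumptions, not pyCyto's actual schema; consult the cyto module docs for the real field names.

```python
import numpy as np

# Hypothetical sketch of a dictionary-based data contract: each stage
# receives a dict and returns it with its own outputs added.
data = {
    "image": np.zeros((100, 512, 512), dtype=np.float32),  # (t, y, x) stack
    "spacing": (0.83, 0.83),                               # microns/pixel
    "channel": "TCell",
}

def run_stage(data, stage_fn):
    """Every interface calls the same stage functions on the same dict."""
    return stage_fn(data)

def dummy_segmentation(data):
    # A stage reads its inputs from the dict and writes results back.
    data["mask"] = (data["image"] > 0.5).astype(np.uint16)
    return data

out = run_stage(data, dummy_segmentation)
print(list(out.keys()))  # insertion order: image, spacing, channel, mask
```

Because every stage speaks this one contract, a notebook cell, a CLI run, and a SLURM job can all invoke the same stage function interchangeably.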


Interface 1: Jupyter Notebooks

Notebooks are the primary way to explore data, tune parameters, and inspect intermediate results interactively.

Location

notebooks/
├── 01_preprocessing/   # normalization, registration
├── 02_segmentation/    # Cellpose, StarDist examples
├── 03_tracking/        # TrackMate, trackpy
├── 04_postprocessing/  # contact tracing, kinematics
├── 05_analysis/        # statistics, survival curves
└── 06_spatiomics/      # cell network triangulation

Starting notebooks

# Local (CPU)
pixi run jupyter

# On HPC compute node
srun -p gpu_short --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=04:00:00 --pty bash
pixi run -e gpu jupyter

See Cluster Jupyter Setup for the full SSH tunnel workflow.

Configuration

All notebooks load configuration via:

from cyto.utils import load_notebook_config
cfg = load_notebook_config()   # loads config.def.toml + config.user.toml

Override paths locally in notebooks/config.user.toml (gitignored) — never hardcode paths in notebooks.

Handoff to CLI

Once parameters are validated interactively, export them to a pipeline YAML and run non-interactively:

pixi run cyto --pipeline pipelines/my_validated_pipeline.yaml -v

Interface 2: YAML + CLI

The CLI interface runs a complete pipeline end-to-end from a declarative YAML configuration file, making it suitable for reproducible automated runs, CI validation, and batch reprocessing.

Running

# From repo root
pixi run cyto --pipeline pipelines/pipeline.yaml -v

# Or directly
pixi run python main.py --pipeline pipelines/pipeline.yaml -v

Pipeline YAML structure

description: My cytotoxicity analysis
channels:
  TCell: /data/tcell_ch0.tif
  Cancer: /data/cancer_ch1.tif

image_range:
  t: [0, 100, 1]     # frames 0–99
spacing: [0.83, 0.83]  # microns/pixel

pipeline:
  preprocessing:
    - name: PercentileNormalization
      channels: all
      args: {lp: 5, up: 95}

  segmentation:
    - name: CellPose
      channels: [TCell, Cancer]
      args: {model_type: cyto2, diameter: 25}

  tracking:
    - name: TrackMate
      channels: [TCell]
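
The image_range entry reads like Python range semantics: [start, stop, step] with an exclusive stop, which matches the "frames 0–99" comment above. A quick check, assuming that interpretation:

```python
# image_range t: [0, 100, 1] interpreted with Python range semantics
# (exclusive stop), consistent with the "frames 0-99" comment.
start, stop, step = 0, 100, 1
frames = list(range(start, stop, step))

print(len(frames))            # number of frames processed
print(frames[0], frames[-1])  # first and last frame index
```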

Dask dashboard

During execution, monitor task progress at http://localhost:8787/status.

Handoff to SLURM

For large datasets (hundreds of frames, multiple patches), hand off to distributed execution:

pixi run python distributed/submit_batch_jobs.py \
    --pipeline pipelines/pipeline.yaml \
    --resources pipelines/pipeline-resources.yaml -v

Interface 3: SLURM Distributed Execution

The distributed interface parallelizes the pipeline across SLURM jobs — splitting frames into batches and the spatial field of view (FOV) into patches.

Architecture

submit_batch_jobs.py
├── Preprocessing jobs (array: one job per batch of frames)
├── Segmentation jobs (array: one job per batch × patch)
├── Merge job (collects patch outputs)
├── Tracking job (array: one job per patch)
└── Postprocessing job (array: one job per patch)
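
Given this batching scheme, the number of array tasks per stage follows from simple ceiling division. A sketch, assuming batch_size frames per preprocessing task and a batch × patch fan-out for segmentation:

```python
import math

def job_counts(n_frames: int, batch_size: int, n_patches: int) -> dict:
    """Array sizes implied by the batch/patch scheme sketched above."""
    n_batches = math.ceil(n_frames / batch_size)
    return {
        "preprocessing": n_batches,             # one task per frame batch
        "segmentation": n_batches * n_patches,  # one task per batch x patch
        "tracking": n_patches,                  # one task per patch
        "postprocessing": n_patches,            # one task per patch
    }

counts = job_counts(n_frames=300, batch_size=50, n_patches=4)
print(counts)
# {'preprocessing': 6, 'segmentation': 24, 'tracking': 4, 'postprocessing': 4}
```

Segmentation dominates the task count, which is why it gets its own batch_size in the resources YAML below.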

Resources YAML

Each stage specifies its own SLURM resources:

pipeline:
  preprocessing:
    normalization:
      partition: short
      mem: 16G
      time: 01:00:00
      batch_size: 50

  segmentation:
    cellpose:
      partition: gpu_short
      gres: gpu:a100-pcie-40gb:1
      mem: 32G
      time: 02:00:00
      batch_size: 10
      dependency: [normalization]
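
Each resource entry maps naturally onto sbatch flags. The translation below is a hypothetical sketch of what submit_batch_jobs.py presumably does internally; the real script may name things differently.

```python
def sbatch_flags(stage_resources: dict) -> list[str]:
    """Translate a resources-YAML stage entry into sbatch flags.

    Hypothetical sketch: keys that configure batching rather than SLURM
    (batch_size, dependency) are handled separately by the submitter.
    """
    mapping = {"partition": "--partition", "mem": "--mem",
               "time": "--time", "gres": "--gres"}
    flags = []
    for key, flag in mapping.items():
        if key in stage_resources:
            flags.append(f"{flag}={stage_resources[key]}")
    return flags

cellpose = {"partition": "gpu_short", "gres": "gpu:a100-pcie-40gb:1",
            "mem": "32G", "time": "02:00:00", "batch_size": 10,
            "dependency": ["normalization"]}
print(sbatch_flags(cellpose))
```

The dependency list would instead become an --dependency=afterok:&lt;jobid&gt; chain once the upstream stage's job IDs are known at submission time.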

Submitting

pixi run python distributed/submit_batch_jobs.py \
    --pipeline my_pipeline.yaml \
    --resources my_pipeline-resources.yaml \
    -v

Monitoring

squeue --me
tail -f output/log/<stage>/<tag>/patch_XX/<jobid>.out

Handoff back to notebooks

After distributed runs complete, load the merged results in a notebook for QC and analysis:

import pandas as pd
tracks = pd.read_csv("output/tracking/merged_tracks.csv")
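
A few quick sanity checks on the merged tracks are often worthwhile before deeper analysis. The column names below (track_id, frame) are assumptions about the merged CSV's schema; adjust them to the actual output.

```python
import pandas as pd

# Quick QC on merged tracks. Column names (track_id, frame) are assumed;
# the DataFrame literal stands in for
# pd.read_csv("output/tracking/merged_tracks.csv").
tracks = pd.DataFrame({
    "track_id": [1, 1, 1, 2, 2],
    "frame":    [0, 1, 2, 0, 1],
})

n_tracks = tracks["track_id"].nunique()
lengths = tracks.groupby("track_id")["frame"].count()

print(f"{n_tracks} tracks, longest {lengths.max()} frames")
```

Unusually short tracks or a sudden drop in track count at patch boundaries are the typical symptoms of a bad merge worth catching here.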

Choosing an Interface

New dataset / parameter exploration → Notebook
Reproducible single-node run       → YAML + CLI
Large dataset (>100 frames / FOV)  → SLURM distributed
Benchmark / throughput profiling   → scripts/benchmark/run_benchmark.py

The three interfaces are composable: develop in notebooks, automate with CLI, scale with SLURM — all using the same pipeline YAML and cyto module classes.