Boilerplate Templates

Copy-pasteable templates for the most common extension patterns in pyCyto.

graph LR
    A[Define class in cyto/stage/] --> B[Register import in main.py]
    B --> C[Add to pipeline YAML]
    C --> D[Add resource spec YAML]
    D --> E[pixi install -e env]

    classDef templateStep fill:#0d7377,color:#fff,stroke:#0a5c60
    class A,B,C,D,E templateStep

1. Compute Node Class

Every pipeline stage is a callable class following the dict I/O contract. Copy the variant that matches your stage type.

Preprocessing node (Image → Image)

from tqdm import tqdm


class MyPreprocessing:
    def __init__(self, param_a=1.0, verbose=True):
        self.name = "MyPreprocessing"
        self.param_a = param_a
        self.verbose = verbose

    def __call__(self, data: dict) -> dict:
        image = data["image"]          # dask or numpy array (T, Y, X) or (T, Z, Y, X)

        if self.verbose:
            tqdm.write(f"[{self.name}] processing {image.shape}")

        result = image * self.param_a  # replace with your algorithm

        return {"image": result}

Segmentation node (Image → Label)

from tqdm import tqdm


class MySegmentation:
    def __init__(self, threshold=0.5, verbose=True):
        self.name = "MySegmentation"
        self.threshold = threshold
        self.verbose = verbose

    def __call__(self, data: dict) -> dict:
        image = data["image"]

        if self.verbose:
            tqdm.write(f"[{self.name}] segmenting {image.shape}")

        labels = (image > self.threshold).astype("uint32")  # replace with real model

        return {"label": labels}

Postprocessing node (Image + Label + DataFrame → any)

from tqdm import tqdm


class MyAnalysis:
    def __init__(self, verbose=True):
        self.name = "MyAnalysis"
        self.verbose = verbose

    def __call__(self, data: dict) -> dict:
        image = data.get("image")
        label = data.get("label")
        df    = data.get("dataframe")

        if self.verbose:
            tqdm.write(f"[{self.name}] analysing {len(df)} detections")

        # your analysis here
        result_df = df.copy()

        return {"dataframe": result_df}
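Because every stage reads from and writes to a plain dict, a chain of stages can be smoke-tested locally before any YAML is involved. A minimal sketch (numpy stands in for dask; merging each stage's output back into a shared dict is an assumption of this sketch, not a documented pyCyto guarantee):

```python
import numpy as np

# Two toy stages obeying the dict I/O contract
def preprocess(data: dict) -> dict:
    return {"image": data["image"] * 2.0}

def segment(data: dict) -> dict:
    return {"label": (data["image"] > 1.0).astype("uint32")}

data = {"image": np.ones((2, 8, 8))}   # (T, Y, X)
for stage in (preprocess, segment):
    data.update(stage(data))           # merge stage output into the shared dict

assert sorted(data) == ["image", "label"]
```

Each stage only declares the keys it produces, so the shared dict accumulates `image`, `label`, and `dataframe` entries as the pipeline runs.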

2. Pipeline YAML Snippet

Add a new stage to your pipeline YAML:

pipeline:
  postprocessing:

    - name: MyAnalysis           # class name — must match Python class
      tag: MyAnalysisTag         # unique identifier; used in logs and resource config
      channels: [TCell]          # channel names to pass to this stage
      input_type: [image, label, feature]
      output_type: [feature]
      args:
        verbose: true            # passed as kwargs to __init__
      output: true               # write result to output_dir/postprocessing/<tag>/
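Each key under `args` becomes a keyword argument to the class's `__init__`. Conceptually, the runner does something like the following (a hypothetical sketch; `registry` and the stub class are illustrative, not pyCyto API):

```python
class MyAnalysis:                        # stub standing in for the real class
    def __init__(self, verbose=True):
        self.verbose = verbose

# Look the class up by its `name` and apply `args` as constructor kwargs
stage_cfg = {"name": "MyAnalysis", "args": {"verbose": False}}
registry = {"MyAnalysis": MyAnalysis}

node = registry[stage_cfg["name"]](**stage_cfg["args"])
```

This is why `name` must match the Python class exactly: the lookup is by string.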

3. Resource YAML Compute Spec Entries

Add a matching entry in configs/distributed/pipeline-resources.yaml:

# ── CPU baremetal (no GPU needed) ─────────────────────────────────────────────
pipeline:
  postprocessing:
    MyAnalysisTag:               # must match tag in pipeline YAML
      partition: short
      cpus-per-task: 4
      mem: 16G
      time: "01:00:00"
      batch_size: 50
      dependency: [Cellpose_TCell]
      dependency_type: afterok

# ── GPU stage ─────────────────────────────────────────────────────────────────
    MyGpuAnalysisTag:
      partition: gpu_short
      gres: gpu:a100-pcie-40gb:1
      cpus-per-task: 4
      mem: 32G
      time: "04:00:00"
      batch_size: 100
      dependency: singleton
      dependency_type: afterok

# ── Apptainer container ────────────────────────────────────────────────────────
    MyContainerTag:
      partition: gpu_short
      gres: gpu:a100-pcie-40gb:1
      cpus-per-task: 4
      mem: 32G
      container: containers/images/cyto-gpu.sif   # path to SIF
      dependency: [Cellpose_TCell]
      dependency_type: afterok

4. pixi.toml Feature Dependency Block

Add an optional package as a named pixi feature so it does not pollute the default environment:

# In pixi.toml — add a new feature section
[feature.myalgorithm.dependencies]
python     = ">=3.10"
mypackage  = ">=1.2"           # replace with real conda/PyPI package name

[feature.myalgorithm.pypi-dependencies]
my-pypi-package = ">=0.5"

# Wire the feature into an environment
[environments]
myalgorithm = { features = ["myalgorithm"], solve-group = "cpu" }

Install with pixi install -e myalgorithm.
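
Because the optional package only exists in the `myalgorithm` environment, a common pattern (not a pyCyto requirement) is to guard the import inside the node so the module still loads in the default environment. `mypackage` here is the placeholder name from the feature block above:

```python
class MyAlgorithmNode:
    def __init__(self, verbose=True):
        self.name = "MyAlgorithmNode"
        self.verbose = verbose
        try:
            import mypackage             # only installed in the myalgorithm env
            self._backend = mypackage
        except ImportError:
            self._backend = None

    def __call__(self, data: dict) -> dict:
        if self._backend is None:
            raise RuntimeError(
                f"[{self.name}] needs 'mypackage'; install the feature "
                "environment first: pixi install -e myalgorithm"
            )
        return data                      # replace with the real computation
```

The failure then surfaces as a clear error at call time instead of an import crash when the pipeline module is first loaded.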


5. Provenance-Aware Output Writer Stub

For modules that write files to disk in addition to returning their result dict:

import json
from pathlib import Path
from datetime import datetime, timezone

from tqdm import tqdm


def write_provenance(output_dir: Path, params: dict, git_sha: str = "") -> None:
    """Write a JSON provenance sidecar next to module outputs."""
    record = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "git_sha":       git_sha,
        "params":        params,
    }
    (output_dir / "provenance.json").write_text(
        json.dumps(record, indent=2)
    )


class MyFileWriter:
    def __init__(self, output_dir: str, verbose=True):
        self.name       = "MyFileWriter"
        self.output_dir = Path(output_dir)
        self.verbose    = verbose

    def __call__(self, data: dict) -> dict:
        self.output_dir.mkdir(parents=True, exist_ok=True)

        df = data["dataframe"]
        out_path = self.output_dir / "results.csv"
        df.to_csv(out_path, index=False)

        write_provenance(self.output_dir, params={"module": self.name})

        if self.verbose:
            tqdm.write(f"[{self.name}] wrote {out_path}")

        return {"dataframe": df}
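
A quick round-trip check of the sidecar format (the helper is repeated verbatim so the snippet runs standalone; the temporary directory is just for illustration):

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(output_dir: Path, params: dict, git_sha: str = "") -> None:
    """Write a JSON provenance sidecar next to module outputs."""
    record = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "git_sha": git_sha,
        "params": params,
    }
    (output_dir / "provenance.json").write_text(json.dumps(record, indent=2))


out_dir = Path(tempfile.mkdtemp())
write_provenance(out_dir, params={"module": "MyFileWriter"}, git_sha="abc1234")

record = json.loads((out_dir / "provenance.json").read_text())
```

Reading the sidecar back with `json.loads` recovers the timestamp, git SHA, and parameters, which is usually all that is needed to reproduce a run.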

See Also