Containerized Execution System

The pyCyto pipeline now supports hybrid execution, allowing tasks to run either locally (baremetal) or in isolated container environments. This enables reproducible analysis while maintaining flexibility for different deployment scenarios.

Overview

The containerized execution system provides:

  • Hybrid Execution: Mix container and baremetal execution within the same pipeline

  • Environment Isolation: Reproducible, consistent execution environments

  • Resource Management: Fine-grained control over compute resources

  • Multiple Runners: Support for Docker and Singularity containers

  • Backward Compatibility: Existing pipelines continue to work unchanged

Architecture

Key Components

  1. PipelineTask Base Class: Abstract base for all pipeline tasks with execution routing

  2. Runner System: Pluggable container execution backends (Docker, Singularity)

  3. Task Manager: Orchestrates task execution with dependency management

  4. Container Worker: Bridges host and container execution environments

  5. Resource Profiles: Configurable execution environments

Execution Flow

graph TD
    A[Task Definition] --> B[Task Manager]
    B --> C{Execution Config?}
    C -->|Container| D[Container Runner]
    C -->|Baremetal| E[Local Execution]
    D --> F[Container Worker]
    F --> G[Task.run_baremetal]
    E --> G
    G --> H[Results]
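
The routing in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the actual pyCyto implementation: the class body, the run_container hook, and the Doubler toy task are all hypothetical, and only the names PipelineTask, execution_config, and run_baremetal come from this document.

```python
from abc import ABC, abstractmethod


class PipelineTask(ABC):
    """Illustrative stand-in for the pipeline's task base class."""

    def __init__(self, execution_config=None):
        # No execution_config means baremetal, mirroring the documented default.
        self.execution_config = execution_config or {"type": "baremetal"}

    @abstractmethod
    def run_baremetal(self, data):
        """The algorithm itself; runs on the host or inside the container worker."""

    def run_container(self, data):
        # Hypothetical hook: a real runner would start the image, and the
        # container worker inside it would end up calling run_baremetal.
        raise NotImplementedError("container runner is not wired up in this sketch")

    def __call__(self, data):
        # Route on the execution config, as in the diagram's decision node.
        if self.execution_config.get("type") == "container":
            return self.run_container(data)
        return self.run_baremetal(data)


class Doubler(PipelineTask):
    """Toy task used only to exercise the routing."""

    def run_baremetal(self, data):
        return {"value": data["value"] * 2}


result = Doubler()({"value": 21})
print(result)
```

Both branches end in run_baremetal, which is why the same algorithm code can serve either execution mode.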
    

Configuration

Task Definition Format

Pipeline tasks are defined using a new DAG-based YAML format:

# pipeline.container_example.yaml
tasks:
  tcell_segmentation:
    module: cyto.segmentation.cellpose.CellPose
    params:
      model_type: cyto2
      cellprob_thresh: -1
      gpu: true
      channels: [0, 0]
      batch_size: 8
      diameter: 25
      verbose: true
    dependencies: []
    tags:
      - segmentation
      - tcells

  cancer_segmentation:
    module: cyto.segmentation.cellpose.CellPose
    params:
      model_type: cyto2
      cellprob_thresh: -1
      gpu: true
      channels: [0, 0]
      batch_size: 8
      diameter: 30
      verbose: true
    dependencies: []
    tags:
      - segmentation
      - cancer
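
To show how the dependencies field could drive scheduling, here is a small sketch that orders tasks with the standard library's graphlib. It assumes the YAML has already been parsed into a dictionary (for example with yaml.safe_load). The execution_order helper and the tcell_features entry with its dependency are hypothetical additions for illustration; the Task Manager's real scheduling logic may differ.

```python
from graphlib import TopologicalSorter

# Parsed form of a task file like the one above, reduced to the fields used here.
# The tcell_features entry is hypothetical and only demonstrates a dependency.
tasks = {
    "tcell_segmentation": {"dependencies": []},
    "cancer_segmentation": {"dependencies": []},
    "tcell_features": {"dependencies": ["tcell_segmentation"]},
}


def execution_order(tasks):
    """Return task names so that every task comes after its dependencies."""
    graph = {name: set(spec.get("dependencies", [])) for name, spec in tasks.items()}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"undefined dependencies: {sorted(unknown)}")
    # static_order raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(tasks)
print(order)
```

Tasks with an empty dependency list, such as the two segmentation tasks, have no ordering constraint between them and could in principle run in parallel.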

Resource Configuration

Execution environments and compute resources are defined separately:

# pipeline-resources.container_example.yaml
execution_configs:
  # Hybrid execution - segmentation in containers, features on baremetal
  hybrid:
    tcell_segmentation:
      type: container
      runner: docker
      image: cellpose:latest
      resources:
        memory: 8Gi
        gpu: 1
    
    tcell_features:
      type: baremetal  # Run on host system

  # Full container execution
  container:
    tcell_segmentation:
      type: container
      runner: docker
      image: cellpose:latest
      resources:
        memory: 8Gi
        gpu: 1
    
    tcell_features:
      type: container
      runner: docker
      image: python:3.9-slim

  # Local development (all baremetal)
  local:
    tcell_segmentation:
      type: baremetal
    tcell_features:
      type: baremetal

default_profile: hybrid
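
The lookup from a profile to a per-task execution config can be sketched as below. The resolve_execution_config helper is hypothetical, the dictionary is a trimmed, already-parsed copy of the resources file above, and the fallback to baremetal for tasks missing from a profile is an assumption rather than documented behavior.

```python
# Parsed form of a resources file (e.g. via yaml.safe_load), trimmed to two profiles.
resources = {
    "execution_configs": {
        "hybrid": {
            "tcell_segmentation": {
                "type": "container",
                "runner": "docker",
                "image": "cellpose:latest",
                "resources": {"memory": "8Gi", "gpu": 1},
            },
            "tcell_features": {"type": "baremetal"},
        },
        "local": {
            "tcell_segmentation": {"type": "baremetal"},
            "tcell_features": {"type": "baremetal"},
        },
    },
    "default_profile": "hybrid",
}


def resolve_execution_config(resources, task_name, profile=None):
    """Look up one task's execution config (hypothetical helper)."""
    profile = profile or resources.get("default_profile", "hybrid")
    profiles = resources.get("execution_configs", {})
    if profile not in profiles:
        raise KeyError(f"unknown execution profile: {profile!r}")
    # Assumption: a task that the profile does not mention runs on baremetal.
    return profiles[profile].get(task_name, {"type": "baremetal"})


print(resolve_execution_config(resources, "tcell_segmentation"))
print(resolve_execution_config(resources, "tcell_segmentation", profile="local"))
```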

Usage

Command Line

# Run with hybrid execution (default)
python main.py -p pipelines/pipeline.container_example.yaml

# The system automatically loads the resources file:
# pipelines/pipeline-resources.container_example.yaml
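
The pairing between the two file names above suggests a naming convention in which "-resources" is appended to the first component of the pipeline file name. The sketch below derives the companion path under that assumption. The resources_path_for helper is hypothetical, and the convention is inferred from this single example, so the loader in main.py may work differently.

```python
from pathlib import Path


def resources_path_for(pipeline_path):
    """Guess the companion resources file for a pipeline YAML (assumed convention)."""
    path = Path(pipeline_path)
    prefix, sep, rest = path.name.partition(".")
    if not sep:
        raise ValueError(f"expected a name like 'pipeline.<id>.yaml', got {path.name!r}")
    return path.with_name(f"{prefix}-resources.{rest}")


guess = resources_path_for("pipelines/pipeline.container_example.yaml")
print(guess.as_posix())
```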

Programmatic API

from cyto.segmentation.cellpose import CellPose

# Container execution
container_config = {
    "type": "container",
    "runner": "docker", 
    "image": "cellpose:latest"
}

cellpose_container = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    gpu=True,
    execution_config=container_config  # Enable container execution
)

# Baremetal execution (default)
cellpose_local = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    gpu=True
    # No execution_config = baremetal execution
)

# Execute with data; image_array is assumed to be an image already loaded
# in memory (for example, a NumPy array)
data = {"image": image_array}
result = cellpose_container(data)  # Runs in container

Jupyter Notebook Integration

The updated notebooks/segmentation/example_17.ipynb demonstrates both execution modes:

# Container execution for T cells
container_config = {
    "type": "container",
    "runner": "docker",
    "image": "cellpose:latest"
}

cellpose_container = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    execution_config=container_config
)

# Baremetal execution for cancer cells
cellpose_baremetal = CellPose(
    model_type='cyto2', 
    cellprob_thresh=-1
)

Container Setup

Docker (local / workstation)

# Build the CellPose container
docker build -f containers/docker/cellpose/Dockerfile -t cellpose:latest .

# Start an interactive GPU-enabled shell inside the container
docker run --gpus all \
    -u $(id -u):$(id -g) \
    -v $(pwd):/workspace \
    -v /path/to/data:/data \
    --rm -it \
    cellpose:latest bash

Apptainer / Singularity (HPC cluster)

Most HPC clusters do not permit Docker because its daemon requires root privileges. Use Apptainer instead.

# Build the SIF from the Apptainer definition file
apptainer build containers/images/cyto-gpu.sif \
    containers/apptainer/cyto-gpu.def

# Run GPU pipeline
apptainer exec --nv \
    --bind $(pwd):/workspace \
    --bind /path/to/data:/data \
    containers/images/cyto-gpu.sif \
    pixi run -e gpu python scripts/benchmark/run_benchmark.py

# Interactive shell for debugging
apptainer shell --nv containers/images/cyto-gpu.sif

Side-by-Side Comparison

| Feature | Docker | Apptainer |
|---|---|---|
| Root required | Yes (daemon) | No |
| HPC clusters | Rarely available | Standard choice |
| GPU pass-through | --gpus all | --nv |
| Bind-mount data | -v host:container | --bind host:container |
| Run command | docker run --rm -it IMAGE CMD | apptainer exec SIF CMD |
| Build source | Dockerfile | .def file or pull from Docker registry |
| Image format | Layer tarballs | Single .sif file |
| Image location | containers/docker/ | containers/apptainer/ |

Both use the same underlying algorithm code — only the execution wrapper changes.
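
Because the two runners differ only in a handful of flags, the wrapper can be thought of as a small command builder. The sketch below is illustrative and not pyCyto's runner code: build_run_command is a hypothetical helper, and it uses only the CLI flags that appear in this section.

```python
def build_run_command(runner, image, command, binds=None, gpu=False):
    """Assemble an argument list for Docker or Apptainer (hypothetical helper)."""
    binds = binds or {}
    if runner == "docker":
        args = ["docker", "run", "--rm"]
        if gpu:
            args += ["--gpus", "all"]
        for host, target in binds.items():
            args += ["-v", f"{host}:{target}"]
    elif runner == "apptainer":
        args = ["apptainer", "exec"]
        if gpu:
            args.append("--nv")
        for host, target in binds.items():
            args += ["--bind", f"{host}:{target}"]
    else:
        raise ValueError(f"unsupported runner: {runner!r}")
    # The image (or SIF path) comes last, followed by the command to run inside it.
    return args + [image, *command]


docker_cmd = build_run_command(
    "docker", "cellpose:latest", ["bash"], binds={"/path/to/data": "/data"}, gpu=True
)
apptainer_cmd = build_run_command(
    "apptainer", "containers/images/cyto-gpu.sif", ["bash"],
    binds={"/path/to/data": "/data"}, gpu=True,
)
print(" ".join(docker_cmd))
print(" ".join(apptainer_cmd))
```

Returning a list rather than a single string lets the result be passed to subprocess.run without shell quoting concerns.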

Custom Containers

For custom containers, ensure they:

  1. Have the cyto package installed

  2. Include all required dependencies

  3. Can run python -m cyto.runners.container_worker
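
One way to check the third requirement before wiring an image into a pipeline is to try importing the worker module inside the container. The helpers below are hypothetical and only wrap standard docker run and apptainer/singularity exec invocations. They assume the image does not define an entrypoint that overrides the command.

```python
import subprocess


def worker_check_command(image, runner="docker"):
    """Build a command that tries to import the worker module inside the image."""
    probe = ["python", "-c", "import cyto.runners.container_worker"]
    if runner == "docker":
        return ["docker", "run", "--rm", image, *probe]
    if runner in ("apptainer", "singularity"):
        return [runner, "exec", image, *probe]
    raise ValueError(f"unsupported runner: {runner!r}")


def image_has_worker(image, runner="docker"):
    """Return True if the import succeeds inside the container."""
    completed = subprocess.run(worker_check_command(image, runner), capture_output=True)
    return completed.returncode == 0


print(" ".join(worker_check_command("cellpose:latest")))
```

Calling image_has_worker("cellpose:latest") requires the runner to be installed and the image to exist locally, so it is left out of the example run.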

Execution Profiles

Available Profiles

  1. hybrid (default): Segmentation in containers, features on baremetal

  2. container: All tasks in containers

  3. local: All tasks on baremetal (development)

  4. cluster: HPC with Singularity containers

Profile Selection

Profiles are selected automatically, using the first of the following that is set:

  1. execution_profile in the pipeline YAML

  2. default_profile in the resources YAML

  3. System default (hybrid)
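
The precedence above amounts to a chained fallback. The sketch below assumes both YAML files have been parsed into dictionaries. The select_profile function and the SYSTEM_DEFAULT_PROFILE constant are hypothetical names; only the keys execution_profile and default_profile and the hybrid default come from this document.

```python
SYSTEM_DEFAULT_PROFILE = "hybrid"


def select_profile(pipeline_config, resources_config):
    """Pick the execution profile using the precedence listed above."""
    return (
        pipeline_config.get("execution_profile")
        or resources_config.get("default_profile")
        or SYSTEM_DEFAULT_PROFILE
    )


print(select_profile({"execution_profile": "cluster"}, {"default_profile": "local"}))
print(select_profile({}, {"default_profile": "local"}))
print(select_profile({}, {}))
```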

Advanced Configuration

HPC/Cluster Deployment

# For Singularity on HPC systems
cluster:
  tcell_segmentation:
    type: container
    runner: singularity
    image: cellpose.sif
    resources:
      mem: 32G
      partition: gpu_short
      cpus-per-task: 16
      gres: gpu:1
      batch_size: 100
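
The mem, partition, cpus-per-task, and gres keys match the names of SLURM sbatch options, so one plausible reading is that they are forwarded to the scheduler. The sketch below shows that mapping under this assumption. The sbatch_options helper is hypothetical, and how pyCyto actually consumes these keys (including batch_size, which is not an sbatch option and is skipped here) is not specified in this document.

```python
# Keys assumed to correspond one-to-one to sbatch long options.
SLURM_KEYS = ("mem", "partition", "cpus-per-task", "gres")


def sbatch_options(resources):
    """Turn the scheduler-related resource keys into sbatch flags."""
    return [f"--{key}={resources[key]}" for key in SLURM_KEYS if key in resources]


cluster_resources = {
    "mem": "32G",
    "partition": "gpu_short",
    "cpus-per-task": 16,
    "gres": "gpu:1",
    "batch_size": 100,  # not a scheduler option; ignored by this sketch
}
options = sbatch_options(cluster_resources)
print(" ".join(["sbatch", *options]))
```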

Cloud Deployment

# For cloud environments
cloud:
  tcell_segmentation:
    type: container
    runner: docker
    image: your-registry/cellpose:v1.0
    resources:
      memory: 16Gi
      gpu: 1
      node_selector:
        accelerator: nvidia-tesla-v100

Migration Guide

From Legacy Pipelines

Existing YAML pipelines continue to work unchanged:

# Legacy format still supported
python main.py -p pipelines/legacy_pipeline.yaml

The system automatically detects the format and routes appropriately.
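
As a rough illustration of what such detection could look like, the sketch below treats a parsed YAML document as DAG-based when it has a top-level tasks mapping of named task definitions. This is a guess at a workable heuristic, not the detection logic pyCyto uses, and the is_dag_format helper and the legacy example structure are hypothetical.

```python
def is_dag_format(config):
    """Heuristic: the DAG format has a top-level 'tasks' mapping of named tasks."""
    tasks = config.get("tasks") if isinstance(config, dict) else None
    return isinstance(tasks, dict) and all(
        isinstance(spec, dict) for spec in tasks.values()
    )


dag_config = {
    "tasks": {
        "tcell_segmentation": {
            "module": "cyto.segmentation.cellpose.CellPose",
            "dependencies": [],
        }
    }
}
# Hypothetical legacy layout, used only as a counterexample.
legacy_config = {"pipeline": {"segmentation": []}}

print(is_dag_format(dag_config), is_dag_format(legacy_config))
```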

Gradual Migration

  1. Phase 1: Test new format alongside existing pipelines

  2. Phase 2: Migrate compute-intensive tasks to containers

  3. Phase 3: Full migration to new DAG-based format

Apptainer / Singularity (HPC)

Most HPC clusters do not allow Docker because its daemon requires root privileges. Use Apptainer (formerly Singularity) instead. pyCyto provides Apptainer definition files under containers/apptainer/.

Build the SIF

# Automated build script (run from repo root)
bash containers/apptainer/build.sh

# Or manually
apptainer build containers/images/cyto-gpu.sif \
    containers/apptainer/cyto-gpu.def

Run with Apptainer

# GPU-enabled execution
apptainer exec --nv containers/images/cyto-gpu.sif \
    pixi run -e gpu python scripts/benchmark/run_benchmark.py

# Bind-mount data from scratch
apptainer exec --nv \
    --bind /scratch/data:/data \
    containers/images/cyto-gpu.sif \
    pixi run -e gpu python my_script.py

# Interactive shell for debugging
apptainer shell --nv containers/images/cyto-gpu.sif

Configure SIF path

Set gpu_sif in scripts/benchmark/benchmark.user.toml after building:

[containers]
gpu_sif = "containers/images/cyto-gpu.sif"

Leave gpu_sif empty ("") to run directly in the baremetal pixi environments.

See the Developer Guide → Docker → Apptainer for full build instructions.


Troubleshooting

Common Issues

Container Not Found:

# Build the required container
docker build -f containers/docker/cellpose/Dockerfile -t cellpose:latest .

Docker Permission Issues:

# Add user to docker group (Linux)
sudo usermod -aG docker $USER
# Log out and log back in for the group change to take effect

Resource Constraints:

# Reduce resource requirements
resources:
  memory: 4Gi  # Reduced from 8Gi
  gpu: 0       # Disable GPU if not available