Containerized Execution System¶
The pyCyto pipeline now supports hybrid execution, allowing tasks to run either locally (baremetal) or in isolated container environments. This enables reproducible analysis while maintaining flexibility for different deployment scenarios.
Overview¶
The containerized execution system provides:
Hybrid Execution: Mix container and baremetal execution within the same pipeline
Environment Isolation: Reproducible, consistent execution environments
Resource Management: Fine-grained control over compute resources
Multiple Runners: Support for Docker and Singularity containers
Backward Compatibility: Existing pipelines continue to work unchanged
Architecture¶
Key Components¶
PipelineTask Base Class: Abstract base for all pipeline tasks with execution routing
Runner System: Pluggable container execution backends (Docker, Singularity)
Task Manager: Orchestrates task execution with dependency management
Container Worker: Bridges host and container execution environments
Resource Profiles: Configurable execution environments
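The routing performed by the task base class can be sketched as follows. Class and method names (`PipelineTask`, `run_baremetal`, `execution_config`) follow the usage shown later on this page, but the body is a simplified illustration, not the actual implementation:

```python
# Simplified sketch of execution routing in a PipelineTask-style base class.
# The real class also handles runners, resources, and dependency management.

class PipelineTask:
    def __init__(self, execution_config=None, **params):
        self.execution_config = execution_config  # None => baremetal
        self.params = params

    def __call__(self, data):
        # Route to a container runner when a container config is present,
        # otherwise fall back to direct (baremetal) execution.
        if self.execution_config and self.execution_config.get("type") == "container":
            return self._run_in_container(data)
        return self.run_baremetal(data)

    def run_baremetal(self, data):
        raise NotImplementedError  # implemented by concrete tasks

    def _run_in_container(self, data):
        # Placeholder: the real system serializes `data`, launches the
        # configured runner (docker/singularity), and invokes the container
        # worker, which calls run_baremetal inside the container.
        runner = self.execution_config["runner"]
        raise RuntimeError(f"container execution via {runner!r} not shown here")


class EchoTask(PipelineTask):
    """Toy task used to demonstrate the baremetal path."""
    def run_baremetal(self, data):
        return {"result": data["value"] * 2}
```

With no `execution_config`, calling the task runs `run_baremetal` directly on the host, which is the backward-compatible default.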
Execution Flow¶
graph TD
    A[Task Definition] --> B[Task Manager]
    B --> C{Execution Config?}
    C -->|Container| D[Container Runner]
    C -->|Baremetal| E[Local Execution]
    D --> F[Container Worker]
    F --> G[Task.run_baremetal]
    E --> G
    G --> H[Results]
Configuration¶
Task Definition Format¶
Pipeline tasks are defined using a new DAG-based YAML format:
# pipeline.container_example.yaml
tasks:
  tcell_segmentation:
    module: cyto.segmentation.cellpose.CellPose
    params:
      model_type: cyto2
      cellprob_thresh: -1
      gpu: true
      channels: [0, 0]
      batch_size: 8
      diameter: 25
      verbose: true
    dependencies: []
    tags:
      - segmentation
      - tcells

  cancer_segmentation:
    module: cyto.segmentation.cellpose.CellPose
    params:
      model_type: cyto2
      cellprob_thresh: -1
      gpu: true
      channels: [0, 0]
      batch_size: 8
      diameter: 30
      verbose: true
    dependencies: []
    tags:
      - segmentation
      - cancer
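Because each task declares its `dependencies`, the task manager can order execution with a topological sort over the DAG. A minimal pure-Python sketch (the function name is hypothetical, not part of the pyCyto API):

```python
# Hypothetical sketch: order tasks so every task runs after its dependencies.
def topological_order(tasks):
    """tasks: mapping of task name -> {"dependencies": [names, ...], ...}."""
    order, seen, visiting = [], set(), set()

    def visit(name):
        if name in seen:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in tasks[name].get("dependencies", []):
            visit(dep)
        visiting.discard(name)
        seen.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order
```

With the YAML above, both segmentation tasks have empty dependency lists, so they keep their declaration order; a downstream feature task listing `tcell_segmentation` as a dependency would be scheduled after it.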
Resource Configuration¶
Execution environments and compute resources are defined separately:
# pipeline-resources.container_example.yaml
execution_configs:
  # Hybrid execution - segmentation in containers, features on baremetal
  hybrid:
    tcell_segmentation:
      type: container
      runner: docker
      image: cellpose:latest
      resources:
        memory: 8Gi
        gpu: 1
    tcell_features:
      type: baremetal  # Run on host system

  # Full container execution
  container:
    tcell_segmentation:
      type: container
      runner: docker
      image: cellpose:latest
      resources:
        memory: 8Gi
        gpu: 1
    tcell_features:
      type: container
      runner: docker
      image: python:3.9-slim

  # Local development (all baremetal)
  local:
    tcell_segmentation:
      type: baremetal
    tcell_features:
      type: baremetal

default_profile: hybrid
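Given the parsed resources file, looking up one task's execution config under the active profile reduces to a nested dictionary lookup with a baremetal fallback. A sketch (function name hypothetical):

```python
def task_execution_config(resources, task_name, profile=None):
    """Return the execution config for one task under the given profile.

    `resources` is the parsed pipeline-resources YAML. A task absent from
    the selected profile falls back to baremetal execution.
    """
    profile = profile or resources.get("default_profile", "hybrid")
    profile_cfg = resources.get("execution_configs", {}).get(profile, {})
    return profile_cfg.get(task_name, {"type": "baremetal"})
```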
Usage¶
Command Line¶
# Run with hybrid execution (default)
python main.py -p pipelines/pipeline.container_example.yaml
# The system automatically loads the resources file:
# pipelines/pipeline-resources.container_example.yaml
Programmatic API¶
from cyto.segmentation.cellpose import CellPose

# Container execution
container_config = {
    "type": "container",
    "runner": "docker",
    "image": "cellpose:latest",
}

cellpose_container = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    gpu=True,
    execution_config=container_config  # Enable container execution
)

# Baremetal execution (default)
cellpose_local = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    gpu=True
    # No execution_config = baremetal execution
)

# Execute with data
data = {"image": image_array}
result = cellpose_container(data)  # Runs in container
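A container-type `execution_config` needs at least a `runner` and an `image`. A hedged validation sketch (this helper is illustrative and not part of the actual API):

```python
def validate_execution_config(config):
    """Raise ValueError on an obviously malformed execution config (sketch)."""
    if config is None:
        return  # no config => baremetal, always valid
    exec_type = config.get("type")
    if exec_type == "baremetal":
        return
    if exec_type == "container":
        missing = [k for k in ("runner", "image") if k not in config]
        if missing:
            raise ValueError(f"container config missing keys: {missing}")
        if config["runner"] not in ("docker", "singularity"):
            raise ValueError(f"unknown runner: {config['runner']!r}")
        return
    raise ValueError(f"unknown execution type: {exec_type!r}")
```

Checking the config once, before any task runs, surfaces typos early instead of failing mid-pipeline.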
Jupyter Notebook Integration¶
The updated notebooks/segmentation/example_17.ipynb demonstrates both execution modes:
# Container execution for T cells
container_config = {
    "type": "container",
    "runner": "docker",
    "image": "cellpose:latest",
}

cellpose_container = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1,
    execution_config=container_config
)

# Baremetal execution for cancer cells
cellpose_baremetal = CellPose(
    model_type='cyto2',
    cellprob_thresh=-1
)
Container Setup¶
Docker (local / workstation)¶
# Build the CellPose container
docker build -f containers/docker/cellpose/Dockerfile -t cellpose:latest .
# Run GPU pipeline
docker run --gpus all \
    -u $(id -u):$(id -g) \
    -v $(pwd):/workspace \
    -v /path/to/data:/data \
    --rm -it \
    cellpose:latest bash
Apptainer / Singularity (HPC cluster)¶
Most HPC clusters do not permit Docker because its daemon requires root privileges. Use Apptainer (formerly Singularity) instead.
# Build SIF from the existing Docker image
apptainer build containers/images/cyto-gpu.sif \
    containers/apptainer/cyto-gpu.def

# Run GPU pipeline
apptainer exec --nv \
    --bind $(pwd):/workspace \
    --bind /path/to/data:/data \
    containers/images/cyto-gpu.sif \
    pixi run -e gpu python scripts/benchmark/run_benchmark.py

# Interactive shell for debugging
apptainer shell --nv containers/images/cyto-gpu.sif
Side-by-Side Comparison¶
| Feature | Docker | Apptainer |
|---|---|---|
| Root required | Yes (daemon) | No |
| HPC clusters | Rarely available | Standard choice |
| GPU pass-through | --gpus all | --nv |
| Bind-mount data | -v host:container | --bind host:container |
| Run command | docker run | apptainer exec |
| Build source | Dockerfile | .def definition file |
| Image format | Layer tarballs | Single .sif file |
| Image location | Docker daemon store | Plain file on disk |
Both use the same underlying algorithm code — only the execution wrapper changes.
Custom Containers¶
For custom containers, ensure they:
Have the cyto package installed
Include all required dependencies
Can run python -m cyto.runners.container_worker
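A minimal custom container satisfying these requirements might look like the following Dockerfile sketch. The install step is an assumption (copying the repo and installing from source); adapt it to however the cyto package is actually distributed:

```dockerfile
FROM python:3.9-slim

# Install the cyto package plus task dependencies (install step assumed here;
# replace with a pip install from your registry if one exists).
COPY . /opt/pycyto
RUN pip install --no-cache-dir /opt/pycyto

# Sanity check: the container worker entry point must be importable.
RUN python -c "import cyto.runners.container_worker"
```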
Execution Profiles¶
Available Profiles¶
hybrid (default): Segmentation in containers, features on baremetal
container: All tasks in containers
local: All tasks on baremetal (development)
cluster: HPC with Singularity containers
Profile Selection¶
Profiles are selected automatically based on:
execution_profile in the pipeline YAML
default_profile in the resources YAML
System default (hybrid)
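The selection order above can be expressed as a simple fallback chain (function name hypothetical):

```python
def select_profile(pipeline_cfg, resources_cfg):
    """Resolve the profile: pipeline YAML, then resources YAML, then 'hybrid'."""
    return (
        pipeline_cfg.get("execution_profile")
        or resources_cfg.get("default_profile")
        or "hybrid"
    )
```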
Advanced Configuration¶
HPC/Cluster Deployment¶
# For Singularity on HPC systems
cluster:
  tcell_segmentation:
    type: container
    runner: singularity
    image: cellpose.sif
    resources:
      mem: 32G
      partition: gpu_short
      cpus-per-task: 16
      gres: gpu:1
    batch_size: 100
Cloud Deployment¶
# For cloud environments
cloud:
  tcell_segmentation:
    type: container
    runner: docker
    image: your-registry/cellpose:v1.0
    resources:
      memory: 16Gi
      gpu: 1
    node_selector:
      accelerator: nvidia-tesla-v100
Migration Guide¶
From Legacy Pipelines¶
Existing YAML pipelines continue to work unchanged:
# Legacy format still supported
python main.py -p pipelines/legacy_pipeline.yaml
The system automatically detects the format and routes appropriately.
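One plausible detection heuristic (a sketch, not necessarily the real logic): the new DAG format nests task definitions with module keys under a top-level tasks mapping, so its presence distinguishes the two formats.

```python
def is_dag_format(pipeline_cfg):
    """Heuristic sketch: treat a tasks -> {name: {module: ...}} mapping as DAG format."""
    tasks = pipeline_cfg.get("tasks")
    if not isinstance(tasks, dict) or not tasks:
        return False
    return all(isinstance(t, dict) and "module" in t for t in tasks.values())
```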
Gradual Migration¶
Phase 1: Test new format alongside existing pipelines
Phase 2: Migrate compute-intensive tasks to containers
Phase 3: Full migration to new DAG-based format
Apptainer / Singularity (HPC)¶
pyCyto ships Apptainer (formerly Singularity) definition files under containers/apptainer/ for building the SIF images used above.
Build the SIF¶
# Automated build script (run from repo root)
bash containers/apptainer/build.sh

# Or manually
apptainer build containers/images/cyto-gpu.sif \
    containers/apptainer/cyto-gpu.def
Configure SIF path¶
Set gpu_sif in scripts/benchmark/benchmark.user.toml after building:
[containers]
gpu_sif = "containers/images/cyto-gpu.sif"
Leave empty ("") to use bare-metal pixi environments directly.
See the Developer Guide → Docker → Apptainer for full build instructions.
Troubleshooting¶
Common Issues¶
Container Not Found:
# Build the required container
docker build -f containers/docker/cellpose/Dockerfile -t cellpose:latest .
Docker Permission Issues:
# Add user to docker group (Linux)
sudo usermod -aG docker $USER
# Logout and login again
Resource Constraints:
# Reduce resource requirements
resources:
  memory: 4Gi  # Reduced from 8Gi
  gpu: 0       # Disable GPU if not available