cyto.utils

General utilities: label conversion, kinematics, segmentation cache, notebook config loading.

cyto.utils.config.load_db_config(configs_dir: Path | None = None) dict[source]

Load the PostgreSQL database configuration for pyCyto.

Searches upward from the current working directory (or configs_dir) for configs/db.def.toml, then overlays configs/db.user.toml if present. The user file must supply at minimum database.password.

Parameters:

configs_dir (Path, optional) – Explicit path to the configs/ directory. If None, walks upward from Path.cwd() until a directory containing db.def.toml is found.

Returns:

Merged database config with keys host, port, dbname, user, password, and optionally an admin sub-dict.

Return type:

dict

Raises:

FileNotFoundError – If db.def.toml cannot be located, or if db.user.toml does not exist (credentials are required at runtime).

Example:

from cyto.utils import load_db_config
from urllib.parse import quote_plus

db = load_db_config()
conn_str = (
    f"postgresql+psycopg2://{quote_plus(db['user'])}:"
    f"{quote_plus(db['password'])}@{db['host']}:{db['port']}/{db['dbname']}"
)
cyto.utils.config.load_notebook_config(notebooks_dir: Path | None = None) dict[source]

Load the DataOps path configuration for notebooks.

Searches upward from the current working directory (or notebooks_dir) for config.def.toml, then overlays config.user.toml if present. Returns the merged configuration dict.

Parameters:

notebooks_dir (Path, optional) – Explicit path to the notebooks/ directory containing config.def.toml. If None, the function walks upward from Path.cwd() until the file is found.

Returns:

Merged TOML config — keys match the sections in config.def.toml (paths, dataset, datasets). Also injects _meta.notebooks_dir and _meta.output_root (resolved Path) for convenience.

Return type:

dict

Raises:

FileNotFoundError – If config.def.toml cannot be located.

DataOps retention tiers (documented in notebooks/config.def.toml):

  • Tier 1 — Ceph SoT: raw snapshots + promoted final artifacts (permanent)

  • Tier 2 — output_root: figures, tables, metadata JSON (retained until promoted)

  • Tier 3 — scratch_root: large intermediate TIFFs/arrays (volatile; delete after promotion)

  • Tier 4 — Ephemeral: SLURM logs, tmp; auto-rotated

Example:

from cyto.utils import load_notebook_config
from pathlib import Path

cfg          = load_notebook_config()
DATA_ROOT    = Path(cfg["paths"]["data_root"])
SCRATCH_ROOT = Path(cfg["paths"]["scratch_root"])
OUTPUT_ROOT  = Path(cfg["paths"]["output_root"])  # small results
SNAPSHOT_ID  = cfg["dataset"]["snapshot_id"]
class cyto.utils.utils.ChannelMerge(weights=None, verbose=False)[source]

Bases: object

class cyto.utils.utils.ImageToLabel(verbose=True)[source]

Bases: object

cyto.utils.utils.check_gpu_memory()[source]
cyto.utils.label_to_table.extract_segment_features(image, label, frame, relabel=False, offset=0, channel='', spacing=[1, 1])[source]
cyto.utils.label_to_table.label_to_sparse(label, image=None, spacing=[1, 1], channel_name='', processes=1)[source]
cyto.utils.label_to_table.merge_dicts(x, y)[source]
cyto.utils.kinematics.cal_kinematics(tracks, x_col='x', y_col='y', frame_col='frame', track_id_col='track_id', chuck_size=2000, verbose=False)[source]
cyto.utils.kinematics.cal_msd(tracks)[source]
cyto.utils.kinematics.compute_msd_time_lag(time_lag, data)[source]
cyto.utils.kinematics.compute_msd_track_vectorized(track_data)[source]

Compute time-lagged MSD for all lags at once for a single track.

Replaces the per-lag loop over compute_msd_time_lag: vectorised NumPy operations bring the cost down to O(T²), versus O(T³) for the pure-Python nested loops.

Parameters:

track_data (pd.DataFrame) – Single-track data with ‘frame’ and ‘displacement squared’ columns.

Returns:

(time_lags np.ndarray, msd_values np.ndarray)

Return type:

tuple
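
The time-lagged MSD computation can be illustrated with a self-contained NumPy sketch. This is not the pyCyto implementation (which consumes a DataFrame with 'frame' and 'displacement squared' columns); `msd_all_lags` is a hypothetical name operating directly on coordinate arrays with uniform frame spacing.

```python
import numpy as np


def msd_all_lags(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Time-lagged MSD for one track: for each lag tau, average
    |r(t + tau) - r(t)|^2 over all valid t.

    One vectorised subtraction per lag gives O(T^2) work overall,
    versus O(T^3) for pure-Python nested loops over (tau, t).
    """
    n = len(x)
    lags = np.arange(1, n)
    msd = np.empty(n - 1)
    for tau in lags:
        dx = x[tau:] - x[:-tau]
        dy = y[tau:] - y[:-tau]
        msd[tau - 1] = np.mean(dx * dx + dy * dy)
    return lags, msd
```

For purely ballistic motion (constant velocity) this yields MSD(τ) ∝ τ², while uncorrelated diffusion yields MSD(τ) ∝ τ — the usual sanity checks for a kinematics pipeline.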

cyto.utils.seg_cache

Pickle-based segmentation cache utilities.

Cache files are named segmentation_frame_{idx:04d}_{cell_type}.pkl and live in a persistent directory that is shared across runs and ignored by git (notebooks/**/cache/).

Both the batch script and the interactive notebook import from here so the cache format is defined in one place.

cyto.utils.seg_cache.cache_exists(cache_dir, frame_idx, cell_type)[source]

Return True if a (non-failed) cache file exists for this frame.

Parameters:
  • cache_dir (str or Path)

  • frame_idx (int)

  • cell_type (str)

Return type:

bool

cyto.utils.seg_cache.get_cache_filename(cache_dir, frame_idx, cell_type)[source]

Return the canonical cache file path for a frame and cell type.

Parameters:
  • cache_dir (str or Path) – Directory containing cache files.

  • frame_idx (int) – Frame index (0-based).

  • cell_type (str) – e.g. 'cancer' or 'tcell'.

Return type:

str
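
The returned path follows the zero-padded pattern documented for the module (segmentation_frame_{idx:04d}_{cell_type}.pkl). A minimal standalone equivalent, with `cache_filename` as a hypothetical stand-in name:

```python
from pathlib import Path


def cache_filename(cache_dir, frame_idx: int, cell_type: str) -> str:
    # Mirrors the documented pattern: segmentation_frame_{idx:04d}_{cell_type}.pkl
    return str(Path(cache_dir) / f"segmentation_frame_{frame_idx:04d}_{cell_type}.pkl")
```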

cyto.utils.seg_cache.load_segmentation_cache(cache_dir, frame_idx, cell_type)[source]

Load segmentation results from a pkl file.

Parameters:
  • cache_dir (str or Path)

  • frame_idx (int)

  • cell_type (str)

Returns:

  • label (ndarray or None)

  • features (DataFrame, list, or None)

  • failed (bool or None) – None indicates the cache file does not exist or could not be read.

cyto.utils.seg_cache.save_segmentation_cache(cache_dir, frame_idx, cell_type, label, features, failed=False)[source]

Persist segmentation results to a pkl file.

Parameters:
  • cache_dir (str or Path)

  • frame_idx (int)

  • cell_type (str)

  • label (ndarray) – Segmentation label array.

  • features (DataFrame or list) – Sparse feature table (empty list for failed frames).

  • failed (bool, optional) – If True, marks this entry as a failed segmentation (default False).
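
The save/load round-trip can be sketched as a standalone pair. This assumes the pickle payload is a dict with label, features, and failed keys — the actual on-disk layout is defined in cyto.utils.seg_cache, and `save_cache`/`load_cache` here are hypothetical names:

```python
import pickle


def save_cache(path, label, features, failed=False):
    # Assumed payload shape: one dict per frame (actual layout lives in
    # cyto.utils.seg_cache).
    payload = {"label": label, "features": features, "failed": failed}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_cache(path):
    # Return (label, features, failed); (None, None, None) signals a missing
    # or unreadable cache file, matching load_segmentation_cache's contract.
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)
    except (OSError, pickle.UnpicklingError):
        return None, None, None
    return payload["label"], payload["features"], payload["failed"]
```

A batch script would call save_cache per frame (with failed=True on segmentation errors), and the notebook would call load_cache and skip entries where failed is None or True.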