Dev/better tmux running by Erotemic · Pull Request #6 · Kitware/cmd_queue

Erotemic · 2026-05-08T22:02:43Z

No description provided.

Adds cmd_queue.monitor_manifest, which lets a queue's run state be serialized to disk and reloaded by an out-of-process monitor. Each queue subclass now has _build_monitor_manifest, _write_monitor_manifest, and _from_manifest hooks so monitor() and kill() can be invoked on a queue rebuilt from the manifest alone (no jobs resubmitted). This is groundwork for letting the monitor live in its own tmux session that survives the parent shell. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
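As a rough illustration of the manifest round trip described above, the sketch below writes a small run-state record to disk and reads it back, as a separate monitor process might. The JSON format, the file name, and every field name here are assumptions made for illustration; the actual schema produced by _build_monitor_manifest is not shown in this PR text.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(manifest: dict, fpath: Path) -> Path:
    """Serialize run state to disk so another process can read it."""
    fpath.parent.mkdir(parents=True, exist_ok=True)
    fpath.write_text(json.dumps(manifest, indent=2))
    return fpath


def load_manifest(fpath: Path) -> dict:
    """Reload the run state; no jobs are resubmitted by doing this."""
    return json.loads(fpath.read_text())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical fields, chosen only to make the example concrete.
    manifest = {
        'backend': 'tmux',
        'name': 'demo-queue',
        'dpath': tmp,
        'workers': ['demo-queue_000', 'demo-queue_001'],
    }
    fpath = write_manifest(manifest, Path(tmp) / 'monitor_manifest.json')
    reloaded = load_manifest(fpath)
```

The point of the pattern is that the reloaded dictionary alone carries enough information to rebuild a queue object for monitor() and kill().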

Each backend now persists its monitor manifest at the start of run(), which makes the queue reattachable from a separate process. Also preserves the user-supplied SlurmQueue name on self.name (previously dropped after queue_id was constructed) so that name-based monitor lookup works for both queue_id and the friendly name. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reattaches to a running queue by name (via the active-index that run() populates), by manifest path (--manifest), or by dpath. This is the entry point that step 3's tmux monitor backend will execute inside its own tmux session, and is also useful on its own when the original run() shell has been closed but workers are still active. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Both TMUXMultiQueue.monitor and SlurmQueue.monitor now accept onfail/onexit kwargs and perform the corresponding kill()/capture() themselves. run() simply forwards the args. This way the same finalization happens whether the monitor runs inline, in a separate tmux session (step 3), or via `cmd_queue monitor` from another shell. The semantics are preserved (onfail='kill' tears down idle tmux sessions only on a clean exit; on slurm it fires only on failure). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The new ``monitor`` kwarg on TMUXMultiQueue.run and SlurmQueue.run controls where the live status UI runs while the queue is executing. Default is ``'inline'`` (current behavior). With ``'tmux'``, the monitor is spawned in a detached tmux session via the new ``util_tmux.tmux.spawn_monitor_session`` helper, which invokes ``cmd_queue monitor --manifest=<path>`` under sys.executable. The parent process still blocks on a headless state poll, so block=True keeps its meaning even when the visible UI lives elsewhere — closing or detaching the tmux UI does not return control early. The tmux monitor session intentionally outlives the workers: workers self-clean on success, so the monitor session is what holds the final status table open for the user to read. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
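A minimal sketch of how a detached monitor session could be launched is given below. It only builds the tmux argument list and does not call tmux. The helper name build_spawn_argv, the session name, the manifest path, and the `-m cmd_queue` invocation are assumptions; the real spawn_monitor_session helper may be structured differently. The tmux flags used (`new-session -d -s`) are standard.

```python
import shlex
import sys


def build_spawn_argv(session_name: str, manifest_path: str) -> list[str]:
    """Build a tmux command that starts the monitor in a detached session."""
    # Command executed inside the new session. Running the CLI through
    # `-m cmd_queue` is an assumption made for this sketch.
    monitor_cmd = shlex.join([
        sys.executable, '-m', 'cmd_queue', 'monitor',
        f'--manifest={manifest_path}',
    ])
    # -d: create the session without attaching to it; -s: session name.
    return ['tmux', 'new-session', '-d', '-s', session_name, monitor_cmd]


argv = build_spawn_argv('cmdq_monitor_demo', '/tmp/demo/monitor_manifest.json')
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would start the session on a machine where tmux is installed; the parent process can then keep polling state on its own, which is what lets block=True keep its meaning.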

Single-file example covering monitor='inline', 'tmux', and 'none'. Useful as both a hands-on demo for users and a smoke test that the new monitor backend works against a small real DAG. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The parent shell no longer pulls the user's tty into the spawned monitor session. Instead, after spawning the monitor it prints a prompt explaining how to attach (or switch-client when already inside tmux) and a manual reattach hint, then enters a cbreak keypress loop:

* [a] attach (or switch-client) to the monitor session. The user can detach with the usual binding, and we re-enter the loop.
* [q]/[d] stop watching from this shell (queue keeps running).

Non-TTY stdin falls back to a silent polling loop, so the path remains usable in scripts and CI.

Also drops the synthetic "press enter to close" prompt at the end of the monitor pane in favour of `exec bash`, so the pane stays open without needing user input but doesn't trap the user behind a read. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
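The keypress handling described in this commit can be sketched with the standard tty, termios, and select modules. This is an illustrative sketch only: the function names read_key and handle_key are hypothetical, and the actual block_with_attach_prompt implementation is not shown in the PR text.

```python
import select
import sys
import termios
import tty
from typing import Optional


def handle_key(key: str) -> str:
    """Map a keypress to an action, following the bindings in the commit."""
    if key == 'a':
        return 'attach'
    if key in ('q', 'd'):
        return 'stop'
    return 'ignore'


def read_key(timeout: float = 0.5) -> Optional[str]:
    """Read one keypress in cbreak mode; None on timeout or non-TTY stdin."""
    if not sys.stdin.isatty():
        # Scripts and CI: no prompt, the caller just keeps polling.
        return None
    fd = sys.stdin.fileno()
    old_settings = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # deliver keys immediately, without waiting for Enter
        ready, _, _ = select.select([sys.stdin], [], [], timeout)
        return sys.stdin.read(1) if ready else None
    finally:
        # Always restore the terminal, even if interrupted.
        termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
```

A caller would loop on read_key, act on handle_key's result, and check between reads whether the queue has finished. Restoring the terminal settings in a finally block matters because a terminal left in cbreak mode stays that way after the process exits.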

…ple DAG After block_with_attach_prompt and _headless_block_until_done exit, print a Rich-formatted summary line showing pass/fail/skip/total so the user gets a clear completion signal in their original shell. The tmux_example DAG is expanded to 11 jobs across 4 dependency levels (prep → proc → merge → final) with 4 workers and 2-8s sleeps, making parallel execution and dependency fan-in clearly visible in the monitor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…bled When the queue finishes with failures, list each failing job by name along with its log file path (if log=True was passed). For any failed job that doesn't have a log on disk, emit a single hint that logs were not enabled — so the user knows where the gap is rather than seeing the same hint repeated per job. Hooked into the inline monitor path as well so all three monitor modes (inline, tmux, none) produce the same summary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
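The "one hint, not one per job" behavior can be illustrated with a small pure function. This is a sketch under assumptions: the function name, the input shape (name and log path pairs), and the exact hint wording are hypothetical and not taken from the PR.

```python
import os
import tempfile


def summarize_failures(failed):
    """Build summary lines from (job_name, log_fpath_or_None) pairs."""
    lines = []
    any_missing = False
    for name, log_fpath in failed:
        if log_fpath is not None and os.path.exists(log_fpath):
            lines.append(f'{name}: {log_fpath}')
        else:
            lines.append(f'{name}: (no log)')
            any_missing = True
    if any_missing:
        # Emitted once, no matter how many failed jobs lack a log.
        lines.append('hint: pass log=True to capture logs for failed jobs')
    return lines


with tempfile.NamedTemporaryFile(suffix='.log') as f:
    summary = summarize_failures([
        ('proc-0', f.name),   # log exists on disk
        ('proc-1', None),     # logging was not enabled
        ('proc-2', None),
    ])
```

Tracking a single boolean while iterating, and emitting the hint after the loop, is what keeps the output short when many jobs fail without logs.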

Add --failures (default 1) to the tmux example so the failure summary and dependency-skip cascade are visible by default. The first N proc-* jobs exit non-zero, which causes their downstream merge/final jobs to be skipped. Pass --failures=0 for a clean run. Also enable log capture by default (--no-logs to disable) so the failed-job log paths printed by the new done-summary actually exist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Render a 'Failed jobs' table directly below the per-worker status table while the queue is still running, so failures are visible the moment they happen rather than only in the post-run summary. Each row shows the job name and its log path (or '(no log)' when log capture wasn't enabled); a one-line note reminds the user to pass log=True if any failed jobs lack a log on disk. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…es failures When the monitor is rehydrated via cmd_queue monitor --manifest=..., the reconstructed workers had empty .jobs lists, so the failed-jobs panel and post-run summary couldn't surface any failing job names — even when fail markers existed on disk. Serialize each job's name, log flag, and fail/log paths into the manifest, and rebuild lightweight SimpleNamespace stubs on each reconstructed worker. Enough surface for the failure renderer; we don't need the full BashJob since the monitor never re-runs anything. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
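The lightweight-stub idea can be sketched as follows. The attribute names (name, log, fail_fpath, log_fpath) and the paths mirror the fields listed in the commit message but are assumptions about the exact spelling used in the manifest.

```python
from types import SimpleNamespace


def rebuild_job_stubs(job_records):
    """Turn serialized job records into attribute-style stubs."""
    return [SimpleNamespace(**record) for record in job_records]


# Hypothetical records as they might appear in a reloaded manifest.
records = [
    {'name': 'proc-0', 'log': True,
     'fail_fpath': '/tmp/demo/proc-0.fail', 'log_fpath': '/tmp/demo/proc-0.log'},
    {'name': 'proc-1', 'log': False,
     'fail_fpath': '/tmp/demo/proc-1.fail', 'log_fpath': None},
]
stubs = rebuild_job_stubs(records)
print([job.name for job in stubs])
```

Because the renderer only reads attributes, a SimpleNamespace is enough; nothing in the stub can re-run a command, which matches the intent that the monitor never resubmits work.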

… monitor The bash boilerplate generated by BashJob.finalize_text ran an unconditional if-RC-0-on_pass-else-on_fail after the deps-check, so a skipped job (RC=126, on_skip already ran) ALSO had fail_fpath written and NUM_FAILED incremented. The status agg therefore showed a skipped+failed double-count.

Fix:
* Add a skip_fpath marker (printed by the on_skip block).
* Make the post-RC dispatch 3-way: on_pass for RC=0, no-op for RC=126, on_fail otherwise.

Monitor:
* Carry skip_fpath and dependency names in the rehydration manifest.
* Replace single failed panel with Failed + Skipped tables; skipped rows show a reason like dep X failed.
* Same split applied to the post-run summary.

Update tests/test_bash_variants.py: the prior test asserted the bug. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
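The difference between the old two-way dispatch and the new three-way dispatch can be modeled in Python. The real logic lives in generated bash; the function names below are hypothetical and exist only to show why a skipped job used to be counted as failed.

```python
SKIP_RC = 126  # return code the commit associates with a skipped job


def dispatch_two_way(rc: int) -> str:
    """Old behavior: anything non-zero is treated as a failure."""
    return 'on_pass' if rc == 0 else 'on_fail'


def dispatch_three_way(rc: int) -> str:
    """New behavior: RC=126 is a no-op because on_skip already ran."""
    if rc == 0:
        return 'on_pass'
    if rc == SKIP_RC:
        return 'noop'
    return 'on_fail'


for rc in (0, 1, SKIP_RC):
    print(rc, dispatch_two_way(rc), dispatch_three_way(rc))
```

With the two-way version, a job skipped because a dependency failed falls into the on_fail branch after on_skip has already run, which is the double-count the commit describes.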

Asserts that Queue.submit(..., log=True) lands on BashJob.log and that the rendered command section gets the expected ``(<cmd>) 2>&1 | tee <log_fpath>`` wrapper. Also covers log=False and the current default (False). Catches a regression class where ``submit`` drops or shadows the ``log`` kwarg without other tests noticing — log files would just silently stop being written. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
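A sketch of the wrapper this test checks for is shown below. The function name wrap_with_log is hypothetical, and quoting the log path with shlex.quote is an addition made for this example rather than something stated in the PR.

```python
import shlex
from typing import Optional


def wrap_with_log(command: str, log_fpath: Optional[str] = None,
                  log: bool = False) -> str:
    """Return the command, optionally wrapped to tee its output to a log."""
    if log and log_fpath is not None:
        # Merge stderr into stdout, then copy the stream to the log file.
        return f'({command}) 2>&1 | tee {shlex.quote(log_fpath)}'
    return command


print(wrap_with_log('echo hi', '/tmp/job.log', log=True))
print(wrap_with_log('echo hi', '/tmp/job.log', log=False))
```

A regression test in this style compares the rendered string directly, so a dropped or shadowed `log` kwarg shows up as a missing `tee` wrapper instead of as a silently absent log file.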

Press [a] from the live status table to attach (or switch-client) to a detached cmd_queue monitor tmux session running alongside; [q] stops watching while the queue keeps running. The side session is killed when the inline monitor exits. Reorganizes the monitor mode taxonomy so each value names a single intent: 'hybrid' (default) for inline+tmux, 'inline' for current-shell only, 'tmux' for detached-only, 'none' for headless block. 'hybrid' warns and falls back to inline when tmux is unavailable. Wires the [a] keybind into both the simple-rich (rich.Live + cbreak) and textual monitor paths, mirrors the new mode through the slurm backend, and adds plumbing-layer tests that mock the tmux helpers so the suite runs without a tmux server. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
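The mode taxonomy and the hybrid fallback can be sketched as a small resolver. The function name, the tmux_available parameter, and the warning text are assumptions for illustration; only the four mode names and the "warn and fall back to inline" rule come from the commit message.

```python
import shutil
import warnings
from typing import Optional

VALID_MODES = ('hybrid', 'inline', 'tmux', 'none')


def resolve_monitor_mode(mode: str = 'hybrid',
                         tmux_available: Optional[bool] = None) -> str:
    """Pick the effective monitor mode, falling back when tmux is missing."""
    if mode not in VALID_MODES:
        raise ValueError(f'unknown monitor mode: {mode!r}')
    if tmux_available is None:
        # Detect tmux on PATH unless the caller (or a test) overrides it.
        tmux_available = shutil.which('tmux') is not None
    if mode == 'hybrid' and not tmux_available:
        warnings.warn('tmux not found; falling back to the inline monitor')
        return 'inline'
    return mode
```

Accepting tmux_available as a parameter is one way to exercise the fallback in tests without a tmux server, in the same spirit as the plumbing-layer tests mentioned in the commit.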

Erotemic and others added 27 commits April 30, 2026 17:10

manual example updates

435ceab

Add type and ruff configs

60837a0

Update tests. Drop 3.9 support

45fa370

Update xcookie

7bdf8ac

Add monitor to boilerplate cli

85df7c6

Use pytest 8+ everywhere

96bb14c

Fix type errors

2736398

Fix type errors

b1c3869

Fix types

18b2b03

Ruff format

96565f0

Ruff check fix

5355ca0

Tweaks

725f3f6


Erotemic wants to merge 27 commits into main from dev/better_tmux_running
