
Dev/better tmux running#6

Open
Erotemic wants to merge 27 commits into main from dev/better_tmux_running


@Erotemic Erotemic commented May 8, 2026

No description provided.

Erotemic and others added 27 commits April 30, 2026 17:10
Adds cmd_queue.monitor_manifest, which lets a queue's run state be
serialized to disk and reloaded by an out-of-process monitor. Each
queue subclass now has _build_monitor_manifest, _write_monitor_manifest,
and _from_manifest hooks so monitor() and kill() can be invoked on a
queue rebuilt from the manifest alone (no jobs resubmitted).

This is groundwork for letting the monitor live in its own tmux session
that survives the parent shell.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
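
As a rough illustration of the serialize-and-reload idea described above, the sketch below writes a run-state dictionary to disk and reads it back. It is not the schema used by `cmd_queue.monitor_manifest`: the file name, the dictionary keys, and both function names are hypothetical stand-ins chosen for the example.

```python
import json
from pathlib import Path

MANIFEST_NAME = "monitor_manifest.json"  # assumed file name, not from the PR


def write_manifest(dpath, manifest):
    """Write ``manifest`` (a JSON-serializable dict) under ``dpath``."""
    dpath = Path(dpath)
    dpath.mkdir(parents=True, exist_ok=True)
    fpath = dpath / MANIFEST_NAME
    # Write to a sibling temp file, then rename, so an out-of-process
    # reader never observes a half-written manifest.
    tmp_fpath = dpath / (MANIFEST_NAME + ".tmp")
    tmp_fpath.write_text(json.dumps(manifest, indent=2))
    tmp_fpath.replace(fpath)
    return fpath


def load_manifest(fpath):
    """Reload the dict written by ``write_manifest``."""
    return json.loads(Path(fpath).read_text())
```

A monitor started in another process would only need the path returned by `write_manifest` to rebuild its view of the queue; no jobs are resubmitted.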
Each backend now persists its monitor manifest at the start of run(),
which makes the queue reattachable from a separate process. Also
preserves the user-supplied SlurmQueue name on self.name (previously
dropped after queue_id was constructed) so that name-based monitor
lookup works for both queue_id and the friendly name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reattaches to a running queue by name (via the active-index that run()
populates), by manifest path (--manifest), or by dpath. This is the
entry point that step 3's tmux monitor backend will execute inside its
own tmux session, and is also useful on its own when the original
run() shell has been closed but workers are still active.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
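
The three lookup routes above (name, `--manifest`, dpath) could be resolved to a single manifest path roughly as follows. This is a sketch under stated assumptions: the precedence order, the manifest file name, and the shape of the active index (a plain name-to-path mapping) are all invented for illustration and are not confirmed by the commit.

```python
from pathlib import Path

MANIFEST_NAME = "monitor_manifest.json"  # assumed file name


def resolve_manifest(name=None, manifest=None, dpath=None, active_index=None):
    """Return the manifest path for whichever locator was given.

    ``active_index`` is a hypothetical mapping of queue name -> manifest
    path, standing in for the index that run() populates.
    """
    if manifest is not None:
        # An explicit manifest path wins (assumed precedence).
        return Path(manifest)
    if dpath is not None:
        return Path(dpath) / MANIFEST_NAME
    if name is not None:
        index = active_index or {}
        if name not in index:
            raise KeyError(f"no active queue named {name!r}")
        return Path(index[name])
    raise ValueError("need one of: name, manifest, dpath")
```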
Both TMUXMultiQueue.monitor and SlurmQueue.monitor now accept
onfail/onexit kwargs and perform the corresponding kill()/capture()
themselves. run() simply forwards the args. This way the same
finalization happens whether the monitor runs inline, in a separate
tmux session (step 3), or via `cmd_queue monitor` from another shell.

The semantics are preserved (onfail='kill' tears down idle tmux
sessions only on a clean exit; on slurm it fires only on failure).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
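
The preserved semantics can be summarized in a small sketch. It assumes the aggregate state exposes an `errored` flag and that `onexit='capture'` is the value that triggers `capture()`; neither detail is spelled out in the commit message, and the function name is hypothetical.

```python
def finalize_after_monitor(queue, agg_state, backend, onfail=None, onexit=None):
    """Apply onexit/onfail handling once monitoring has finished.

    ``queue`` only needs ``kill()`` and ``capture()`` methods.
    ``backend`` is 'tmux' or 'slurm'. Returns the actions taken.
    """
    errored = bool(agg_state.get("errored"))  # assumed key name
    actions = []
    if onexit == "capture":  # assumed value
        queue.capture()
        actions.append("capture")
    if onfail == "kill":
        # Per the commit message: tmux tears down idle sessions only on
        # a clean exit, while slurm fires only on failure.
        should_kill = (not errored) if backend == "tmux" else errored
        if should_kill:
            queue.kill()
            actions.append("kill")
    return actions
```

Keeping this logic in one place is what lets the inline monitor, the tmux monitor session, and `cmd_queue monitor` finish a run the same way.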
The new ``monitor`` kwarg on TMUXMultiQueue.run and SlurmQueue.run
controls where the live status UI runs while the queue is executing.
Default is ``'inline'`` (current behavior). With ``'tmux'``, the
monitor is spawned in a detached tmux session via the new
``util_tmux.tmux.spawn_monitor_session`` helper, which invokes
``cmd_queue monitor --manifest=<path>`` under sys.executable. The
parent process still blocks on a headless state poll, so block=True
keeps its meaning even when the visible UI lives elsewhere — closing
or detaching the tmux UI does not return control early.

The tmux monitor session intentionally outlives the workers: workers
self-clean on success, so the monitor session is what holds the
final status table open for the user to read.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
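
For readers unfamiliar with detached tmux sessions, the sketch below builds the kind of command line such a helper might run. It is not the body of `spawn_monitor_session`: the function name is hypothetical, and invoking the CLI as `python -m cmd_queue` is an assumption (the commit only says the command runs under `sys.executable`). The `tmux new-session -d -s <name> <shell-command>` form itself is standard tmux.

```python
import shlex
import sys


def build_monitor_session_argv(session_name, manifest_fpath):
    """Return argv for a detached tmux session running the monitor."""
    # tmux hands the final argument to a shell, so quote it as one string.
    monitor_cmd = shlex.join([
        sys.executable, "-m", "cmd_queue",  # assumed invocation style
        "monitor", f"--manifest={manifest_fpath}",
    ])
    # -d: do not attach the current terminal; -s: name the session.
    return ["tmux", "new-session", "-d", "-s", session_name, monitor_cmd]
```

The resulting list could be passed to `subprocess.run(argv, check=True)`. Because the session is detached, the caller keeps its own terminal and still has to block separately, which matches the headless state poll described above.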
Single-file example covering monitor='inline', 'tmux', and 'none'.
Useful as both a hands-on demo for users and a smoke test that the
new monitor backend works against a small real DAG.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The parent shell no longer pulls the user's tty into the spawned
monitor session. Instead, after spawning the monitor it prints a
prompt explaining how to attach (or switch-client when already
inside tmux) and a manual reattach hint, then enters a cbreak
keypress loop:

    [a] attach (or switch-client) to the monitor session — user can
        detach with the usual binding and we re-enter the loop.
    [q]/[d] stop watching from this shell (queue keeps running).

Non-TTY stdin falls back to a silent polling loop, so the path
remains usable in scripts and CI.

Also drops the synthetic "press enter to close" prompt at the end of
the monitor pane in favour of `exec bash`, so the pane stays open
without needing user input but doesn't trap the user behind a read.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
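
A minimal sketch of a cbreak keypress loop with a non-TTY fallback is shown below. It is POSIX-only, and the names `action_for_key`, `watch`, `is_done`, and `on_attach` are hypothetical; the real `block_with_attach_prompt` may be structured differently.

```python
import os
import select
import sys
import time


def action_for_key(key):
    """Map a single keypress to an action name, or None if unbound."""
    key = key.lower()
    if key == "a":
        return "attach"
    if key in {"q", "d"}:
        return "stop"
    return None


def watch(is_done, on_attach, stdin=None, poll_s=0.2):
    """Block until ``is_done()`` is true or the user stops watching."""
    stdin = sys.stdin if stdin is None else stdin
    if not stdin.isatty():
        # Scripts and CI: poll silently, no prompt and no key handling.
        while not is_done():
            time.sleep(poll_s)
        return "done"
    import termios
    import tty
    fd = stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # deliver keys immediately, without Enter
        while not is_done():
            ready, _, _ = select.select([fd], [], [], poll_s)
            if not ready:
                continue
            data = os.read(fd, 1)
            if not data:
                return "stopped"  # EOF on stdin
            action = action_for_key(data.decode(errors="ignore"))
            if action == "attach":
                # Restore the terminal before handing it to tmux, then
                # re-enter cbreak mode once the user detaches.
                termios.tcsetattr(fd, termios.TCSADRAIN, saved)
                on_attach()
                tty.setcbreak(fd)
            elif action == "stop":
                return "stopped"
        return "done"
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Restoring the saved terminal attributes in `finally` matters here: leaving the shell in cbreak mode after an exception would break line editing for the user.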
…ple DAG

After block_with_attach_prompt and _headless_block_until_done exit, print
a Rich-formatted summary line showing pass/fail/skip/total so the user
gets a clear completion signal in their original shell.

The tmux_example DAG is expanded to 11 jobs across 4 dependency levels
(prep → proc → merge → final) with 4 workers and 2-8s sleeps, making
parallel execution and dependency fan-in clearly visible in the monitor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…bled

When the queue finishes with failures, list each failing job by name
along with its log file path (if log=True was passed). For any failed
job that doesn't have a log on disk, emit a single hint that logs were
not enabled — so the user knows where the gap is rather than seeing
the same hint repeated per job.

Hooked into the inline monitor path as well so all three monitor modes
(inline, tmux, none) produce the same summary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
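
The "one hint, not one per job" behavior can be sketched as below. The record layout (dicts with `name` and `log_fpath` keys), the function name, and the hint wording are assumptions made for the example; the `(no log)` placeholder is borrowed from a later commit in this PR.

```python
from pathlib import Path


def format_failure_summary(failed_jobs):
    """Return summary lines for failed jobs plus at most one hint."""
    lines = []
    any_missing_log = False
    for job in failed_jobs:
        log_fpath = job.get("log_fpath")
        if log_fpath and Path(log_fpath).exists():
            lines.append(f"  {job['name']}: {log_fpath}")
        else:
            lines.append(f"  {job['name']}: (no log)")
            any_missing_log = True
    if any_missing_log:
        # Emitted once, however many jobs lack a log on disk.
        lines.append("hint: pass log=True to capture per-job logs")
    return lines
```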
Add --failures (default 1) to the tmux example so the failure summary
and dependency-skip cascade are visible by default. The first N proc-*
jobs exit non-zero, which causes their downstream merge/final jobs to
be skipped. Pass --failures=0 for a clean run.

Also enable log capture by default (--no-logs to disable) so the
failed-job log paths printed by the new done-summary actually exist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Render a 'Failed jobs' table directly below the per-worker status
table while the queue is still running, so failures are visible the
moment they happen rather than only in the post-run summary.

Each row shows the job name and its log path (or '(no log)' when
log capture wasn't enabled); a one-line note reminds the user to
pass log=True if any failed jobs lack a log on disk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es failures

When the monitor is rehydrated via cmd_queue monitor --manifest=..., the
reconstructed workers had empty .jobs lists, so the failed-jobs panel
and post-run summary couldn't surface any failing job names — even
when fail markers existed on disk.

Serialize each job's name, log flag, and fail/log paths into the
manifest, and rebuild lightweight SimpleNamespace stubs on each
reconstructed worker. Enough surface for the failure renderer; we
don't need the full BashJob since the monitor never re-runs anything.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
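
The stub approach can be illustrated with a short round trip. The field names follow the commit message (name, log flag, fail and log paths), but the exact manifest keys and both function names are hypothetical.

```python
from types import SimpleNamespace

JOB_FIELDS = ("name", "log", "fail_fpath", "log_fpath")  # assumed keys


def jobs_to_records(jobs):
    """Reduce job objects to the plain dicts stored in the manifest."""
    return [{key: getattr(job, key, None) for key in JOB_FIELDS} for job in jobs]


def stubs_from_records(records):
    """Rebuild attribute-style stubs the failure renderer can read."""
    # No command text is carried along: the monitor never re-runs jobs.
    return [SimpleNamespace(**record) for record in records]
```

A stub exposes `stub.name` and `stub.fail_fpath` like a full job would, so a renderer written against the real job objects needs no special case for the rehydrated monitor.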
… monitor

The bash boilerplate generated by BashJob.finalize_text ran an
unconditional if-RC-0-on_pass-else-on_fail after the deps-check, so a
skipped job (RC=126, on_skip already ran) ALSO had fail_fpath written
and NUM_FAILED incremented. The status agg therefore showed a
skipped+failed double-count.

Fix:
* Add a skip_fpath marker (printed by the on_skip block).
* Make the post-RC dispatch 3-way: on_pass for RC=0, no-op for RC=126,
  on_fail otherwise.

Monitor:
* Carry skip_fpath and dependency names in the rehydration manifest.
* Replace single failed panel with Failed + Skipped tables; skipped
  rows show a reason like dep X failed.
* Same split applied to the post-run summary.

Update tests/test_bash_variants.py: the prior test asserted the bug.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
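
The double-count is easiest to see by comparing the two dispatch rules side by side. The real logic lives in generated bash; the Python below only models which handler runs for a given return code, and the function names are illustrative.

```python
SKIP_RC = 126  # return code of a skipped job, per the commit message


def old_dispatch(rc):
    """Two-way rule: anything non-zero is treated as a failure."""
    return "on_pass" if rc == 0 else "on_fail"


def new_dispatch(rc):
    """Three-way rule: a skipped job triggers neither handler here."""
    if rc == 0:
        return "on_pass"
    if rc == SKIP_RC:
        return None  # on_skip already ran during the deps check
    return "on_fail"
```

Under the old rule a skipped job (RC=126) also reached `on_fail`, which is why it was counted as both skipped and failed.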
Asserts that Queue.submit(..., log=True) lands on BashJob.log and
that the rendered command section gets the expected
``(<cmd>) 2>&1 | tee <log_fpath>`` wrapper. Also covers log=False
and the current default (False).

Catches a regression class where ``submit`` drops or shadows the
``log`` kwarg without other tests noticing — log files would just
silently stop being written.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
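
The wrapper shape being asserted can be reproduced with a small stand-in. `render_command` is a hypothetical helper, not the `BashJob` method; it only mirrors the ``(<cmd>) 2>&1 | tee <log_fpath>`` form quoted above and does no shell quoting.

```python
def render_command(command, log=False, log_fpath=None):
    """Return the command text, wrapped with tee when logging is on."""
    if log:
        if log_fpath is None:
            raise ValueError("log=True requires a log_fpath")
        # The subshell groups the command so that 2>&1 folds stderr into
        # the same stream that tee copies to the log file.
        return f"({command}) 2>&1 | tee {log_fpath}"
    return command
```

A regression test in this style checks both branches, so a `submit` that silently drops the `log` kwarg shows up as a missing wrapper rather than as missing log files much later.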
Press [a] from the live status table to attach (or switch-client) to a
detached cmd_queue monitor tmux session running alongside; [q] stops
watching while the queue keeps running. The side session is killed
when the inline monitor exits.

Reorganizes the monitor mode taxonomy so each value names a single
intent: 'hybrid' (default) for inline+tmux, 'inline' for current-shell
only, 'tmux' for detached-only, 'none' for headless block. 'hybrid'
warns and falls back to inline when tmux is unavailable.

Wires the [a] keybind into both the simple-rich (rich.Live + cbreak)
and textual monitor paths, mirrors the new mode through the slurm
backend, and adds plumbing-layer tests that mock the tmux helpers so
the suite runs without a tmux server.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
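
The mode taxonomy and its fallback could be resolved along these lines. The function name and warning text are hypothetical, and detecting tmux with `shutil.which` is an assumption. The commit does not say what `'tmux'` mode does when tmux is missing, so the sketch leaves that case untouched.

```python
import shutil
import warnings

VALID_MODES = ("hybrid", "inline", "tmux", "none")


def resolve_monitor_mode(mode="hybrid", tmux_available=None):
    """Return the effective monitor mode for this machine."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown monitor mode: {mode!r}")
    if tmux_available is None:
        tmux_available = shutil.which("tmux") is not None  # assumed check
    if mode == "hybrid" and not tmux_available:
        warnings.warn("tmux not found; falling back to monitor='inline'")
        return "inline"
    return mode
```

Accepting `tmux_available` as a parameter mirrors the testing approach mentioned above: the decision logic can be exercised without a tmux server.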