feat(web,hub,cli): show machine health in session sidebar #962

Open
heavygee wants to merge 14 commits into tiann:main from heavygee:feat/machine-health-display

Conversation

@heavygee heavygee commented Jun 21, 2026

Collaborator

Summary

  • CLI: collect OS metrics (cpuPercent, memoryPercent, load1m on Unix, cpuCount) and attach optional health to existing machine-alive keepalives (~20s).
  • Hub: persist health on machine cache entries and rebroadcast when values change.
  • Web: session sidebar machine row becomes a compact bordered tile (name, OS, session count) with inline CPU/RAM meters; tooltip shows capacity status, per-metric bars, load on Linux, and "CPU across all N cores" when cpuCount is known.
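
The hub behaviour in the summary above (cache the health, rebroadcast only when values change) can be sketched as follows. The field names come from the summary; the `MachineHealth` type, `healthChanged`, `applyHealth`, and the `Map`-based cache are hypothetical stand-ins for illustration, not the PR's actual code.

```typescript
// Field names are taken from the PR summary. Every field is optional here
// on the assumption that some runners send keepalives without health.
interface MachineHealth {
    cpuPercent?: number
    memoryPercent?: number
    load1m?: number // reported on Unix only, per the summary
    cpuCount?: number
}

const HEALTH_KEYS: (keyof MachineHealth)[] = ['cpuPercent', 'memoryPercent', 'load1m', 'cpuCount']

// Hypothetical helper: true when any reported metric differs between snapshots.
function healthChanged(prev: MachineHealth | undefined, next: MachineHealth | undefined): boolean {
    if (prev === undefined || next === undefined) return prev !== next
    return HEALTH_KEYS.some((key) => prev[key] !== next[key])
}

// Hypothetical cache update: stores the snapshot and returns whether
// the caller should rebroadcast the machine entry.
function applyHealth(cache: Map<string, MachineHealth>, machineId: string, next: MachineHealth): boolean {
    const changed = healthChanged(cache.get(machineId), next)
    if (changed) cache.set(machineId, next)
    return changed
}
```

With keepalives arriving roughly every 20 s, comparing before broadcasting keeps idle machines from generating a steady stream of identical updates.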

Test plan

  • bun typecheck
  • bun run test: 1121 of 1122 pass; the one failure is a pre-existing, flaky CLI runner integration test (version mismatch) unrelated to this diff
  • Hub alive handler tests for health payload
  • Web presentation + component tests
  • Dogfood on operator driver soup (:3006) with Linux + Windows runners reporting health

Fixes #961

heavygee and others added 9 commits June 20, 2026 16:08
Runners attach OS health snapshots to machine-alive heartbeats; the hub
caches them and the web session list renders load or CPU between the
machine label and session count.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sidebar label now combines CPU and RAM percentages for overload
signaling; load stays in the tooltip on Unix. Prime CPU sampling so
the first heartbeat includes usage, not just memory.

Co-authored-by: Cursor <cursoragent@cursor.com>
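
The priming mentioned in the commit above follows from how CPU usage is usually measured: as the difference between two cumulative samples, so a single reading says nothing about current load. A minimal sketch under that assumption; `CpuTimes` mirrors the `times` object from Node's `os.cpus()`, while `sumTimes` and `cpuPercentBetween` are hypothetical names rather than the PR's code.

```typescript
// Same fields as the `times` object on each entry of Node's os.cpus().
interface CpuTimes {
    user: number
    nice: number
    sys: number
    idle: number
    irq: number
}

// Sum per-core counters into one aggregate sample across all cores.
function sumTimes(cores: CpuTimes[]): CpuTimes {
    const total: CpuTimes = { user: 0, nice: 0, sys: 0, idle: 0, irq: 0 }
    for (const core of cores) {
        total.user += core.user
        total.nice += core.nice
        total.sys += core.sys
        total.idle += core.idle
        total.irq += core.irq
    }
    return total
}

// Busy share of the time elapsed between two samples, as a 0..100 integer.
// Returns null when no time has elapsed, which is the case a priming
// sample is meant to avoid on the first heartbeat.
function cpuPercentBetween(prev: CpuTimes, next: CpuTimes): number | null {
    const busy =
        (next.user - prev.user) +
        (next.nice - prev.nice) +
        (next.sys - prev.sys) +
        (next.irq - prev.irq)
    const idle = next.idle - prev.idle
    const total = busy + idle
    if (total <= 0) return null
    return Math.round((busy / total) * 100)
}
```

Taking one sample up front and a second shortly afterwards gives the first heartbeat a real delta to report instead of memory alone.
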
Replace bare CPU/RAM text with labeled mini bar gauges, chip
border tint by severity, and a HoverTooltip explaining capacity
and overload guidance.

Co-authored-by: Cursor <cursoragent@cursor.com>
Allow a generous popover width and lay CPU/RAM/load out side by side
so the capacity tooltip reads wider and less tall than the chip.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wide tooltip was align=end on the chip, so it grew left off-screen.
Use row-span positioning on the machine tile button instead.

Co-authored-by: Cursor <cursoragent@cursor.com>
Turn the session sidebar machine row into a bordered host panel with OS
metadata and side-by-side CPU/RAM meters embedded in the tile instead
of a flat label line matching project rows.

Co-authored-by: Cursor <cursoragent@cursor.com>
Collapse the machine header back to one py-1.5 row with OS and compact
inline health beside the name, and restore the original project indent
without the extra nested rail or second header line.

Co-authored-by: Cursor <cursoragent@cursor.com>
When the runner reports cpuCount, the tooltip reads "CPU across all 6
cores" instead of the generic all-cores label.

Co-authored-by: Cursor <cursoragent@cursor.com>
Dogfood captures for the session sidebar machine tile and capacity
tooltip, for upstream PR review.

Co-authored-by: Cursor <cursoragent@cursor.com>

@github-actions github-actions Bot left a comment

Findings

  • [Major] Delayed heartbeat timer survives stop/shutdown — see inline comment at cli/src/api/apiMachine.ts:502.

Summary

  • Review mode: initial
  • One lifecycle issue found in the new machine health heartbeat path. It can leave a heartbeat interval running after disconnect/shutdown if the socket stops during the new 50 ms priming delay.

Testing

  • Not run (automation). Suggested coverage: fake-timer test that calls startKeepAlive(), stops/shuts down before 50 ms, advances timers, and asserts no machine-alive emit or interval remains.

HAPI Bot

Comment thread cli/src/api/apiMachine.ts Outdated
Track the 50ms CPU priming setTimeout and clear it in stopKeepAlive so
disconnect/shutdown during the delay cannot leave a stray interval alive.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee

Collaborator Author

Addressed the [Major] delayed heartbeat timer finding in 4a0e4157..HEAD:

  • keepAliveStartTimeout is now tracked and cleared in stopKeepAlive() alongside the interval.
  • Added fake-timer coverage: shutdown before 50ms priming delay emits no machine-alive; shutdown after first emit stops further ticks.
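
The lifecycle fix described above can be sketched roughly as follows: both the delayed start and the repeating interval are tracked, and both are cleared on stop. The `KeepAlive` class, its field names, and the `idle` getter are hypothetical; only the 50 ms priming delay and the roughly 20 s cadence come from this thread.

```typescript
class KeepAlive {
    private startTimeout: ReturnType<typeof setTimeout> | null = null
    private interval: ReturnType<typeof setInterval> | null = null
    private readonly emit: () => void
    private readonly primeDelayMs: number
    private readonly periodMs: number

    constructor(emit: () => void, primeDelayMs: number = 50, periodMs: number = 20000) {
        this.emit = emit
        this.primeDelayMs = primeDelayMs
        this.periodMs = periodMs
    }

    start(): void {
        this.stop() // never stack timers if start is called twice
        this.startTimeout = setTimeout(() => {
            this.startTimeout = null
            this.emit()
            this.interval = setInterval(() => this.emit(), this.periodMs)
        }, this.primeDelayMs)
    }

    // Clears the pending priming timeout as well as the interval, so a stop
    // during the delay cannot leave a heartbeat running afterwards.
    stop(): void {
        if (this.startTimeout !== null) {
            clearTimeout(this.startTimeout)
            this.startTimeout = null
        }
        if (this.interval !== null) {
            clearInterval(this.interval)
            this.interval = null
        }
    }

    // True when no timer of either kind is pending.
    get idle(): boolean {
        return this.startTimeout === null && this.interval === null
    }
}
```

Without the `clearTimeout` branch, a stop that lands inside the priming window would still let the callback fire and create an interval that nothing later clears, which is the leak the review flagged.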

Waiting on CI re-run.

@github-actions github-actions Bot left a comment

Findings

  • No findings.

Summary

  • Review mode: follow-up after new commits
  • No blocking issues found in the latest diff. The prior heartbeat timer leak appears addressed by tracking and clearing keepAliveStartTimeout.
  • Residual risk: validation here was static because bun is unavailable in this runner.

Testing

  • Not run (automation; bun not found in this environment).

HAPI Bot

Review evidence lives in the PR discussion only; no need to ship PNGs in
the repo long-term.

Co-authored-by: Cursor <cursoragent@cursor.com>

@github-actions github-actions Bot left a comment

Findings

  • [Minor] Bound machine metadata width in the sidebar — the new OS/host metadata is rendered as a shrink-0 flex item, so a long hostname or raw platform string can consume more than the sidebar width and push the health chip/session count off-screen. This is introduced in the new machine header row at web/src/components/MachineGroupHeader.tsx:100.
    Suggested fix:
    <span className="min-w-0 flex-1 truncate text-sm font-semibold text-[var(--app-fg)]">
        {props.label}
    </span>
    <span className="min-w-0 max-w-[8rem] shrink truncate text-[11px] text-[var(--app-hint)]">
        {machineMeta}
    </span>

Summary

  • Review mode: follow-up after new commits
  • One minor UI regression risk found in the latest diff. Residual risk: validation here was static because bun is unavailable in this runner.

Testing

  • Not run (automation; bun not found in this environment).

HAPI Bot

Comment thread web/src/components/MachineGroupHeader.tsx Outdated
Bound the metadata span so a long hostname cannot push the health chip
or session count off-screen in narrow sidebars.

Co-authored-by: Cursor <cursoragent@cursor.com>

@github-actions github-actions Bot left a comment

Findings

  • [Minor] Make the health tooltip reachable on keyboard focus — the health chip is rendered inside the machine header <button>, but MachineHealthIndicator only relies on group-hover via HoverTooltip; unlike the existing session-row tooltip usage, it does not pass a parent-focus reveal class. Keyboard users can focus the row but cannot reveal the capacity/status/load tooltip introduced here. Evidence: web/src/components/MachineHealthIndicator.tsx:158.
    Suggested fix:
    // MachineHealthIndicator.tsx
    export function MachineHealthIndicator(props: {
        presentation: MachineHealthPresentation
        className?: string
        layout?: 'stack' | 'inline'
        compact?: boolean
        revealOnParentFocusClass?: string
    }) {
        // ...
        return (
            <HoverTooltip
                id={tooltipId}
                target={chip}
                side="bottom"
                align="row"
                className="shrink-0"
                tooltipClassName="px-3 py-2"
                revealOnParentFocusClass={props.revealOnParentFocusClass}
            >
                <MachineHealthTooltipBody presentation={presentation} />
            </HoverTooltip>
        )
    }
    
    // MachineGroupHeader.tsx
    <button className={cn('group/machine-row relative flex w-full min-w-0 ...')}>
        <MachineHealthIndicator
            presentation={props.healthPresentation!}
            layout="inline"
            compact
            className="shrink-0"
            revealOnParentFocusClass="group-focus-visible/machine-row:opacity-100 group-focus-visible/machine-row:visible"
        />
    </button>

Questions

  • None.

Summary

  • Review mode: follow-up after new commits
  • One minor accessibility regression found in the latest diff. The previous metadata-width finding appears addressed by the follow-up commit. Residual risk: static review only; local automation could not run because bun is not installed in this runner.

Testing

  • Not run (automation; bun not found in this environment).

HAPI Bot

Comment thread web/src/components/MachineHealthIndicator.tsx
Wire MACHINE_ROW_TOOLTIP_FOCUS_CLASS and aria-describedby on the machine
header button so keyboard users can read the health tooltip like session rows.

Co-authored-by: Cursor <cursoragent@cursor.com>

@github-actions github-actions Bot left a comment

Findings

  • No issues found.

Questions

  • None.

Summary

  • Review mode: follow-up after new commits
  • No blocker/major/minor findings in the full current diff. The previous keyboard-focus tooltip finding appears addressed in the follow-up commit. Residual risk: static review only; I did not rerun the full Bun test suite in this review pass.

Testing

  • Not run (automation)

HAPI Bot

Bun's os.freemem() reflects MemFree (~1% on cache-heavy hosts), which
made sidebar RAM read ~99% while btop showed ~40% used. Parse
/proc/meminfo MemAvailable instead so used percent matches operator tools.

Co-authored-by: Cursor <cursoragent@cursor.com>
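
The MemAvailable approach from the commit above can be sketched as a pure parser over the text of /proc/meminfo. This is an illustration under assumptions, not the PR's implementation; `memoryPercentFromMeminfo` is a hypothetical name, and reading the file itself is left to the caller.

```typescript
// Returns used memory as a 0..100 integer, or null when the expected
// fields are missing (for example on non-Linux hosts).
function memoryPercentFromMeminfo(meminfo: string): number | null {
    // Read one "Key:   12345 kB" line; values in /proc/meminfo are in kB.
    const readKb = (key: string): number | null => {
        const match = meminfo.match(new RegExp(`^${key}:\\s+(\\d+)\\s*kB`, 'm'))
        return match ? Number(match[1]) : null
    }
    const total = readKb('MemTotal')
    const available = readKb('MemAvailable')
    if (total === null || available === null || total <= 0) return null
    // MemAvailable counts reclaimable page cache as usable, so "used" here
    // is closer to what tools like btop report than total minus MemFree.
    return Math.round(((total - available) / total) * 100)
}
```

On a cache-heavy host with 16,000,000 kB total, 160,000 kB MemFree, and 9,600,000 kB MemAvailable, a MemFree-based figure would read 99% used while this function returns 40.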

@github-actions github-actions Bot left a comment

Findings

  • No issues found.

Questions

  • None.

Summary

  • Review mode: follow-up after new commits
  • No blocker/major/minor findings in the full current diff. I rechecked the CLI heartbeat health sampling, hub validation/broadcast path, SSE machine cache updates, and sidebar presentation wiring. Residual risk: static review only; I did not rerun the full Bun test suite in this review pass.

Testing

  • Not run (automation)

HAPI Bot

Development

Successfully merging this pull request may close these issues.

feat(web): show machine CPU/RAM health in session sidebar

1 participant