You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Collapse the dominant agent-loop pair (interact → observe) into one round trip: press @e2 --settle executes the press, waits for the UI to go quiet, and returns the settled observation in the same response — saving one full model inference per step, which dwarfs every daemon-side optimization available.
Design (agreed)
Opt-in flag on press/click/fill/longpress (scroll --settle deferred: infinite feeds settle-then-load, a semantic trap needing its own design).
Payload = settled diff vs the pre-action tree by default (the baseline already exists — --verify captures it). Composes with response levels: digest = changed-count summary, full = whole interactive tree. A full tree per interaction would invert the snapshot token-budget principle; the diff is what agents actually reason about.
Best-effort settle, never an action failure: quiet-window semantics from wait stable absorb finite animations; never-settling content (carousels, tickers) returns the latest capture with settled: false + a hint (generalize the tiny-tree hint machinery). The press succeeded; observation quality is advisory — same principle as --verify evidence.
--verify composition: settle's final capture doubles as the verify evidence source — --settle --verify costs no extra captures.
Per-call latency: quiet window (default ~500ms) + captures (~300ms each on iOS; Android comparable with the snapshot helper, which is the only supported production configuration — helper-less Android is an internal-testing artifact, typically agents forgetting build:all).
Side fix bundled here or separately: when the Android helper is missing on a source checkout, the degradation hint should say run pnpm build:all explicitly.
Acceptance evidence
Bluesky dogfood: a fixed multi-step flow measured as tokens + wall-clock per step, --settle vs the two-call baseline. The claim is one fewer inference per step; the measurement should show it.
Collapse the dominant agent-loop pair (interact → observe) into one round trip:
press @e2 --settleexecutes the press, waits for the UI to go quiet, and returns the settled observation in the same response — saving one full model inference per step, which dwarfs every daemon-side optimization available.Design (agreed)
scroll --settledeferred: infinite feeds settle-then-load, a semantic trap needing its own design).--verifycaptures it). Composes with response levels:digest= changed-count summary,full= whole interactive tree. A full tree per interaction would invert the snapshot token-budget principle; the diff is what agents actually reason about.wait stableabsorb finite animations; never-settling content (carousels, tickers) returns the latest capture withsettled: false+ a hint (generalize the tiny-tree hint machinery). The press succeeded; observation quality is advisory — same principle as--verifyevidence.--verifycomposition: settle's final capture doubles as the verify evidence source —--settle --verifycosts no extra captures.refsGenerationincluded; MCP re-pins): settle flows structurally dissolve the stale-ref hazard — the feat: warn when @refs outlive the session snapshot they came from #1093/feat: versioned snapshot refs with MCP auto-pinning #1096 warnings become the fallback for non-settle paths, not the main defense.appliesToscoping on the guarantee matrix, settle budget via refactor: declare command timeout policy on descriptors (ADR 0011) #1084's flag-sourced timeoutPolicy, one contract scenario per dispatch path.Costs & platform notes
build:all).run pnpm build:allexplicitly.Acceptance evidence
Bluesky dogfood: a fixed multi-step flow measured as tokens + wall-clock per step,
--settlevs the two-call baseline. The claim is one fewer inference per step; the measurement should show it.