Skip to content

Runner resilience under agent-abuse command patterns #1105

Description

@thymikee

Found by the first headless-Haiku benchmark run (#1101): a small-model agent hammering the CLI in tight act/observe loops (38 turns in ~6.7 min, including malformed and rapid-fire commands) wedged the iOS runner hard enough that snapshot requests kept timing out even across a subsequent open --relaunch. The daemon correctly survived (preserve-on-snapshot-timeout policy from #1075) but the runner never self-recovered — it needed a process kill.

This is a new workload class: adversarial-by-incompetence. Strong models pace themselves; small models retry fast, interleave commands mid-flight, and abandon sessions mid-request. Best-in-class agent reliability means the runner recovers from this without human intervention.

Investigation starters:

  • Reproduce with a scripted rapid-fire pattern (the benchmark harness in press/click/fill --settle: settled observation in the interaction response #1101 can be adapted); capture the runner state when wedged (last diag phases, whether the AX-unavailable invalidation fired and got stuck re-acquiring).
  • The --relaunch path should arguably detect a runner whose last N commands all timed out and recycle it proactively (kill + fresh start) instead of reusing the session — the keep-hot machinery has the health signals (readiness preflight, invalidate/restart-and-replay) but apparently a state exists where none of them trigger recycling.
  • Consider a daemon-side circuit breaker: M consecutive runner command timeouts → forced runner restart on next command, with a diagnostic explaining why.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions