You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Found by the first headless-Haiku benchmark run (#1101): a small-model agent hammering the CLI in tight act/observe loops (38 turns in ~6.7 min, including malformed and rapid-fire commands) wedged the iOS runner hard enough that snapshot requests kept timing out even across a subsequent open --relaunch. The daemon correctly survived (preserve-on-snapshot-timeout policy from #1075) but the runner never self-recovered — it needed a process kill.
This is a new workload class: adversarial-by-incompetence. Strong models pace themselves; small models retry fast, interleave commands mid-flight, and abandon sessions mid-request. Best-in-class agent reliability means the runner recovers from this without human intervention.
The --relaunch path should arguably detect a runner whose last N commands all timed out and recycle it proactively (kill + fresh start) instead of reusing the session — the keep-hot machinery has the health signals (readiness preflight, invalidate/restart-and-replay) but apparently a state exists where none of them trigger recycling.
Consider a daemon-side circuit breaker: M consecutive runner command timeouts → forced runner restart on next command, with a diagnostic explaining why.
Found by the first headless-Haiku benchmark run (#1101): a small-model agent hammering the CLI in tight act/observe loops (38 turns in ~6.7 min, including malformed and rapid-fire commands) wedged the iOS runner hard enough that snapshot requests kept timing out even across a subsequent
open --relaunch. The daemon correctly survived (preserve-on-snapshot-timeout policy from #1075) but the runner never self-recovered — it needed a process kill.This is a new workload class: adversarial-by-incompetence. Strong models pace themselves; small models retry fast, interleave commands mid-flight, and abandon sessions mid-request. Best-in-class agent reliability means the runner recovers from this without human intervention.
Investigation starters:
--relaunchpath should arguably detect a runner whose last N commands all timed out and recycle it proactively (kill + fresh start) instead of reusing the session — the keep-hot machinery has the health signals (readiness preflight, invalidate/restart-and-replay) but apparently a state exists where none of them trigger recycling.