Makes stalled AI streams retry without throwing, and normalize fetch errors#9
Conversation
There was a problem hiding this comment.
CI Agent Review
The idle-retry strategy introduces a critical but unenforced cross-layer configuration invariant: idleTimeoutMilliseconds (default 300s) must remain below BUN_CONFIG_HTTP_IDLE_TIMEOUT (600s in the Dockerfile). If Bun's socket timer fires first, it throws a DOMException that the new retry logic does not handle as a retryable condition — it propagates as an unhandled exception. This invariant is only documented in comments across two files maintained by different audiences, with no runtime validation at startup. Changing either value independently (or running outside the Docker image) silently breaks the retry-on-stall behavior, with failures only manifesting under specific stall conditions.
| @@ -87,22 +91,25 @@ function toWireTools(tools: readonly Tool[]): CompletionsRequest['tools'] { | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
This comment documents a critical configuration invariant, but it is not enforced in code. If idleTimeoutMilliseconds ≥ BUN_CONFIG_HTTP_IDLE_TIMEOUT, Bun's socket timer fires first and throws a DOMException that bypasses the retry logic entirely — it propagates as an unhandled exception. Consider validating at startup (e.g., checking process.env.BUN_CONFIG_HTTP_IDLE_TIMEOUT against idleTimeoutMilliseconds) so misconfiguration fails loudly rather than silently under stall conditions.
|
|
||
| # disable annoying feature of BUN that makes it so source lines show up in error output | ||
| ENV BUN_DISABLE_SOURCE_CODE_PREVIEW=1 | ||
| # byte-level socket idle backstop; kept above the agent-loop idle timer (300s) so the agent-loop's non-throwing retry always fires first on stalled streams |
There was a problem hiding this comment.
This value is coupled to idleTimeoutMilliseconds in source/agent-loop.mts (default 300s) — if set too low, Bun's socket timer pre-empts the application-level retry, causing DOMException propagation instead of graceful retries. This invariant has no runtime enforcement; anyone changing either value independently (or running outside this Docker image) can silently break the stall-recovery behavior.
No description provided.