fix(agent): end a HITL park turn gracefully so Approve/Deny complete (F-040) #4869

Merged: mmabrouk merged 1 commit into big-agents from fix/agent-hitl-park-terminal on Jun 26, 2026
@mmabrouk (Member)

Context

Human-in-the-loop (HITL) approval is a headline agent capability, and on Claude it was broken end to end (QA finding F-040). On the /messages path, when a tool needs human approval, the responder returns park and sends no ACP respondPermission; the runner then awaits session.prompt(). But Claude-over-ACP does not end a turn on an unanswered permission gate, so session.prompt() blocked forever. Three things followed:

  • the parked turn never terminated, so its temp sandbox leaked (the finally never ran),
  • the egress never saw the run end, so it never emitted a finish frame and the SSE stream hung,
  • the AI-SDK resume POST fired against the still-open stream and errored out as "The agent run failed".

The net user experience: Approve hung for about 5 minutes and then failed with ERR_ABORTED; Deny ended in a red "The agent run failed" instead of a clean denial.
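The leak follows directly from how try/finally interacts with a promise that never settles. A minimal sketch of that failure mode, assuming hypothetical names (`neverSettles`, `runTurn`) that are not taken from the runner code:

```typescript
// Hypothetical stand-in for a prompt blocked on an unanswered permission gate.
function neverSettles(): Promise<string> {
  return new Promise<string>(() => {
    /* intentionally never resolves or rejects */
  });
}

// Hypothetical turn runner: cleanup lives in `finally`, as in the description above.
async function runTurn(
  prompt: () => Promise<string>,
  log: string[],
): Promise<string> {
  log.push("sandbox created");
  try {
    // If the gate is never answered, this await stays pending forever.
    return await prompt();
  } finally {
    // Only reached once the await settles, so a hung prompt skips cleanup.
    log.push("sandbox disposed");
  }
}
```

With a prompt that never settles, `log` only ever contains "sandbox created", which matches the leaked temp sandbox described above.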

What changed

A park now ENDS the /run turn gracefully instead of holding the ACP connection open.

  • engines/sandbox_agent/permissions.ts: attachPermissionResponder gains an onPark callback; on decision === "park" it fires onPark() (it still sends no respondPermission, so no F-024 clobber).
  • engines/sandbox_agent.ts: on the first park, onPark calls sandbox.destroySession(session.id), the sandbox-agent package's managed cancel. It resolves the pending permission RPC with {outcome:"cancelled"} (not a reject) and sends session/cancel, so the in-flight prompt() returns. The orchestration races the prompt against a parkedSignal and returns stopReason:"paused", so the finally disposes the sandbox (no leak) and the egress drains to a clean finish.
  • sdks/python/.../vercel/stream.py: map paused/cancelled to the AI-SDK other finish reason (it is intentional state, not an unknown model reason).
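The race described in the second bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in engines/sandbox_agent.ts: `runTurnWithPark`, `attachResponder`, `managedCancel`, and `disposeSandbox` are hypothetical stand-ins for the responder wiring, `sandbox.destroySession`, and the sandbox cleanup, and the `"error"` stop reason on failure is made up for the example.

```typescript
type PromptResult = { stopReason: string };
type TurnResult = { ok: boolean; stopReason: string };

async function runTurnWithPark(
  prompt: () => Promise<PromptResult>,
  attachResponder: (onPark: () => void) => void, // hypothetical responder wiring
  managedCancel: () => Promise<void>, // stand-in for the managed cancel
  disposeSandbox: () => void, // stand-in for sandbox cleanup
): Promise<TurnResult> {
  // Local park signal: resolves as soon as the responder reports a park.
  let signalPark: () => void = () => {};
  const parkedSignal = new Promise<"parked">((resolve) => {
    signalPark = () => resolve("parked");
  });

  let parked = false;
  attachResponder(() => {
    if (parked) return; // only the first park triggers the cancel
    parked = true;
    signalPark(); // ends the turn even if the cancel below fails
    managedCancel().catch(() => {}); // a rejecting cancel must not break the turn
  });

  try {
    const inflight = prompt();
    // Side catch: keeps a late rejection from the orphaned prompt handled.
    inflight.catch(() => {});
    const winner = await Promise.race([inflight, parkedSignal]);
    if (winner === "parked") return { ok: true, stopReason: "paused" };
    return { ok: true, stopReason: winner.stopReason };
  } catch {
    // A real prompt rejection (no park) still surfaces as a failure.
    return { ok: false, stopReason: "error" };
  } finally {
    disposeSandbox(); // now runs for parked turns too
  }
}
```

Resolving the local signal before calling the cancel is the design choice that lets a park terminate the turn even when the prompt never returns or the cancel rejects.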

The resume then cold-replays as a fresh turn and the stored decision resolves the gate via the name+args anchor (#4854): Approve runs the tool and completes; Deny returns a clean "User refused permission" tool-error and the model continues. The FE resume already carries the {approved} envelope (#4859); verified, unchanged.
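One way to picture a name+args anchor is a lookup key built from the tool name and a canonical form of its arguments, so a replayed call finds the decision stored for the original call. The sketch below is an assumption about the general idea only; it does not reproduce the implementation from #4854, and `canonical`, `anchor`, `resolveGate`, and the `Decision` values are hypothetical.

```typescript
type Decision = "approve" | "deny"; // hypothetical decision values

// Serialize a JSON-like value with object keys sorted, so key order does not matter.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical anchor: tool name plus canonical arguments.
function anchor(name: string, args: unknown): string {
  return `${name}:${canonical(args)}`;
}

// On replay, a stored decision resolves the gate; otherwise the turn parks.
function resolveGate(
  stored: Map<string, Decision>,
  name: string,
  args: unknown,
): Decision | "park" {
  return stored.get(anchor(name, args)) ?? "park";
}
```

In this sketch a first turn finds no stored decision and parks; after the user answers, the replayed turn produces the same anchor and resolves immediately.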

Scope / risk

  • Runner + egress only. No protocol.ts/wire.py change, no golden/wire-contract change (stopReason is a free-form string; paused is a new value of an existing field). No FE change.
  • Headless /invoke is unchanged. It never parks (no human surface), so the new path is dead code there.
  • Normal completion is unchanged. The prompt wins the race; a real prompt rejection still surfaces as ok:false through the outer catch (the side .catch() on the orphaned prompt only suppresses a late rejection after a park).
  • Could regress: anything that depends on a parked turn staying open — but that was the bug, and there is no other consumer of an open park.
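Because stopReason is a free-form string, the only place that has to learn about paused is the egress mapping. The real mapping lives in the Python SDK; the TypeScript sketch below only illustrates its shape. Only the paused/cancelled to other rule comes from this PR. The function name `toFinishReason` and the remaining cases (`end_turn`, `max_tokens`, the `unknown` fallback) are assumptions for illustration.

```typescript
// Finish reasons this sketch targets on the AI-SDK side.
type FinishReason =
  | "stop"
  | "length"
  | "content-filter"
  | "tool-calls"
  | "error"
  | "other"
  | "unknown";

function toFinishReason(stopReason: string | undefined): FinishReason {
  switch (stopReason) {
    // From this PR: a park or cancel is intentional state, not an unknown model reason.
    case "paused":
    case "cancelled":
      return "other";
    // Assumed mappings, included only to make the sketch complete.
    case "end_turn":
      return "stop";
    case "max_tokens":
      return "length";
    default:
      return "unknown";
  }
}
```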

How to QA

Prerequisites: a committed Agent app with Claude Code + haiku, a gated github gateway tool, and an Ask rule (harness_kwargs.claude.permissions.ask), plus an Anthropic key in the project vault. Local runner (no Daytona). Restart the runner so it loads the change.

  1. New chat → "Call the github GET_THE_AUTHENTICATED_USER tool now and show me my login."
  2. At "Run this tool?" click Approve → the tool runs and the turn completes with a real model answer (no hang, no ERR_ABORTED).
  3. New chat, same prompt → click Deny → the tool shows "Responded", the model receives "User refused permission" and continues with a graceful message (NOT "The agent run failed").
  4. While parked and after each turn, count runner temp dirs: docker exec <runner> sh -lc 'ls -d /tmp/agenta-sandbox-agent-* | wc -l' → stays 0 (no leak). The runner log shows prompt stopReason=paused for the park turn.

Expected: park terminates (stopReason=paused), Approve completes, Deny is a clean denial, zero leaked sandboxes, and no unhandledRejection / ACP write error: other side closed / fetch failed in the runner log.

Test command:

  • Runner: cd services/agent && pnpm test && pnpm run typecheck
  • SDK egress: cd sdks/python && python -m pytest oss/tests/pytest/unit/agents/adapters/test_vercel_stream_park.py oss/tests/pytest/unit/agents/ -n0 -q

Edge cases covered by tests: the prompt never resolving (the real Claude case), the managed cancel itself rejecting (the local park signal still ends the turn), and the egress draining a parked stream to a finish.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…(F-040)

On the /messages HITL path a parked permission gate left session.prompt()
blocked forever: Claude-over-ACP does not end a turn on an unanswered gate, so
the parked turn never terminated. That leaked the runner sandbox (the finally
never ran), the egress never emitted a finish frame (the SSE stream hung), and
the AI-SDK resume errored out as "The agent run failed".

Make a park END the /run turn. When the responder returns park, an onPark
callback cancels the in-flight prompt via sandbox.destroySession (the
sandbox-agent package's managed cancel: it resolves the pending permission with
{outcome:"cancelled"} — not a reject, so no F-024 clobber — and sends
session/cancel so prompt() returns). The orchestration races the prompt against
a parked signal and returns stopReason "paused", so the finally disposes the
sandbox (no leak) and the egress drains to a clean finish (paused -> AI-SDK
other). The resume cold-replays as a fresh turn and the stored decision resolves
the gate: Approve runs the tool and completes; Deny returns a clean denial and
the model continues.

Live-verified on Claude+haiku with a gated github tool and an Ask rule (local
runner, no Daytona): both park turns logged stopReason=paused, zero leaked
sandbox dirs, Approve completed with a real answer, Deny ended in a graceful
denial (not "agent run failed"), and the ACP write/fetch-failed errors are gone.

Tests: park emits a terminal paused result even when the prompt hangs; no leak;
park terminates even if the managed cancel rejects; the egress drains a parked
stream to a finish.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
@dosubot dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Jun 26, 2026
@vercel vercel Bot commented Jun 26, 2026

Project               Deployment  Actions           Updated (UTC)
agenta-documentation  Ready       Preview, Comment  Jun 26, 2026 1:27am

@dosubot dosubot Bot added the Backend label Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

Railway Preview Environment

Status Destroyed (PR closed)

Updated at 2026-06-26T01:47:42.309Z

@mmabrouk mmabrouk merged commit 5584230 into big-agents Jun 26, 2026
48 of 53 checks passed