fix(agent): end a HITL park turn gracefully so Approve/Deny complete (F-040) by mmabrouk · Pull Request #4869 · Agenta-AI/agenta

mmabrouk · 2026-06-26T01:27:16Z

Context

Human-in-the-loop (HITL) approval is a headline agent capability, and on Claude it was broken end to end (QA finding F-040). On the /messages path, when a tool needs human approval, the responder returns park and sends no ACP respondPermission, and the runner then awaits session.prompt(). But Claude-over-ACP does not end a turn on an unanswered permission gate, so session.prompt() blocked forever. Three things followed:

the parked turn never terminated, so its temp sandbox leaked (the finally never ran),
the egress never saw the run end, so it never emitted a finish frame and the SSE stream hung,
the AI-SDK resume POST fired against the still-open stream and errored out as "The agent run failed".

The net user experience: Approve hung for about 5 minutes and then failed with ERR_ABORTED; Deny ended in a red "The agent run failed" instead of a clean denial.
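The failure mode above can be reproduced in miniature. This is a sketch under stated assumptions, not the runner's code: hangingPrompt, runTurnWithoutParkHandling and observeLeak are hypothetical names, and the never-settling promise stands in for a session.prompt() whose permission gate is never answered.

```typescript
// Hypothetical stand-in for session.prompt() blocked on an unanswered gate.
function hangingPrompt(): Promise<string> {
  return new Promise<string>(() => {
    // Intentionally never resolves and never rejects.
  });
}

let sandboxDisposed = false;

async function runTurnWithoutParkHandling(): Promise<string> {
  try {
    return await hangingPrompt(); // the await never completes
  } finally {
    sandboxDisposed = true; // not reached while the prompt stays pending
  }
}

// Start the turn, wait a short window, then report whether cleanup ran.
async function observeLeak(windowMs: number): Promise<boolean> {
  void runTurnWithoutParkHandling();
  await new Promise<void>((resolve) => setTimeout(resolve, windowMs));
  return sandboxDisposed;
}
```

Because the finally block only runs once the awaited promise settles, any cleanup placed there is skipped for as long as the prompt hangs, which matches the leaked temp sandbox described above.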

What changed

A park now ENDS the /run turn gracefully instead of holding the ACP connection open.

engines/sandbox_agent/permissions.ts — attachPermissionResponder gains an onPark callback; on decision === "park" it fires onPark() (still sends no respondPermission, so no F-024 clobber).
engines/sandbox_agent.ts — on the first park, onPark calls sandbox.destroySession(session.id), the sandbox-agent package's managed cancel. It resolves the pending permission RPC with {outcome:"cancelled"} (not a reject) and sends session/cancel, so the in-flight prompt() returns. The orchestration races the prompt against a parkedSignal and returns stopReason:"paused", so the finally disposes the sandbox (no leak) and the egress drains to a clean finish.
sdks/python/.../vercel/stream.py — map paused/cancelled to the AI-SDK other finish reason (it is intentional state, not an unknown model reason).
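The runner-side flow described in the list above could look roughly like the following. It is a minimal sketch, not the code in permissions.ts or sandbox_agent.ts: only destroySession, onPark, the parked signal and stopReason "paused" come from the PR text. SandboxLike, makeResponder, runTurn, dispose, the ok:true value on a park and the "error" stop reason are assumptions made for illustration.

```typescript
type Decision = "allow" | "deny" | "park";

interface TurnResult {
  ok: boolean;
  stopReason: string;
}

// Hypothetical minimal sandbox surface; only destroySession is named in the PR.
interface SandboxLike {
  destroySession(sessionId: string): Promise<void>;
  dispose(): Promise<void>;
}

// Unique sentinel so a park can never be confused with a prompt result.
const PARKED = Symbol("parked");

// Responder that fires onPark on a park decision and sends no reply in that case.
function makeResponder(
  decide: () => Decision,
  respond: (d: "allow" | "deny") => void,
  onPark: () => void,
): () => void {
  return () => {
    const decision = decide();
    if (decision === "park") {
      onPark(); // no respondPermission on a park
      return;
    }
    respond(decision);
  };
}

async function runTurn(
  sandbox: SandboxLike,
  sessionId: string,
  prompt: (onPermission: () => void) => Promise<string>,
  decide: () => Decision,
): Promise<TurnResult> {
  let parked = false;
  let signalPark: () => void = () => {};
  const parkedSignal = new Promise<typeof PARKED>((resolve) => {
    signalPark = () => resolve(PARKED);
  });

  const onPark = (): void => {
    if (parked) return; // only the first park triggers the cancel
    parked = true;
    // Managed cancel; if it rejects, the local signal still ends the turn.
    sandbox.destroySession(sessionId).catch(() => {});
    signalPark();
  };

  try {
    const pending = prompt(makeResponder(decide, () => {}, onPark));
    pending.catch(() => {}); // suppress a late rejection from an orphaned prompt
    const winner = await Promise.race([pending, parkedSignal]);
    if (winner === PARKED) return { ok: true, stopReason: "paused" };
    return { ok: true, stopReason: winner };
  } catch {
    return { ok: false, stopReason: "error" }; // a real prompt rejection
  } finally {
    await sandbox.dispose(); // runs on every path, so no sandbox leak
  }
}
```

The point of the race is that the turn's completion no longer depends on prompt() settling: the local park signal is enough to reach the finally block.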

The resume then cold-replays as a fresh turn and the stored decision resolves the gate via the name+args anchor (#4854): Approve runs the tool and completes; Deny returns a clean "User refused permission" tool-error and the model continues. The FE resume already carries the {approved} envelope (#4859); verified, unchanged.
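One way to picture how a stored decision resolves the gate on the replayed turn is a lookup keyed by tool name and arguments. The sketch below is an assumption throughout: the PR only says the gate is resolved "via the name+args anchor" and does not show the key format, so anchorKey, resolveGate, StoredDecision and GateOutcome are hypothetical. Only the "User refused permission" message comes from the text.

```typescript
type StoredDecision = { approved: boolean };

type GateOutcome =
  | { kind: "run" }
  | { kind: "tool-error"; message: string }
  | { kind: "park" };

// Hypothetical anchor: tool name plus arguments with top-level keys sorted,
// so the same call produces the same key regardless of key order.
function anchorKey(name: string, args: Record<string, unknown>): string {
  const sorted = Object.keys(args)
    .sort()
    .map((key) => [key, args[key]]);
  return name + ":" + JSON.stringify(sorted);
}

// On a replayed turn, a stored decision answers the gate; without one, park.
function resolveGate(
  store: Map<string, StoredDecision>,
  name: string,
  args: Record<string, unknown>,
): GateOutcome {
  const decision = store.get(anchorKey(name, args));
  if (decision === undefined) return { kind: "park" };
  return decision.approved
    ? { kind: "run" }
    : { kind: "tool-error", message: "User refused permission" };
}
```

Keying on the call itself rather than on a per-turn request id is what lets a fresh turn match a decision recorded during the earlier, already terminated turn.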

Scope / risk

Runner + egress only. No protocol.ts/wire.py change, no golden/wire-contract change (stopReason is a free-form string; paused is a new value of an existing field). No FE change.
Headless /invoke is unchanged. It never parks (no human surface), so the new path is dead code there.
Normal completion is unchanged. The prompt wins the race; a real prompt rejection still surfaces as ok:false through the outer catch (the side .catch() on the orphaned prompt only suppresses a late rejection after a park).
Could regress: anything that depends on a parked turn staying open — but that was the bug, and there is no other consumer of an open park.
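The claim that normal completion and real rejections are unchanged rests on a property of promises that is easy to check in isolation: attaching a side .catch() to a promise does not change what the original promise contributes to a race. The sketch below assumes a hypothetical raceWithSideCatch helper and result shape; it is not the runner's orchestration.

```typescript
interface RaceResult {
  ok: boolean;
  stopReason?: string;
}

async function raceWithSideCatch(
  prompt: Promise<string>,
  parked: Promise<"paused">,
): Promise<RaceResult> {
  // Side handler: only prevents an unhandled rejection if the prompt
  // rejects after the race has already been decided by a park.
  prompt.catch(() => {});
  try {
    // The race still observes the original promise, rejection included.
    const stopReason = await Promise.race([prompt, parked]);
    return { ok: true, stopReason };
  } catch {
    return { ok: false }; // a rejection that wins the race surfaces here
  }
}
```

A prompt that rejects before any park therefore still produces ok:false, while a prompt that rejects after a park has won is silently absorbed instead of raising an unhandled rejection.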

How to QA

Prerequisites: a committed Agent app with Claude Code + haiku, a gated github gateway tool, and an Ask rule (harness_kwargs.claude.permissions.ask), plus an Anthropic key in the project vault. Local runner (no Daytona). Restart the runner so it loads the change.

New chat → "Call the github GET_THE_AUTHENTICATED_USER tool now and show me my login."
At "Run this tool?" click Approve → the tool runs and the turn completes with a real model answer (no hang, no ERR_ABORTED).
New chat, same prompt → click Deny → the tool shows "Responded", the model receives "User refused permission" and continues with a graceful message (NOT "The agent run failed").
While parked and after each turn, count runner temp dirs: docker exec <runner> sh -lc 'ls -d /tmp/agenta-sandbox-agent-* | wc -l' → stays 0 (no leak). The runner log shows prompt stopReason=paused for the park turn.

Expected: park terminates (stopReason=paused), Approve completes, Deny is a clean denial, zero leaked sandboxes, and no unhandledRejection / ACP write error: other side closed / fetch failed in the runner log.

Test commands:

Runner: cd services/agent && pnpm test && pnpm run typecheck
SDK egress: cd sdks/python && python -m pytest oss/tests/pytest/unit/agents/adapters/test_vercel_stream_park.py oss/tests/pytest/unit/agents/ -n0 -q

Edge cases covered by tests: the prompt never resolving (the real Claude case), the managed cancel itself rejecting (the local park signal still ends the turn), and the egress draining a parked stream to a finish.
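The last edge case, the egress draining a parked stream to a finish, can be illustrated as follows. The real egress lives in the Python stream.py; this sketch is written in TypeScript only to match the runner examples, and the Frame shape, toFinishReason and drain are hypothetical. The one behavior taken from the PR is that paused and cancelled map to the AI-SDK other finish reason; the "unknown" fallback stands in for the existing mapping, which the PR does not show.

```typescript
type Frame =
  | { type: "text"; delta: string }
  | { type: "finish"; finishReason: string };

// paused/cancelled are intentional states, so they map to "other".
function toFinishReason(stopReason: string): string {
  if (stopReason === "paused" || stopReason === "cancelled") return "other";
  return "unknown"; // placeholder for the mapping not shown in the PR
}

// Emit every buffered chunk, then always close with a finish frame.
function drain(chunks: string[], stopReason: string): Frame[] {
  const frames: Frame[] = chunks.map((delta) => ({ type: "text" as const, delta }));
  frames.push({ type: "finish", finishReason: toFinishReason(stopReason) });
  return frames;
}
```

For example, drain(["Checking"], "paused") yields one text frame followed by a finish frame with finishReason "other", so a client reading the stream sees a terminal frame rather than an open connection.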

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…(F-040)

On the /messages HITL path a parked permission gate left session.prompt() blocked forever: Claude-over-ACP does not end a turn on an unanswered gate, so the parked turn never terminated. That leaked the runner sandbox (the finally never ran), the egress never emitted a finish frame (the SSE stream hung), and the AI-SDK resume errored out as "The agent run failed".

Make a park END the /run turn. When the responder returns park, an onPark callback cancels the in-flight prompt via sandbox.destroySession, the sandbox-agent package's managed cancel. It resolves the pending permission with {outcome:"cancelled"} (not a reject, so no F-024 clobber) and sends session/cancel so prompt() returns. The orchestration races the prompt against a parked signal and returns stopReason "paused", so the finally disposes the sandbox (no leak) and the egress drains to a clean finish (paused -> AI-SDK other). The resume cold-replays as a fresh turn and the stored decision resolves the gate: Approve runs the tool and completes; Deny returns a clean denial and the model continues.

Live-verified on Claude+haiku with a gated github tool and an Ask rule (local runner, no Daytona): both park turns logged stopReason=paused, zero leaked sandbox dirs, Approve completed with a real answer, Deny ended in a graceful denial (not "agent run failed"), and the ACP write/fetch-failed errors are gone.

Tests: park emits a terminal paused result even when the prompt hangs; no leak; park terminates even if the managed cancel rejects; the egress drains a parked stream to a finish.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel · 2026-06-26T01:27:22Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 26, 2026 1:27am

coderabbitai · 2026-06-26T01:27:24Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


github-actions · 2026-06-26T01:37:06Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-06-26T01:47:42.309Z

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 26, 2026

dosubot Bot added the Backend label Jun 26, 2026

mmabrouk merged commit 5584230 into big-agents Jun 26, 2026
48 of 53 checks passed

mmabrouk merged 1 commit into big-agents from fix/agent-hitl-park-terminal
