fix(agent): park HITL approvals + keep ACP alive + fail loud on code tools#4848
Conversation
FIX 1 (HITL park, F-024): the runner parked an `ask` permission gate by replying
`reject` to the harness, which for Claude produced a failed tool call ("User
refused permission") whose tool_result{isError} overwrote the approval prompt on
the same tool-call id. Add a third responder outcome `park`: HITLResponder returns
it on a human surface with no stored decision (was `deny`), and
attachPermissionResponder sends NO respondPermission on park. The interaction_request
stays the last word on the tool call; the turn ends with the tool pending and the
next turn's stored decision resolves it. No wire change (park is runner-internal).
Pi-1: hide the Permission policy field for Pi in AgentConfigControl (Pi never gates).
FIX 2 (code-tool fail loud, F-016): buildRunPlan now refuses a run carrying a
`kind: code` tool up front with CODE_TOOL_UNSUPPORTED_MESSAGE (ok:false), the way
stdio MCP is gated. Code execution was removed for security (F-010); the per-call
runCodeTool throw became a tool result the model laundered into a 200 "success".
The per-call throw stays as a defense-in-depth backstop.
Tests: runner park unit + orchestration updated to assert park/no-reply; new
permissions.ts park regression; new run-plan code-tool gate tests; SDK egress test
locking "park does not clobber". 221 vitest + typecheck, 348 SDK agent tests green.
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
A parked HITL turn holds the acp-http-client connection open while a human approves a tool, but the runner set no custom headersTimeout, so undici's default (~5 min) fired and killed the ACP stream with UND_ERR_HEADERS_TIMEOUT; the approve->resume turn then timed out too. Drive the ACP HTTP client through a long-timeout undici dispatcher (new acp-fetch.ts: createAcpDispatcher/createAcpFetch, headersTimeout/bodyTimeout disabled by default, overridable via SANDBOX_AGENT_ACP_*_TIMEOUT_MS). The local path now passes createAcpFetch(); the Daytona cookie fetch layers on the same dispatcher. Scoped to the ACP fetch, not the global dispatcher. undici promoted to a direct dependency. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughSummary by CodeRabbit
WalkthroughThe agent responder now parks permission requests instead of replying immediately in human-surface cases. ACP fetch handling uses timeout-aware undici dispatchers, code-tool runs are rejected in run planning, and the Pi harness hides the permission-policy control. ChangesAgent permission and tool flow
Sequence Diagram(s)Parked permission flow sequenceDiagram
participant HITLResponder
participant attachPermissionResponder
participant session
attachPermissionResponder->>HITLResponder: onPermission(request)
HITLResponder-->>attachPermissionResponder: "park"
attachPermissionResponder->>session: emit interaction_request
Note over attachPermissionResponder,session: session.respondPermission() is skipped
ACP fetch wiring sequenceDiagram
participant SandboxAgent
participant createCookieFetch
participant createAcpFetch
participant undici.fetch
SandboxAgent->>createCookieFetch: Daytona start()
createCookieFetch->>createAcpFetch: default inner fetch()
createCookieFetch->>undici.fetch: request with cookies
SandboxAgent->>createAcpFetch: non-Daytona start()
createAcpFetch->>undici.fetch: fetch(..., dispatcher)
Note over createAcpFetch,undici.fetch: dispatcher uses ACP timeout settings
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review Review focus: this fixes the broken Claude HITL approval flow (F-024) plus two related runner hardening changes. Please confirm:
Stacked on #4847; its diff is isolated to these 17 files. |
🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "== PR metadata =="
gh pr view 4848 --json number,title,baseRefName,headRefName,files --jq '{number,title,baseRefName,headRefName,files:[.files[].path]}'
echo
echo "== Changed files =="
gh pr view 4848 --json files --jq '.files[].path'
echo
echo "== Search for relevant symbols =="
rg -n -C2 --glob '!**/node_modules/**' \
-e '\bpark\b' \
-e 'respondPermission' \
-e 'interaction_request' \
-e 'createAcpDispatcher' \
-e 'createAcpFetch' \
-e 'SANDBOX_AGENT_ACP_[A-Z_]+_TIMEOUT_MS' \
-e 'CODE_TOOL_UNSUPPORTED_MESSAGE' \
-e '\bbuildRunPlan\b' \
-e '\brunCodeTool\b' \
-e '\bAgentConfigControl\b' \
-e 'pi_core' \
-e 'pi_agenta' .
echo
echo "== Candidate files =="
fd -i 'acp-fetch\.ts$' .
fd -i 'permissions\.ts$' .
fd -i 'run-plan.*\.(ts|tsx)$' .
fd -i 'AgentConfigControl\.(ts|tsx)$' .
fd -i 'responder.*\.(ts|tsx)$' .
fd -i 'orchestration.*\.(ts|tsx)$' .
for f in \
$(fd -i 'acp-fetch\.ts$' .) \
$(fd -i 'permissions\.ts$' .) \
$(fd -i 'run-plan.*\.(ts|tsx)$' .) \
$(fd -i 'AgentConfigControl\.(ts|tsx)$' .) \
$(fd -i 'responder.*\.(ts|tsx)$' .) \
$(fd -i 'orchestration.*\.(ts|tsx)$' .)
do
echo
echo "== Outline: $f =="
ast-grep outline "$f" --view expanded || true
done
echo
echo "== Targeted source excerpts =="
python - <<'PY'
from pathlib import Path
targets = {
"services/agent": [
"acp-fetch.ts",
"permissions.ts",
"run-plan.ts",
"responder.ts",
"orchestration.ts",
],
"frontend": [
"AgentConfigControl.tsx",
"AgentConfigControl.ts",
],
}
for base, names in targets.items():
for name in names:
for p in Path(".").rglob(name):
if base in str(p):
print(f"\n===== {p} =====")
try:
lines = p.read_text().splitlines()
except Exception as e:
print(f"[read error] {e}")
continue
for i, line in enumerate(lines, start=1):
if any(tok in line for tok in [
"park", "respondPermission", "interaction_request",
"createAcpDispatcher", "createAcpFetch",
"SANDBOX_AGENT_ACP_", "CODE_TOOL_UNSUPPORTED_MESSAGE",
"buildRunPlan", "runCodeTool", "AgentConfigControl",
"pi_core", "pi_agenta"
]):
start = max(1, i-12)
end = min(len(lines), i+20)
print(f"\n--- lines {start}-{end} ---")
for n in range(start, end+1):
print(f"{n:4}: {lines[n-1]}")
PY
echo
echo "== Tests mentioning park / ACP / code-tool gate / Pi control =="
rg -n -C2 --glob '!**/node_modules/**' \
-e 'park does not clobber' \
-e 'headersTimeout' \
-e 'bodyTimeout' \
-e 'CODE_TOOL_UNSUPPORTED_MESSAGE' \
-e 'pi_core' \
-e 'pi_agenta' \
-e 'respondPermission' \
-e '\bpark\b' services packages apps . 2>/dev/null || trueLength of output: 241621 🏁 Script executed: #!/bin/bash
set -euo pipefail
files=(
"services/agent/src/engines/sandbox_agent/acp-fetch.ts"
"services/agent/src/engines/sandbox_agent.ts"
"services/agent/src/engines/sandbox_agent/daytona.ts"
"services/agent/src/engines/sandbox_agent/permissions.ts"
"services/agent/src/engines/sandbox_agent/run-plan.ts"
"services/agent/src/tools/code.ts"
"services/agent/src/responder.ts"
"services/agent/tests/unit/sandbox-agent-orchestration.test.ts"
"services/agent/tests/unit/sandbox-agent-run-plan.test.ts"
"services/agent/tests/unit/responder.test.ts"
"web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx"
)
for f in "${files[@]}"; do
if [[ -f "$f" ]]; then
echo
echo "===== $f ====="
nl -ba "$f" | sed -n '1,260p'
else
echo "MISSING: $f"
fi
doneLength of output: 1260 🏁 Script executed: #!/bin/bash
set -euo pipefail
show_matches() {
local file="$1"
shift
echo
echo "===== $file ====="
for pat in "$@"; do
echo "--- pattern: $pat ---"
awk -v pat="$pat" '
index($0, pat) {
start = NR - 12; if (start < 1) start = 1
end = NR + 20
hits[++n] = start ":" end
}
{ lines[NR] = $0; max = NR }
END {
if (n == 0) {
print "[no matches]"
exit
}
lastEnd = 0
for (i = 1; i <= n; i++) {
split(hits[i], a, ":")
s = a[1]; e = a[2]
if (s <= lastEnd) s = lastEnd + 1
if (e > max) e = max
if (s > e) continue
printf("lines %d-%d\n", s, e)
for (j = s; j <= e; j++) printf("%4d: %s\n", j, lines[j])
lastEnd = e
}
}
' "$file"
done
}
show_matches "services/agent/src/engines/sandbox_agent/acp-fetch.ts" \
"createAcpDispatcher" \
"createAcpFetch" \
"SANDBOX_AGENT_ACP_HEADERS_TIMEOUT_MS" \
"SANDBOX_AGENT_ACP_BODY_TIMEOUT_MS" \
"setGlobalDispatcher"
show_matches "services/agent/src/engines/sandbox_agent.ts" \
"createAcpFetch()" \
"createCookieFetch()" \
"fetch:" \
"headersTimeout"
show_matches "services/agent/src/engines/sandbox_agent/daytona.ts" \
"createCookieFetch" \
"createAcpFetch()" \
"fetch(input" \
"cookie"
show_matches "services/agent/src/engines/sandbox_agent/permissions.ts" \
"decision === \"park\"" \
"respondPermission" \
"interaction_request"
show_matches "services/agent/src/responder.ts" \
"ResponderOutcome" \
"if (this.hasHumanSurface) return \"park\"" \
"private lookup(request" \
"toolCallId" \
"toolName"
show_matches "services/agent/tests/unit/responder.test.ts" \
"parks when there is a human surface and no stored decision" \
"stored decision wins over the park" \
"tool name"
show_matches "services/agent/tests/unit/sandbox-agent-orchestration.test.ts" \
"runSandboxAgent default HITL responder wiring" \
"parks unapproved" \
"stored decision" \
"permissionReplies" \
"interaction_request"
show_matches "services/agent/src/engines/sandbox_agent/run-plan.ts" \
"CODE_TOOL_UNSUPPORTED_MESSAGE" \
"USER_MCP_UNSUPPORTED_MESSAGE" \
"hasCodeTool(toolSpecs)" \
"return { ok: false, error: CODE_TOOL_UNSUPPORTED_MESSAGE }"
show_matches "services/agent/src/tools/code.ts" \
"CODE_TOOL_UNSUPPORTED_MESSAGE" \
"throw new Error"
show_matches "services/agent/tests/unit/sandbox-agent-run-plan.test.ts" \
"CODE_TOOL_UNSUPPORTED_MESSAGE" \
"kind: \"code\"" \
"ok: false"
show_matches "web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx" \
"isPiHarness" \
"permission_policy" \
"Permission policy" \
"claude"Length of output: 86470
Overall, the implementation matches the stated intent for F-024, the ACP timeout fix stays narrowly scoped, the F-016 code-tool gate is fail-loud with the old throw preserved as a backstop, and the FE change is isolated to Pi. ✅ Action performedReview finished.
|
Railway Preview Environment
|
Context
Human-in-the-loop tool approval was broken for Claude (F-024). The permission gate fired, but the playground never showed an Approve/Deny prompt, and a parked approval that a human did eventually grant then timed out.
Two root causes:
askpermission gate by replyingrejectto the harness. For Claude that produced a failed tool call ("User refused permission") whosetool_result{isError}overwrote the approval prompt on the same tool-call id. The prompt the user needed to act on was clobbered by an error.headersTimeout, so undici's ~5 minute default fired and killed the ACP stream withUND_ERR_HEADERS_TIMEOUT. The approve-then-resume turn then timed out too.Changes
Park instead of reject (F-024). Add a third responder outcome,
park.HITLResponderreturnsparkon a human surface with no stored decision (it used to returndeny), andattachPermissionRespondersends norespondPermissionat all on park. Theinteraction_requeststays the last word on the tool call. The turn ends with the tool pending, and the next turn's stored decision resolves it. This is runner-internal, so there is no wire change.Before: park -> reply
reject-> Claude emits a failed tool call ->tool_result{isError}overwrites the approval prompt.After: park -> no reply -> the approval prompt stays on the tool call -> the next turn's decision resolves it.
Keep the ACP connection alive across a park. New
acp-fetch.tsdrives the ACP HTTP client through a long-timeout undici dispatcher (createAcpDispatcher/createAcpFetch), withheadersTimeout/bodyTimeoutdisabled by default and overridable viaSANDBOX_AGENT_ACP_*_TIMEOUT_MS. The local path passescreateAcpFetch(); the Daytona cookie fetch layers on the same dispatcher. This is scoped to the ACP fetch, not the global dispatcher.undiciis promoted to a direct dependency.Fail loud on code tools (F-016).
buildRunPlannow refuses a run carrying akind: codetool up front withCODE_TOOL_UNSUPPORTED_MESSAGE(ok: false), the same way stdio MCP is gated. Code execution was removed for security (F-010); the per-callrunCodeToolthrow used to become a tool result the model could launder into a 200 "success". The per-call throw stays as a defense-in-depth backstop.Hide the Permission policy field for Pi (Pi-1). Pi never gates tool use, so a permission policy is meaningless for it.
AgentConfigControlhides the field forpi_core/pi_agenta. Only Claude honors it.Scope / risk
Park is runner-internal, so there is no
/runwire, protocol, SDK, or golden-fixture change for it. The ACP timeout change is scoped to the ACP fetch dispatcher, not the process-global undici dispatcher, so other HTTP traffic in the runner is unaffected. The code-tool gate mirrors the existing stdio-MCP gate; it only affects runs that carry akind: codetool. The FE change only removes a control for Pi harnesses; Claude's config form is unchanged.This PR is stacked on
feat/agent-gateway-tool-mcp(the gateway-MCP PR). Review and merge that first.Tests / notes
permissions.tspark regression and new run-plan code-tool gate tests.sandbox-agent-acp-fetchtest for the long-timeout dispatcher.services/agent, 348 SDK agent tests green.@agenta/entity-uitypecheck + eslint clean for the FE change.How to QA
Prerequisites: the EE dev stack with the
sandbox-agentsidecar, a project with an Anthropic key, and an agent config withharness: claudeand a tool whose permission policy is set to ask/prompt.Steps:
Expected result: the Approve/Deny prompt renders on the tool call (it is not replaced by a "User refused permission" error). After you approve, even minutes later, the run resumes and completes instead of timing out.
Automated tests:
Edge cases: open the agent config for a Pi harness (
pi_core/pi_agenta) and confirm the Permission policy field is hidden, while Claude still shows it. Also confirm a run carrying akind: codetool now fails up front with a clear message rather than reporting a fake success.Refs #4845 (plan).