[fix] Resume HITL Deny end-to-end + guard the dev error overlay#4859
Conversation
HITL Deny dead-ended (F-036): clicking Deny recorded the decision
("Responded") then the turn hung in `approval-responded` with no resume
request, no `output-denied`, and no model continuation. Approve resumed
fine; Deny did not.
Root cause was the SDK ingress, the documented D-007 cross-layer gap. The
verbatim UIMessage path posts the approval INLINE on the tool part
(`state: approval-responded`, `approval.approved`), but
`adapters/vercel/messages.py` `_tool_part_blocks` emitted only a
`tool_call` for any non-output state, so the runner never received the
decision and re-parked the gate. The runner side was already correct
(`extractApprovalDecisions` reads the `{approved}` envelope, `HITLResponder`
maps deny -> reject -> tool-error -> continue, the egress emits
`tool-output-denied`); the only missing piece was emitting the envelope
for the inline shape.
Fix, both in this lane's files:
- SDK ingress: `_tool_part_blocks` now translates an inline
`approval-responded` (or terminal `output-denied`) tool part into the
`{approved: bool}` `tool_result` envelope keyed by toolCallId -- the same
envelope the standalone `tool-approval-response` part produces. New
`_approval_decision` helper.
- FE resume: a named, unit-tested predicate `agentShouldResumeAfterApproval`
in @agenta/playground, used as `sendAutomaticallyWhen` in both
AgentChatPanel + AgentChatConversation, that resumes on approve AND deny.
Chose FE-driven deny resume (model continues after the tool is denied)
over a silent terminal short-circuit, per the F-036 decision.
Also F-033 (dev-only): an aborted/errored agent stream tripped the Next.js
dev Runtime Error overlay on top of the correct in-chat alert. Softened
both `onError` handlers and attached .catch() to the floating
sendMessage/regenerate calls so the rejection no longer bubbles unhandled.
In-chat error rendering is unchanged.
Tests: Python test_ui_messages.py +5 (inline deny/approve/output-denied/
pending + standalone deny); FE agentApprovalResume.test.ts +9 (resend-on-
deny is the headline). ruff + eslint + prettier clean; responder suite green.
Decisions logged: D-022 (deny resume), D-023 (dev-overlay guard).
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe PR updates HITL approval handling across the Python message adapter, shared playground resume predicate, and web chat wiring. It also extends the QA decision log and adds unit coverage for approval-response translation and resume conditions. ChangesHITL approval resume flow
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
What this needs from you: a code review + one UX confirmation on the deny behavior. This fixes the F-036 HITL Deny dead-end and the F-033 dev-overlay nit. Two layers, both FE/SDK-side, no runner change, no Specifically, please look at:
Decisions logged as D-022 (deny resume) and D-023 (dev-overlay guard) in |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts (1)
30-147: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winSolid coverage; consider adding an
output-deniedstate case.The suite covers approve/deny/pending/unsettled/provider-executed paths well. The only gap is an explicit case for a tool part whose final state is
output-denied(referenced by the ingress layer). Adding it would pin the cross-layer contract called out onagentApprovalResume.tsand prevent a silent regression if that state ever reaches the predicate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ca847a0a-9bef-42a0-b67f-721e1e8fcb38
📒 Files selected for processing (10)
docs/design/agent-workflows/projects/qa/final-sweep-decisions.mdsdks/python/agenta/sdk/agents/adapters/vercel/messages.pysdks/python/oss/tests/pytest/unit/agents/test_ui_messages.pyweb/oss/src/components/AgentChatSlice/AgentChatPanel.tsxweb/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsxweb/packages/agenta-playground/src/index.tsweb/packages/agenta-playground/src/state/execution/agentApprovalResume.tsweb/packages/agenta-playground/src/state/execution/index.tsweb/packages/agenta-playground/src/state/index.tsweb/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts
| def test_inline_output_denied_state_emits_denied_envelope(self): | ||
| # The AI SDK terminal deny state has no `approval.approved` flag; `output-denied` | ||
| # itself means denied, so the ingress still emits `{approved: False}`. | ||
| msgs = vercel_ui_messages_to_messages( | ||
| [ | ||
| { | ||
| "id": "m7", | ||
| "role": "assistant", | ||
| "parts": [ | ||
| { | ||
| "type": "tool-deleteFile", | ||
| "toolCallId": "call_3", | ||
| "state": "output-denied", | ||
| "input": {"path": "/z"}, | ||
| "approval": {"id": "perm_3", "approved": False}, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Exercise the output-denied fallback in this test.
This fixture still sets approval.approved: False, so _approval_decision() can return from the first branch and the state == "output-denied" path stays untested. Remove that flag here so the test covers the terminal-deny shape described in the comment.
Suggested fixture change
{
"type": "tool-deleteFile",
"toolCallId": "call_3",
"state": "output-denied",
"input": {"path": "/z"},
- "approval": {"id": "perm_3", "approved": False},
+ "approval": {"id": "perm_3"},
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_inline_output_denied_state_emits_denied_envelope(self): | |
| # The AI SDK terminal deny state has no `approval.approved` flag; `output-denied` | |
| # itself means denied, so the ingress still emits `{approved: False}`. | |
| msgs = vercel_ui_messages_to_messages( | |
| [ | |
| { | |
| "id": "m7", | |
| "role": "assistant", | |
| "parts": [ | |
| { | |
| "type": "tool-deleteFile", | |
| "toolCallId": "call_3", | |
| "state": "output-denied", | |
| "input": {"path": "/z"}, | |
| "approval": {"id": "perm_3", "approved": False}, | |
| def test_inline_output_denied_state_emits_denied_envelope(self): | |
| # The AI SDK terminal deny state has no `approval.approved` flag; `output-denied` | |
| # itself means denied, so the ingress still emits `{approved: False}`. | |
| msgs = vercel_ui_messages_to_messages( | |
| [ | |
| { | |
| "id": "m7", | |
| "role": "assistant", | |
| "parts": [ | |
| { | |
| "type": "tool-deleteFile", | |
| "toolCallId": "call_3", | |
| "state": "output-denied", | |
| "input": {"path": "/z"}, | |
| "approval": {"id": "perm_3"}, |
Railway Preview Environment
|
Context
HITL Deny dead-ended in the agent playground (F-036). When a tool gate fired and you clicked Deny, the tool card recorded the decision ("Responded") and then the turn just hung in
approval-responded: no resume request, no "Denied" state, no model continuation. Approve worked; Deny did not.The root cause was the SDK ingress, the cross-layer gap already noted in decision D-007. The production path posts the AI SDK
UIMessage[]verbatim, which keeps the approval decision INLINE on the tool part (state: "approval-responded",approval.approved). Butadapters/vercel/messages.py_tool_part_blocksonly emitted atool_callblock for any non-output tool state, so the decision was dropped on the wire. The runner never learned the user denied, re-parked the gate, and the turn deadlocked.The runner side was already correct:
extractApprovalDecisionsreads a{approved}tool_resultenvelope,HITLRespondermaps a deny to reject then tool-error, and the egress emitstool-output-denied. The only missing piece was the ingress emitting that envelope for the inline part shape.Changes
Two layers, both FE/SDK-side.
adapters/vercel/messages.py)._tool_part_blocksnow translates an inlineapproval-responded(or the terminaloutput-denied) tool part into the{approved: bool}tool_resultenvelope keyed bytoolCallId. That is the same envelope the standalonetool-approval-responsepart already produced, soextractApprovalDecisionsresolves the parked gate.Before (deny on the verbatim path) emitted only:
After:
@agenta/playground). A named, unit-tested predicateagentShouldResumeAfterApprovalreplaces the stocklastAssistantMessageIsCompleteWithApprovalResponsesassendAutomaticallyWhenin bothAgentChatPanelandAgentChatConversation. It resumes on approve AND deny. The stock predicate already returns true for a deny-only turn (verified against the installedaisource), but pinning it at the FE seam documents and tests the deny-resume contract instead of relying on SDK internals. Decision: FE-driven deny resume (the model continues after the tool is denied) over a silent terminal short-circuit, per the F-036 prompt.Also fixes F-033 (dev-only). An aborted or errored agent stream tripped the Next.js dev Runtime Error overlay on top of the correct in-chat error alert. Both
useChatonErrorhandlers are softened, and the floatingsendMessage/regeneratecalls now carry a.catch(), so the rejection no longer bubbles unhandled. The in-chat error rendering is unchanged.Scope / risk
FE and SDK-ingress only. No
/runwire change (the{approved}envelope already existed for the standalone-part path; this only adds the inline-part path that produces the same shape). No runner code touched:responder.ts, the egress, and the goldens are untouched and stay green.ToolPart.tsx/AgentMessage.tsxalready render theoutput-denied("Denied") state, so no change there.Possible regression surface: the new ingress
elsebranch runs for any tool part that is neither output-available nor output-error. It emits a decision envelope ONLY whenapproval.approvedis a real boolean (or the state isoutput-denied); a still-pendingapproval-requestedpart emits just thetool_call, never a fabricated approve (covered by a test). The FE predicate change affects only when the conversation auto-resends after an approval; a pending gate still does not resume.Tests / notes
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py+5 cases (inline deny, inline approve,output-denied, pendingapproval-requestedemits no decision, standalonetool-approval-responsedeny). Full agents suite 369 green.web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts+9 cases (resend-on-deny is the headline; approve, pending, no-tool, last-is-user, unsettled sibling, completed sibling, empty, provider-executed). Playground suite 131 green.@agenta/playgroundtsc --noEmitclean; runnerresponder.test.ts20 green (the deny round-trip consumer).docs/design/agent-workflows/projects/qa/final-sweep-decisions.md: D-022 (deny resume), D-023 (dev-overlay guard).How to QA
Prerequisites: the live agent stack with a Claude harness and a gated tool (the F-036 repro used Claude Code +
haiku+ thegithubgateway tool with a Claude "Ask" permission rule). The runner serves the mergedservices/agent/srcovertsx, so no rebuild is needed for the SDK ingress change, but the dev web must pick up the FE change (hot reload or recreate).Steps:
haiku, add the github gateway tool, set an Ask rule that matches it.Expected result: the conversation re-sends (a second
/messagesrequest fires), the tool card settles to "Denied" (output-denied), and the model continues to a final answer. No hang in "Responded", no deadlock. Approve still resumes and runs the tool as before.Automated tests:
Edge cases: (1) Approve must still resume and run the tool (the same ingress branch handles
approved: true). (2) A pending gate you have not answered must not auto-resume. (3) For F-033, pick a Gemini model with no Gemini key and send a message: the in-chat "Stream error" alert must still appear, and in--devthe Next.js Runtime Error overlay must NOT pop.https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc