Skip to content

[fix] Resume HITL Deny end-to-end + guard the dev error overlay#4859

Merged
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-hitl-deny-resume
Jun 25, 2026
Merged

[fix] Resume HITL Deny end-to-end + guard the dev error overlay#4859
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-hitl-deny-resume

Conversation

@mmabrouk

Copy link
Copy Markdown
Member

Context

HITL Deny dead-ended in the agent playground (F-036). When a tool gate fired and you clicked Deny, the tool card recorded the decision ("Responded") and then the turn just hung in approval-responded: no resume request, no "Denied" state, no model continuation. Approve worked; Deny did not.

The root cause was the SDK ingress, the cross-layer gap already noted in decision D-007. The production path posts the AI SDK UIMessage[] verbatim, which keeps the approval decision INLINE on the tool part (state: "approval-responded", approval.approved). But adapters/vercel/messages.py _tool_part_blocks only emitted a tool_call block for any non-output tool state, so the decision was dropped on the wire. The runner never learned the user denied, re-parked the gate, and the turn deadlocked.

The runner side was already correct: extractApprovalDecisions reads a {approved} tool_result envelope, HITLResponder maps a deny to reject then tool-error, and the egress emits tool-output-denied. The only missing piece was the ingress emitting that envelope for the inline part shape.

Changes

Two layers, both FE/SDK-side.

  1. SDK ingress (adapters/vercel/messages.py). _tool_part_blocks now translates an inline approval-responded (or the terminal output-denied) tool part into the {approved: bool} tool_result envelope keyed by toolCallId. That is the same envelope the standalone tool-approval-response part already produced, so extractApprovalDecisions resolves the parked gate.

Before (deny on the verbatim path) emitted only:

[{type: "tool_call", toolCallId: "call_1", toolName: "deleteFile", input: {...}}]

After:

[{type: "tool_call",   toolCallId: "call_1", toolName: "deleteFile", input: {...}},
 {type: "tool_result", toolCallId: "call_1", toolName: "deleteFile", output: {approved: false}}]
  1. FE resume (@agenta/playground). A named, unit-tested predicate agentShouldResumeAfterApproval replaces the stock lastAssistantMessageIsCompleteWithApprovalResponses as sendAutomaticallyWhen in both AgentChatPanel and AgentChatConversation. It resumes on approve AND deny. The stock predicate already returns true for a deny-only turn (verified against the installed ai source), but pinning it at the FE seam documents and tests the deny-resume contract instead of relying on SDK internals. Decision: FE-driven deny resume (the model continues after the tool is denied) over a silent terminal short-circuit, per the F-036 prompt.

Also fixes F-033 (dev-only). An aborted or errored agent stream tripped the Next.js dev Runtime Error overlay on top of the correct in-chat error alert. Both useChat onError handlers are softened, and the floating sendMessage/regenerate calls now carry a .catch(), so the rejection no longer bubbles unhandled. The in-chat error rendering is unchanged.

Scope / risk

FE and SDK-ingress only. No /run wire change (the {approved} envelope already existed for the standalone-part path; this only adds the inline-part path that produces the same shape). No runner code touched: responder.ts, the egress, and the goldens are untouched and stay green. ToolPart.tsx/AgentMessage.tsx already render the output-denied ("Denied") state, so no change there.

Possible regression surface: the new ingress else branch runs for any tool part that is neither output-available nor output-error. It emits a decision envelope ONLY when approval.approved is a real boolean (or the state is output-denied); a still-pending approval-requested part emits just the tool_call, never a fabricated approve (covered by a test). The FE predicate change affects only when the conversation auto-resends after an approval; a pending gate still does not resume.

Tests / notes

  • Python ingress: sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py +5 cases (inline deny, inline approve, output-denied, pending approval-requested emits no decision, standalone tool-approval-response deny). Full agents suite 369 green.
  • FE: web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts +9 cases (resend-on-deny is the headline; approve, pending, no-tool, last-is-user, unsettled sibling, completed sibling, empty, provider-executed). Playground suite 131 green.
  • ruff format + check clean; eslint --fix + prettier clean; @agenta/playground tsc --noEmit clean; runner responder.test.ts 20 green (the deny round-trip consumer).
  • Decisions logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md: D-022 (deny resume), D-023 (dev-overlay guard).

How to QA

Prerequisites: the live agent stack with a Claude harness and a gated tool (the F-036 repro used Claude Code + haiku + the github gateway tool with a Claude "Ask" permission rule). The runner serves the merged services/agent/src over tsx, so no rebuild is needed for the SDK ingress change, but the dev web must pick up the FE change (hot reload or recreate).

Steps:

  1. Open the agent playground, pick the Claude harness + haiku, add the github gateway tool, set an Ask rule that matches it.
  2. Send: "Please look up my GitHub profile using the available GitHub tool."
  3. When the "Run this tool?" prompt appears with Approve / Deny, click Deny.

Expected result: the conversation re-sends (a second /messages request fires), the tool card settles to "Denied" (output-denied), and the model continues to a final answer. No hang in "Responded", no deadlock. Approve still resumes and runs the tool as before.

Automated tests:

cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/test_ui_messages.py -n0 -q
cd web/packages/agenta-playground && pnpm vitest run tests/unit/agentApprovalResume.test.ts

Edge cases: (1) Approve must still resume and run the tool (the same ingress branch handles approved: true). (2) A pending gate you have not answered must not auto-resume. (3) For F-033, pick a Gemini model with no Gemini key and send a message: the in-chat "Stream error" alert must still appear, and in --dev the Next.js Runtime Error overlay must NOT pop.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

HITL Deny dead-ended (F-036): clicking Deny recorded the decision
("Responded") then the turn hung in `approval-responded` with no resume
request, no `output-denied`, and no model continuation. Approve resumed
fine; Deny did not.

Root cause was the SDK ingress, the documented D-007 cross-layer gap. The
verbatim UIMessage path posts the approval INLINE on the tool part
(`state: approval-responded`, `approval.approved`), but
`adapters/vercel/messages.py` `_tool_part_blocks` emitted only a
`tool_call` for any non-output state, so the runner never received the
decision and re-parked the gate. The runner side was already correct
(`extractApprovalDecisions` reads the `{approved}` envelope, `HITLResponder`
maps deny -> reject -> tool-error -> continue, the egress emits
`tool-output-denied`); the only missing piece was emitting the envelope
for the inline shape.

Fix, both in this lane's files:
- SDK ingress: `_tool_part_blocks` now translates an inline
  `approval-responded` (or terminal `output-denied`) tool part into the
  `{approved: bool}` `tool_result` envelope keyed by toolCallId -- the same
  envelope the standalone `tool-approval-response` part produces. New
  `_approval_decision` helper.
- FE resume: a named, unit-tested predicate `agentShouldResumeAfterApproval`
  in @agenta/playground, used as `sendAutomaticallyWhen` in both
  AgentChatPanel + AgentChatConversation, that resumes on approve AND deny.
  Chose FE-driven deny resume (model continues after the tool is denied)
  over a silent terminal short-circuit, per the F-036 decision.

Also F-033 (dev-only): an aborted/errored agent stream tripped the Next.js
dev Runtime Error overlay on top of the correct in-chat alert. Softened
both `onError` handlers and attached .catch() to the floating
sendMessage/regenerate calls so the rejection no longer bubbles unhandled.
In-chat error rendering is unchanged.

Tests: Python test_ui_messages.py +5 (inline deny/approve/output-denied/
pending + standalone deny); FE agentApprovalResume.test.ts +9 (resend-on-
deny is the headline). ruff + eslint + prettier clean; responder suite green.

Decisions logged: D-022 (deny resume), D-023 (dev-overlay guard).

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
@vercel

vercel Bot commented Jun 25, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 25, 2026 9:27pm

Request Review

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. Backend bug Something isn't working Frontend labels Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Approval and denial responses now resume chat flows automatically when all related actions are complete.
    • Inline approval decisions are recognized consistently, improving handoff between chat and approval steps.
  • Bug Fixes

    • Fixed cases where denied approvals could leave conversations stuck.
    • Reduced dev overlay interruptions by keeping stream errors handled inside the chat UI.
  • Tests

    • Added coverage for approve, deny, and pending approval scenarios.

Walkthrough

The PR updates HITL approval handling across the Python message adapter, shared playground resume predicate, and web chat wiring. It also extends the QA decision log and adds unit coverage for approval-response translation and resume conditions.

Changes

HITL approval resume flow

Layer / File(s) Summary
Approval response translation
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py, sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
tool-approval-response, inline approval-responded, and output-denied parts now emit tool_result envelopes with approved booleans, with tests covering approved, denied, and pending cases.
Shared approval resume predicate
web/packages/agenta-playground/src/state/execution/agentApprovalResume.ts, web/packages/agenta-playground/src/state/execution/index.ts, web/packages/agenta-playground/src/state/index.ts, web/packages/agenta-playground/src/index.ts, web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts
Adds agentShouldResumeAfterApproval, re-exports it through playground state, and tests approve, deny, pending, unsettled, and provider-executed tool-part cases.
Chat resume wiring and rejection handling
web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx, web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
Both chat surfaces switch sendAutomaticallyWhen to agentShouldResumeAfterApproval and catch sendMessage/regenerate rejections while keeping in-chat error rendering.
Decision log updates
docs/design/agent-workflows/projects/qa/final-sweep-decisions.md
Extends the QA decision table with additional recorded judgments across approval handling, runner/auth, sandbox cleanup, and related workflow notes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Agenta-AI/agenta#4848: Also updates the Python UI message adapter and tests for HITL approval/denial tool-part flows, including pending approval behavior and emitted tool frames.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the two main changes: HITL deny resume and dev error overlay guarding.
Description check ✅ Passed The description is directly aligned with the code changes and explains the SDK ingress, frontend resume, and overlay fixes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 60.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/agent-hitl-deny-resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026
@mmabrouk

Copy link
Copy Markdown
Member Author

What this needs from you: a code review + one UX confirmation on the deny behavior.

This fixes the F-036 HITL Deny dead-end and the F-033 dev-overlay nit. Two layers, both FE/SDK-side, no runner change, no /run wire change.

Specifically, please look at:

  1. The deny semantics decision (UX/functionality). I chose FE-driven deny resume: after you Deny, the conversation re-sends, the runner maps deny -> reject -> tool-error, the tool card shows "Denied", and the model continues its turn. The alternative was a silent terminal short-circuit (stop with no continuation). I went with resume per the F-036 prompt ("prefer FE-driven deny resume over a silent terminal short-circuit"). Confirm that is the behavior you want: should the model keep going after a denied tool, or should a deny end the turn?

  2. The ingress fix (code review). adapters/vercel/messages.py _tool_part_blocks now emits the {approved: bool} tool_result envelope for an inline approval-responded/output-denied tool part (the documented D-007 gap). This is the load-bearing fix; the runner's extractApprovalDecisions already consumes that envelope. Check the new else branch + _approval_decision helper don't over-fire (a pending approval-requested part must emit no decision — there's a test).

  3. The FE resume predicate (code review). agentShouldResumeAfterApproval in @agenta/playground replaces the stock lastAssistantMessageIsCompleteWithApprovalResponses. The stock one already returns true for deny-only; the wrapper just pins + tests that contract. Sanity-check the predicate logic and that wiring it in both AgentChatPanel + AgentChatConversation is right.

Decisions logged as D-022 (deny resume) and D-023 (dev-overlay guard) in final-sweep-decisions.md. NOT merging to big-agents myself.

@mmabrouk

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts (1)

30-147: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Solid coverage; consider adding an output-denied state case.

The suite covers approve/deny/pending/unsettled/provider-executed paths well. The only gap is an explicit case for a tool part whose final state is output-denied (referenced by the ingress layer). Adding it would pin the cross-layer contract called out on agentApprovalResume.ts and prevent a silent regression if that state ever reaches the predicate.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ca847a0a-9bef-42a0-b67f-721e1e8fcb38

📥 Commits

Reviewing files that changed from the base of the PR and between 6324757 and 3b77c7d.

📒 Files selected for processing (10)
  • docs/design/agent-workflows/projects/qa/final-sweep-decisions.md
  • sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
  • sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
  • web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
  • web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx
  • web/packages/agenta-playground/src/index.ts
  • web/packages/agenta-playground/src/state/execution/agentApprovalResume.ts
  • web/packages/agenta-playground/src/state/execution/index.ts
  • web/packages/agenta-playground/src/state/index.ts
  • web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts

Comment on lines +243 to +257
def test_inline_output_denied_state_emits_denied_envelope(self):
# The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`
# itself means denied, so the ingress still emits `{approved: False}`.
msgs = vercel_ui_messages_to_messages(
[
{
"id": "m7",
"role": "assistant",
"parts": [
{
"type": "tool-deleteFile",
"toolCallId": "call_3",
"state": "output-denied",
"input": {"path": "/z"},
"approval": {"id": "perm_3", "approved": False},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise the output-denied fallback in this test.

This fixture still sets approval.approved: False, so _approval_decision() can return from the first branch and the state == "output-denied" path stays untested. Remove that flag here so the test covers the terminal-deny shape described in the comment.

Suggested fixture change
                         {
                             "type": "tool-deleteFile",
                             "toolCallId": "call_3",
                             "state": "output-denied",
                             "input": {"path": "/z"},
-                            "approval": {"id": "perm_3", "approved": False},
+                            "approval": {"id": "perm_3"},
                         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_inline_output_denied_state_emits_denied_envelope(self):
# The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`
# itself means denied, so the ingress still emits `{approved: False}`.
msgs = vercel_ui_messages_to_messages(
[
{
"id": "m7",
"role": "assistant",
"parts": [
{
"type": "tool-deleteFile",
"toolCallId": "call_3",
"state": "output-denied",
"input": {"path": "/z"},
"approval": {"id": "perm_3", "approved": False},
def test_inline_output_denied_state_emits_denied_envelope(self):
# The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`
# itself means denied, so the ingress still emits `{approved: False}`.
msgs = vercel_ui_messages_to_messages(
[
{
"id": "m7",
"role": "assistant",
"parts": [
{
"type": "tool-deleteFile",
"toolCallId": "call_3",
"state": "output-denied",
"input": {"path": "/z"},
"approval": {"id": "perm_3"},

@mmabrouk mmabrouk merged commit 10f4af8 into big-agents Jun 25, 2026
33 of 34 checks passed
@github-actions

Copy link
Copy Markdown
Contributor

Railway Preview Environment

Image tag pr-4859-763f378
Status Failed
Railway logs Open logs
Logs View workflow run
Updated at 2026-06-25T21:41:05.487Z

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend bug Something isn't working Frontend needs-review Agent updated; awaiting Mahmoud's review size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant