[fix] Resume HITL Deny end-to-end + guard the dev error overlay by mmabrouk · Pull Request #4859 · Agenta-AI/agenta

mmabrouk · 2026-06-25T21:27:36Z

Context

HITL Deny dead-ended in the agent playground (F-036). When a tool gate fired and you clicked Deny, the tool card recorded the decision ("Responded") and then the turn just hung in approval-responded: no resume request, no "Denied" state, no model continuation. Approve worked; Deny did not.

The root cause was the SDK ingress, the cross-layer gap already noted in decision D-007. The production path posts the AI SDK UIMessage[] verbatim, which keeps the approval decision INLINE on the tool part (state: "approval-responded", approval.approved). But adapters/vercel/messages.py _tool_part_blocks only emitted a tool_call block for any non-output tool state, so the decision was dropped on the wire. The runner never learned the user denied, re-parked the gate, and the turn deadlocked.

The runner side was already correct: extractApprovalDecisions reads a {approved} tool_result envelope, HITLResponder maps a deny to reject then tool-error, and the egress emits tool-output-denied. The only missing piece was the ingress emitting that envelope for the inline part shape.

Changes

Two layers, both FE/SDK-side.

SDK ingress (adapters/vercel/messages.py). _tool_part_blocks now translates an inline approval-responded (or the terminal output-denied) tool part into the {approved: bool} tool_result envelope keyed by toolCallId. That is the same envelope the standalone tool-approval-response part already produced, so extractApprovalDecisions resolves the parked gate.

Before (deny on the verbatim path) emitted only:

[{type: "tool_call", toolCallId: "call_1", toolName: "deleteFile", input: {...}}]

After:

[{type: "tool_call",   toolCallId: "call_1", toolName: "deleteFile", input: {...}},
 {type: "tool_result", toolCallId: "call_1", toolName: "deleteFile", output: {approved: false}}]

FE resume (@agenta/playground). A named, unit-tested predicate agentShouldResumeAfterApproval replaces the stock lastAssistantMessageIsCompleteWithApprovalResponses as sendAutomaticallyWhen in both AgentChatPanel and AgentChatConversation. It resumes on approve AND deny. The stock predicate already returns true for a deny-only turn (verified against the installed ai source), but pinning it at the FE seam documents and tests the deny-resume contract instead of relying on SDK internals. Decision: FE-driven deny resume (the model continues after the tool is denied) over a silent terminal short-circuit, per the F-036 prompt.

Also fixes F-033 (dev-only). An aborted or errored agent stream tripped the Next.js dev Runtime Error overlay on top of the correct in-chat error alert. Both useChat onError handlers are softened, and the floating sendMessage/regenerate calls now carry a .catch(), so the rejection no longer bubbles unhandled. The in-chat error rendering is unchanged.

Scope / risk

FE and SDK-ingress only. No /run wire change (the {approved} envelope already existed for the standalone-part path; this only adds the inline-part path that produces the same shape). No runner code touched: responder.ts, the egress, and the goldens are untouched and stay green. ToolPart.tsx/AgentMessage.tsx already render the output-denied ("Denied") state, so no change there.

Possible regression surface: the new ingress else branch runs for any tool part that is neither output-available nor output-error. It emits a decision envelope ONLY when approval.approved is a real boolean (or the state is output-denied); a still-pending approval-requested part emits just the tool_call, never a fabricated approve (covered by a test). The FE predicate change affects only when the conversation auto-resends after an approval; a pending gate still does not resume.

Tests / notes

Python ingress: sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py +5 cases (inline deny, inline approve, output-denied, pending approval-requested emits no decision, standalone tool-approval-response deny). Full agents suite 369 green.
FE: web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts +9 cases (resend-on-deny is the headline; approve, pending, no-tool, last-is-user, unsettled sibling, completed sibling, empty, provider-executed). Playground suite 131 green.
ruff format + check clean; eslint --fix + prettier clean; @agenta/playground tsc --noEmit clean; runner responder.test.ts 20 green (the deny round-trip consumer).
Decisions logged in docs/design/agent-workflows/projects/qa/final-sweep-decisions.md: D-022 (deny resume), D-023 (dev-overlay guard).

How to QA

Prerequisites: the live agent stack with a Claude harness and a gated tool (the F-036 repro used Claude Code + haiku + the github gateway tool with a Claude "Ask" permission rule). The runner serves the merged services/agent/src over tsx, so no rebuild is needed for the SDK ingress change, but the dev web must pick up the FE change (hot reload or recreate).

Steps:

Open the agent playground, pick the Claude harness + haiku, add the github gateway tool, set an Ask rule that matches it.
Send: "Please look up my GitHub profile using the available GitHub tool."
When the "Run this tool?" prompt appears with Approve / Deny, click Deny.

Expected result: the conversation re-sends (a second /messages request fires), the tool card settles to "Denied" (output-denied), and the model continues to a final answer. No hang in "Responded", no deadlock. Approve still resumes and runs the tool as before.

Automated tests:

cd sdks/python && uv run python -m pytest oss/tests/pytest/unit/agents/test_ui_messages.py -n0 -q
cd web/packages/agenta-playground && pnpm vitest run tests/unit/agentApprovalResume.test.ts

Edge cases: (1) Approve must still resume and run the tool (the same ingress branch handles approved: true). (2) A pending gate you have not answered must not auto-resume. (3) For F-033, pick a Gemini model with no Gemini key and send a message: the in-chat "Stream error" alert must still appear, and in --dev the Next.js Runtime Error overlay must NOT pop.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

HITL Deny dead-ended (F-036): clicking Deny recorded the decision ("Responded") then the turn hung in `approval-responded` with no resume request, no `output-denied`, and no model continuation. Approve resumed fine; Deny did not. Root cause was the SDK ingress, the documented D-007 cross-layer gap. The verbatim UIMessage path posts the approval INLINE on the tool part (`state: approval-responded`, `approval.approved`), but `adapters/vercel/messages.py` `_tool_part_blocks` emitted only a `tool_call` for any non-output state, so the runner never received the decision and re-parked the gate. The runner side was already correct (`extractApprovalDecisions` reads the `{approved}` envelope, `HITLResponder` maps deny -> reject -> tool-error -> continue, the egress emits `tool-output-denied`); the only missing piece was emitting the envelope for the inline shape. Fix, both in this lane's files: - SDK ingress: `_tool_part_blocks` now translates an inline `approval-responded` (or terminal `output-denied`) tool part into the `{approved: bool}` `tool_result` envelope keyed by toolCallId -- the same envelope the standalone `tool-approval-response` part produces. New `_approval_decision` helper. - FE resume: a named, unit-tested predicate `agentShouldResumeAfterApproval` in @agenta/playground, used as `sendAutomaticallyWhen` in both AgentChatPanel + AgentChatConversation, that resumes on approve AND deny. Chose FE-driven deny resume (model continues after the tool is denied) over a silent terminal short-circuit, per the F-036 decision. Also F-033 (dev-only): an aborted/errored agent stream tripped the Next.js dev Runtime Error overlay on top of the correct in-chat alert. Softened both `onError` handlers and attached .catch() to the floating sendMessage/regenerate calls so the rejection no longer bubbles unhandled. In-chat error rendering is unchanged. Tests: Python test_ui_messages.py +5 (inline deny/approve/output-denied/ pending + standalone deny); FE agentApprovalResume.test.ts +9 (resend-on- deny is the headline). ruff + eslint + prettier clean; responder suite green. Decisions logged: D-022 (deny resume), D-023 (dev-overlay guard). Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel · 2026-06-25T21:27:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 25, 2026 9:27pm

coderabbitai · 2026-06-25T21:27:50Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Approval and denial responses now resume chat flows automatically when all related actions are complete.
- Inline approval decisions are recognized consistently, improving handoff between chat and approval steps.
Bug Fixes
- Fixed cases where denied approvals could leave conversations stuck.
- Reduced dev overlay interruptions by keeping stream errors handled inside the chat UI.
Tests
- Added coverage for approve, deny, and pending approval scenarios.

Walkthrough

The PR updates HITL approval handling across the Python message adapter, shared playground resume predicate, and web chat wiring. It also extends the QA decision log and adds unit coverage for approval-response translation and resume conditions.

Changes

HITL approval resume flow

Layer / File(s)	Summary
Approval response translation `sdks/python/agenta/sdk/agents/adapters/vercel/messages.py`, `sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py`	`tool-approval-response`, inline `approval-responded`, and `output-denied` parts now emit `tool_result` envelopes with `approved` booleans, with tests covering approved, denied, and pending cases.
Shared approval resume predicate `web/packages/agenta-playground/src/state/execution/agentApprovalResume.ts`, `web/packages/agenta-playground/src/state/execution/index.ts`, `web/packages/agenta-playground/src/state/index.ts`, `web/packages/agenta-playground/src/index.ts`, `web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts`	Adds `agentShouldResumeAfterApproval`, re-exports it through playground state, and tests approve, deny, pending, unsettled, and provider-executed tool-part cases.
Chat resume wiring and rejection handling `web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx`, `web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx`	Both chat surfaces switch `sendAutomaticallyWhen` to `agentShouldResumeAfterApproval` and catch `sendMessage`/`regenerate` rejections while keeping in-chat error rendering.
Decision log updates `docs/design/agent-workflows/projects/qa/final-sweep-decisions.md`	Extends the QA decision table with additional recorded judgments across approval handling, runner/auth, sandbox cleanup, and related workflow notes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Agenta-AI/agenta#4848: Also updates the Python UI message adapter and tests for HITL approval/denial tool-part flows, including pending approval behavior and emitted tool frames.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the two main changes: HITL deny resume and dev error overlay guarding.
Description check	✅ Passed	The description is directly aligned with the code changes and explains the SDK ingress, frontend resume, and overlay fixes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 60.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/agent-hitl-deny-resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

mmabrouk · 2026-06-25T21:28:18Z

What this needs from you: a code review + one UX confirmation on the deny behavior.

This fixes the F-036 HITL Deny dead-end and the F-033 dev-overlay nit. Two layers, both FE/SDK-side, no runner change, no /run wire change.

Specifically, please look at:

The deny semantics decision (UX/functionality). I chose FE-driven deny resume: after you Deny, the conversation re-sends, the runner maps deny -> reject -> tool-error, the tool card shows "Denied", and the model continues its turn. The alternative was a silent terminal short-circuit (stop with no continuation). I went with resume per the F-036 prompt ("prefer FE-driven deny resume over a silent terminal short-circuit"). Confirm that is the behavior you want: should the model keep going after a denied tool, or should a deny end the turn?
The ingress fix (code review). adapters/vercel/messages.py _tool_part_blocks now emits the {approved: bool} tool_result envelope for an inline approval-responded/output-denied tool part (the documented D-007 gap). This is the load-bearing fix; the runner's extractApprovalDecisions already consumes that envelope. Check the new else branch + _approval_decision helper don't over-fire (a pending approval-requested part must emit no decision — there's a test).
The FE resume predicate (code review). agentShouldResumeAfterApproval in @agenta/playground replaces the stock lastAssistantMessageIsCompleteWithApprovalResponses. The stock one already returns true for deny-only; the wrapper just pins + tests that contract. Sanity-check the predicate logic and that wiring it in both AgentChatPanel + AgentChatConversation is right.

Decisions logged as D-022 (deny resume) and D-023 (dev-overlay guard) in final-sweep-decisions.md. NOT merging to big-agents myself.

mmabrouk · 2026-06-25T21:28:23Z

@coderabbitai review

coderabbitai · 2026-06-25T21:28:29Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts (1)

30-147: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Solid coverage; consider adding an output-denied state case.

The suite covers approve/deny/pending/unsettled/provider-executed paths well. The only gap is an explicit case for a tool part whose final state is output-denied (referenced by the ingress layer). Adding it would pin the cross-layer contract called out on agentApprovalResume.ts and prevent a silent regression if that state ever reaches the predicate.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ca847a0a-9bef-42a0-b67f-721e1e8fcb38

📥 Commits

Reviewing files that changed from the base of the PR and between 6324757 and 3b77c7d.

📒 Files selected for processing (10)

docs/design/agent-workflows/projects/qa/final-sweep-decisions.md
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx
web/packages/agenta-playground/src/index.ts
web/packages/agenta-playground/src/state/execution/agentApprovalResume.ts
web/packages/agenta-playground/src/state/execution/index.ts
web/packages/agenta-playground/src/state/index.ts
web/packages/agenta-playground/tests/unit/agentApprovalResume.test.ts

coderabbitai · 2026-06-25T21:35:10Z

+    def test_inline_output_denied_state_emits_denied_envelope(self):
+        # The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`
+        # itself means denied, so the ingress still emits `{approved: False}`.
+        msgs = vercel_ui_messages_to_messages(
+            [
+                {
+                    "id": "m7",
+                    "role": "assistant",
+                    "parts": [
+                        {
+                            "type": "tool-deleteFile",
+                            "toolCallId": "call_3",
+                            "state": "output-denied",
+                            "input": {"path": "/z"},
+                            "approval": {"id": "perm_3", "approved": False},


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise the output-denied fallback in this test.

This fixture still sets approval.approved: False, so _approval_decision() can return from the first branch and the state == "output-denied" path stays untested. Remove that flag here so the test covers the terminal-deny shape described in the comment.

Suggested fixture change

{ "type": "tool-deleteFile", "toolCallId": "call_3", "state": "output-denied", "input": {"path": "/z"}, - "approval": {"id": "perm_3", "approved": False}, + "approval": {"id": "perm_3"}, }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_inline_output_denied_state_emits_denied_envelope(self):

# The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`

# itself means denied, so the ingress still emits `{approved: False}`.

msgs = vercel_ui_messages_to_messages(

[

{

"id": "m7",

"role": "assistant",

"parts": [

{

"type": "tool-deleteFile",

"toolCallId": "call_3",

"state": "output-denied",

"input": {"path": "/z"},

"approval": {"id": "perm_3", "approved": False},

def test_inline_output_denied_state_emits_denied_envelope(self):

# The AI SDK terminal deny state has no `approval.approved` flag; `output-denied`

# itself means denied, so the ingress still emits `{approved: False}`.

msgs = vercel_ui_messages_to_messages(

[

{

"id": "m7",

"role": "assistant",

"parts": [

{

"type": "tool-deleteFile",

"toolCallId": "call_3",

"state": "output-denied",

"input": {"path": "/z"},

"approval": {"id": "perm_3"},

github-actions · 2026-06-25T21:41:06Z

Railway Preview Environment


Image tag	`pr-4859-763f378`
Status	Failed
Railway logs	Open logs
Logs	View workflow run
Updated at 2026-06-25T21:41:05.487Z

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. Backend bug Something isn't working Frontend labels Jun 25, 2026

mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

mmabrouk merged commit 10f4af8 into big-agents Jun 25, 2026
33 of 34 checks passed

mmabrouk mentioned this pull request Jun 26, 2026

fix(agent): end a HITL park turn gracefully so Approve/Deny complete (F-040) #4869

Merged

coderabbitai Bot mentioned this pull request Jun 28, 2026

feat(frontend): queue agent chat messages typed during a stream #4903

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[fix] Resume HITL Deny end-to-end + guard the dev error overlay#4859

[fix] Resume HITL Deny end-to-end + guard the dev error overlay#4859
mmabrouk merged 1 commit into
big-agentsfrom
fix/agent-hitl-deny-resume

mmabrouk commented Jun 25, 2026

Uh oh!

vercel Bot commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mmabrouk commented Jun 25, 2026

Context

Changes

Scope / risk

Tests / notes

How to QA

Uh oh!

vercel Bot commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 25, 2026

Railway Preview Environment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading