fix(agent): scope HITL approval to the specific call + enable approve/deny promptly#4854
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe responder now matches HITL approvals by tool-call id or tool name plus stable arguments, and approval extraction indexes those keys from tool_call and tool_result blocks. The chat UI now marks only the latest message as streaming and handles approval-button state locally. ChangesHITL approval matching
Agent chat streaming controls
Sequence Diagram(s)sequenceDiagram
participant AgentRunRequest.messages
participant extractApprovalDecisions
participant approvalKey
AgentRunRequest.messages->>extractApprovalDecisions: tool_call and tool_result blocks
extractApprovalDecisions->>approvalKey: derive name+args key
extractApprovalDecisions-->>AgentRunRequest.messages: decisions keyed by toolCallId and approvalKey
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
Review focus (what I need eyes on):
Do NOT merge to |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
services/agent/tests/unit/responder.test.ts (1)
82-93: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a live-wire fallback case for
title/kind.The helper comment references the ACP shape without
name, but every permission request here still usesname. Add one test withraw.toolCall.titleorkindand nonameso the production fallback inpermissionToolNamestays covered.web/oss/src/components/AgentChatSlice/components/ToolPart.tsx (1)
92-96: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winStabilize the approval click handlers.
The new
respond(true)/respond(false)inline handlers can be hoisted withuseCallbackto match the TSX callback guideline.As per coding guidelines,
web/**/*.tsxshould “UseuseCallbackto create stable callbacks in render instead of defining inline functions.”♻️ Proposed refactor
- const respond = (approved: boolean) => { + const respond = useCallback((approved: boolean) => { if (responding || !approval?.id) return setResponding(true) onApprovalResponse({id: approval.id, approved}) - } + }, [approval?.id, onApprovalResponse, responding]) + + const handleApprove = useCallback(() => respond(true), [respond]) + const handleDeny = useCallback(() => respond(false), [respond])- onClick={() => respond(true)} + onClick={handleApprove} ... - onClick={() => respond(false)} + onClick={handleDeny}Also applies to: 173-181
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8b9ed764-5434-4c00-9285-7ae319a699a6
📒 Files selected for processing (6)
services/agent/src/responder.tsservices/agent/tests/unit/responder.test.tsweb/oss/src/components/AgentChatSlice/AgentChatPanel.tsxweb/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsxweb/oss/src/components/AgentChatSlice/components/AgentMessage.tsxweb/oss/src/components/AgentChatSlice/components/ToolPart.tsx
💤 Files with no reviewable changes (3)
- web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx
- web/oss/src/components/AgentChatSlice/components/AgentMessage.tsx
- web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
Railway Preview Environment
|
…/deny promptly Two HITL findings from the final QA sweep. Finding 1 (HIGH): a stored permission-approval decision was keyed by BOTH the ACP toolCallId AND the bare tool NAME, so approving call A auto-approved any later call B to the same tool with different args — a HITL bypass. Replace the bare-name key with approvalKey(name, args) (name + a canonical, key-sorted JSON hash of the args) on both the live-request side (permissionRequestKeys) and the stored side (extractApprovalDecisions). Keep the toolCallId key. A cold-replayed run mints a fresh ACP tool-call id for the re-raised gate, so the id no longer matches — the name+args anchor resumes the SAME call while a DIFFERENT call (same name, different args) re-prompts. Resume and precision coexist. Finding 2 (F-026): the Approve/Deny buttons were gated on the conversation-level busy state, so they stayed disabled for the whole turn (~70-140s) and read as a hang. An approval request can only appear while the stream is in flight, and the AI SDK records the decision immediately and defers the resend until the stream settles, so the gate was wrong. Enable Approve/Deny as soon as the prompt renders; guard only against a double-submit. Drops the now-dead busy prop on AgentMessage and its two callers. Tests: responder unit suite (16 tests incl. the two required — different-args re-prompts, fresh-id resume by name+args); FE eslint clean. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
a7a189b to
be881b9
Compare
|
Codex xhigh review applied (round 2). Codex found the bare-name bypass was closed but
Out of scope (sibling SDK lane, flagged not fixed): Codex Finding 3 — live end-to-end Tests: full runner suite 252/252 green; typecheck clean. |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
services/agent/src/responder.ts (1)
76-91: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAlign the
ApprovalDecisionscontract comment with the new extractor behavior.This block still says each stored decision is indexed under both
toolCallIdandapprovalKey(name, input), butextractApprovalDecisions()now persists only the name+args key and the replayed-id assertions depend on that. Leaving the old description here makes the matching rules look broader than they are.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2194d933-64c3-42db-88a3-41b7d7d48a1e
📒 Files selected for processing (8)
services/agent/src/responder.tsservices/agent/tests/unit/responder.test.tsservices/agent/tests/unit/sandbox-agent-orchestration.test.tsservices/agent/tests/unit/sandbox-agent-permissions.test.tsweb/oss/src/components/AgentChatSlice/AgentChatPanel.tsxweb/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsxweb/oss/src/components/AgentChatSlice/components/AgentMessage.tsxweb/oss/src/components/AgentChatSlice/components/ToolPart.tsx
💤 Files with no reviewable changes (3)
- web/oss/src/components/AgentChatSlice/components/AgentMessage.tsx
- web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx
- web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- web/oss/src/components/AgentChatSlice/components/ToolPart.tsx
What and why
Two HITL findings from the final agent QA sweep, both in the tool-approval path.
Finding 1 (HIGH, Codex) — approval over-authorized by tool NAME
A stored approval decision was indexed by BOTH the ACP
toolCallIdAND the bare toolNAME. The bare-name key was a HITL bypass: approving call A auto-approved any later call B
to the same tool with different args.
edit({path:"a.txt"})→ a lateredit({path:"/etc/shadow"})(freshtool-call id) falls through to the
editname key and silently auto-runs.approvalKey(name, args)— the tool name bound to acanonical (key-sorted) hash of its args — not the bare name. The same call (same name +
args) resumes; a different call to the same tool re-prompts.
The name fallback exists because
/messagesis cold-replay: each turn rebuilds the sessionfrom the replayed transcript, so the harness mints a fresh ACP tool-call id for the
re-raised gate and the precise
toolCallIdno longer matches across turns. Binding thefallback to name + args keeps the resume working while closing the hole — precision and
resume coexist (no escalation).
Finding 2 (F-026) — Approve/Deny disabled ~70-140s, reads as a hang
The buttons were gated on the conversation-level
busy(status === submitted/streaming).But an approval request can only appear while the stream is in flight — so
busyisnecessarily true when the prompt first shows, leaving the buttons disabled for the whole
turn. The AI SDK is built for mid-stream answers:
addToolApprovalResponserecords thedecision immediately and only auto-resends once the stream settles. So the gate was wrong,
not a backend-park latency.
Fix: enable Approve/Deny as soon as the
approval-requestedpart renders; guard onlyagainst a double-submit. Drops the now-dead
busyprop onAgentMessageand its twocallers.
Files
services/agent/src/responder.ts—approvalKey(name,args)+canonicalJson; reworkpermissionRequestKeys(live) andextractApprovalDecisions(stored) to drop bare-name.services/agent/tests/unit/responder.test.ts— adds the two required tests (different-argscall re-prompts; parked call resumes under a fresh id via name+args) + a bare-name-not-a-key
guard; updates the existing lookups to name+args.
web/oss/src/components/AgentChatSlice/{components/ToolPart,components/AgentMessage,components/AgentChatConversation,AgentChatPanel}.tsx— enable Approve/Deny promptly; remove the dead
busyprop.Tests
responder.test.ts16/16 green; typecheck clean for these files (a pre-existingotel.tserror from a sibling lane is unrelated).Not done here (in-scope of sibling lanes)
The SDK ingress (
adapters/vercel/messages.py) does not currently emit the{approved}envelope for a verbatim
approval-respondedtool part; live resume depends on whateverdecision shape reaches
extractApprovalDecisions. The responder is correct + preciseregardless, and the resume property is pinned at the responder layer by unit tests. Logged
as D-007 in the QA decisions log.
https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc