Escape raw HTML in assistant/tool content to fix XSS #245
Conversation
Transcript content containing raw HTML — e.g. a session where the user asks Claude to "write an E2E test that types <img src=x onerror=alert(1)><script>alert(1)</script> into the field" — rendered those tags live into the HTML output. Opening the transcript fired the payloads (alert popups on every mention), and a malicious transcript could run arbitrary script. Root cause: the HTML renderer kept two markdown renderers — escape=True for user content (#119) and escape=False for assistant/tool/web content, on the assumption that the latter is trusted. It is not: the assistant routinely echoes arbitrary user/file/web input verbatim, and WebFetch results are raw web content. The Markdown output renderer already neutralises raw HTML from every source (_protect_html_tags); the HTML path was the lone, more dangerous, inconsistency. Fix: flip the shared render_markdown renderer to escape=True. This covers all affected paths at once (assistant prose, thinking, system messages, teammate bodies, Task/WebSearch/WebFetch results, plans). Markdown, code fences, Pygments highlighting and SHA links are unaffected; unsafe link/image schemes (javascript:, data:) were already neutralised by mistune. No snapshot churn — benign content renders identically. Tests (verified to go red on the pre-fix code): - test_markdown_rendering.py: shared renderer escapes raw HTML; assistant formatter end-to-end; javascript:/data: URL-scheme guard. - test_xss_browser.py: empirical Playwright test — opens the generated file in real Chromium and asserts no alert() dialog fires and no content-supplied nodes materialise in the DOM. Also documents the "all transcript content is untrusted" escaping contract for tool-renderer authors in dev-docs.
Adds uv_venv_auto and pins the project venv path so mise/uv share the same .venv.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughEscapes transcript-derived HTML in markdown and title rendering, adds XSS regression tests for markdown and browser flows, updates escaping guidance, and configures local Python environment settings. ChangesXSS Hardening and Regression Coverage
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/html/utils.py`:
- Line 467: The helper documentation is now out of sync with the shared
assistant/tool renderer behavior. Update the `_markdown_collapsible` docstring
so it no longer says assistant/tool output uses `escape=False`; instead,
describe the current `escape=True` behavior and keep the XSS contract consistent
with the renderer change in `escape=True` for assistant/tool content.
In `@dev-docs/implementing-a-tool-renderer.md`:
- Around line 180-203: The escaping guidance has a variable-name mismatch in the
example reference, which breaks the cross-reference for readers. Update the
mention in the “Building HTML with f-strings/format” bullet to match the actual
input formatter symbol, using the same escaped query variable name defined
earlier, and keep the reference consistent with the escaping example in this
section.
In `@test/test_xss_browser.py`:
- Line 132: The file navigation in the browser test is manually constructing a
file:// URL from a Path, which can break on Windows and misses proper encoding.
Update the page.goto call in the test_xss_browser flow to use html_path.as_uri()
instead, and apply the same fix to the other file-based navigation in the same
test if present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b528d9e-a648-49f5-9e6a-54b2e26346fa
📒 Files selected for processing (5)
claude_code_log/html/utils.pydev-docs/implementing-a-tool-renderer.mdmise.tomltest/test_markdown_rendering.pytest/test_xss_browser.py
- _markdown_collapsible docstring no longer describes the removed escape=False/escape=True split; both render fns now escape (XSS). - dev-docs: escape_query -> escaped_query to match the input formatter. - test_xss_browser: page.goto(html_path.as_uri()) instead of manually building a file:// URL (correct encoding, Windows-safe).
|
@cboos I thought we already fixed this and judging by the comment this might break something? Do you remember? |
If you refer to the comment "Don't escape HTML since we want to render markdown properly", then the ball is on your court as for the why ;-) See 6079ea7. I think the change is good, and the user asks Claude to "write an E2E test that types |
|
(Claude) Security review: the core fix is correct but incomplete — there is a second, parallel class of live-HTML sink that this change does not cover. Posting findings now; we're actively WIP on the remaining fixes (not just flagging). What this PR gets right ✅
What's still exposed ❌ — the title path (
|
| Sink | Where | Field | Status |
|---|---|---|---|
| Generic / MCP / unknown tool name | base title_ToolUseMessage → return content.tool_name |
tool_use.name |
Confirmed end-to-end — {"type":"tool_use","name":"<img src=x onerror=alert(1)>"} renders live in the header |
| Hook attachment | f"Hook · {label}" |
attachment.hookName |
Confirmed end-to-end |
| Workflow phase | f"Phase: {content.title}" |
snapshot phase title | Confirmed at title layer (same ` |
| Workflow agent | f"Agent {content.label}" |
snapshot agent label | Confirmed at title layer |
The first is the most serious: tool names are directly attacker-controlled, and every tool without a specialized title method (MCP mcp__*, ToolSearch, custom/unknown) hits the unescaped fallback. (Specialized tools — Bash/Edit/Task/etc. — escape correctly via _tool_title, which is why this stayed hidden. Session headers and nav previews are not sinks; I checked.)
Root cause of the gap: the regression test scopes its payloads to "channels that flow through the shared Markdown renderer" — i.e. the fix's own scope, not the threat's — so it passes while blind to titles. And the new escaping contract is correct (it mandates escape_html() for f-string interpolation) but the existing title methods were never audited against the rule it documents.
Recommended fixes (WIP)
- Per-field
escape_html()in the four title methods (content.tool_namefallback, hook label, phase title, agent label) — per-field, not central, since titles legitimately emit structural HTML (e.g. the collapsed-transcript marker span). - Add a malicious-
tool_namepayload totest_xss_browser.py(simplest regression pin; catches the confirmed-exploitable case). - Fix the stale docstring at
claude_code_log/html/utils.py:513—_get_user_markdown_rendererstill says "Assistant content usesescape=Falsedeliberately", which contradicts this change.
Verdict: changes needed before this closes #244 as complete. The markdown-body half is solid and well-tested; the title path needs the same treatment.
The body fix switched the shared Markdown renderer to escape=True, but the
header title path is separate: the template renders `{{ message_title | safe }}`
with no central escaping (titles legitimately emit structural HTML, e.g. the
spawn-collapsed marker span), so each title_* method must escape its own
interpolated field. Four did not — all transcript-derived and attacker-influenced:
- generic / mcp__* / custom tool name (title_ToolUseMessage fallback) — the
most severe: a tool_use with name `<img src=x onerror=…>` rendered live in
the header. Specialized tools were already escaped via _tool_title.
- hook name (title_HookAttachmentMessage)
- workflow phase title (title_WorkflowPhaseMessage)
- workflow agent label (title_WorkflowAgentMessage)
These four live on the shared base Renderer (also used by the Markdown
renderer, which must not receive HTML-entity-escaped titles), so the
escaping is added as HtmlRenderer overrides — mirroring how _tool_title
already escapes the tool name for specialized tools.
Also fix the now-stale `_get_user_markdown_renderer` docstring that still
claimed assistant content uses escape=False (the shared renderer escapes now).
Tests: test_xss_titles.py pins all four sinks (RED before, GREEN after);
test_xss_browser.py gains a title-path payload (a generic tool_use whose name
is the payload) — without the fix it fires script-xss/img-xss dialogs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/html/renderer.py`:
- Around line 952-954: The title selection in renderer.py’s _dispatch_title path
is treating empty strings as missing values, so intentional empty specialized
titles fall back to escape_html(content.tool_name). Update the conditional
around self._dispatch_title(content.input, message) to distinguish None from an
empty string, preserving any handled empty title while still using the tool-name
fallback only when _dispatch_title returns None.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f6e587d0-f55d-424f-bb23-a89745e3df71
📒 Files selected for processing (4)
claude_code_log/html/renderer.pyclaude_code_log/html/utils.pytest/test_xss_browser.pytest/test_xss_titles.py
🚧 Files skipped from review as they are similar to previous changes (2)
- claude_code_log/html/utils.py
- test/test_xss_browser.py
|
(Claude) Re-review update. The latest push escapes the four title sinks from my earlier comment — verified end-to-end (generic/MCP tool names, hook attachments, workflow phase/agent labels now render escaped; specialized tools still escape correctly via
Both are being addressed now; expecting one more push. |
…tput (XSS)
Round 2 of the title-path XSS hardening:
- 5th HTML sink: title_SystemMessage rendered `System {level.title()}` raw.
`level` is FREE-TEXT from the transcript (factory: `level or "info"`), not
an enum, so a system entry with level `<img src=x onerror=…>` rendered live
in the header. Add an HtmlRenderer override that escapes — title-casing the
raw level FIRST, then escaping (escaping first lets `.title()` capitalize
the entity prefixes, `<` → `≪`, breaking them).
- Markdown output: the format-neutral heading site emitted `# {title}` raw,
so the same title fields (generic tool name, hook/phase/agent label, system
level) reached the .md unescaped — making the PR's "Markdown neutralises raw
HTML from every source" guarantee false. Neutralise the title there with
the markdown-appropriate `_protect_html_tags` (not escape_html). Gate on a
literal `<`: the mistune round-trip re-normalises markdown escaping
(`\*\*` → `\**`), so titles with no tag must pass through untouched — avoids
corrupting legit markdown titles while still closing the tag-injection path.
Tests: test_xss_titles.py gains the System-level HTML case and a Markdown-path
case (both RED before, GREEN after); the specialized-tool and no-churn
snapshot guards confirm no collateral damage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Round 3 — close the Markdown "neutralises raw HTML from every source"
guarantee honestly. The per-message heading was gated last round, but the
page / project-index / session page headings still emitted transcript-
reachable text raw: a crafted cwd flows through get_project_display_name
(Path(cwd).name) into the `## {display_name}` index heading, and page titles
derive from session summaries.
Factor a single `_safe_md_heading(text)` helper (the `<`-gated
`_protect_html_tags`) and route EVERY `#`/`##` heading through it — the
per-message site plus the four page/project headings — so neutralisation is
one auditable structural gate rather than a per-site convention that drifts
when a new heading site is added. For the project link form, only the
display_name fragment is neutralised so the link target is preserved.
The HTML path is unchanged (its per-field HtmlRenderer title overrides are a
separate, correct escaping path; the page <title>/<h1> autoescape in the
template). The `<`-gate keeps tag-free headings byte-identical — no snapshot
churn.
Test: a crafted-cwd → display_name → index heading regression (exercises the
real path, not a synthetic title arg), plus the existing per-message and
tool_name markdown cases.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
(a few more changes coming this evening) |
The heading gate closed the `#`/`##` heading sinks, but the Markdown output has a broader class of inline link-label/list surfaces that interpolate transcript-derived text raw — so "neutralise raw HTML from every source" wasn't yet true. Rename `_safe_md_heading` → `safe_markdown_inline` (now module-public) and route EVERY Markdown interpolation surface through it: the inline surfaces (expand-paths tree label, WebSearch result link title, TOC label, index session-link label) plus the headings it already covered. In `[label](url)` forms only the label fragment is neutralised — the destination is preserved. One auditable structural gate instead of a per-site convention that drifts; the `<`-guard keeps tag-free text byte-identical (no snapshot churn). Tests: a `safe_markdown_inline` gate unit + per-surface end-to-end regressions (WebSearch link, project index heading + session link, expand-paths label). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A plugin author told to "protect for XSS" could not succeed from the API or docs: the escaping helpers weren't on the stable plugin surface, and plugins.md had no security guidance. A plugin interpolating transcript data into title()/format_html/format_markdown reproduced the exact sinks this issue closed in the core. - API: re-export `escape_html` (format_html f-strings + a plugin's own title() escaping) and `safe_markdown_inline` (transcript text built into Markdown source) from `claude_code_log.plugins` — added to the PEP-562 lazy re-export + `__all__`, alongside `render_markdown[_collapsible]`. - Docs: plugins.md §4.2 "Security-conscious rendering" — untrusted-every-source rule, a helper-per-context table covering both output paths, a sink-class self-audit checklist, the one-structural-gate principle, and a cross-link to implementing-a-tool-renderer.md. Annotated the §4 method table (title / format_html / format_markdown rows) and changed the title()/format_markdown example from constants to interpolated-and-escaped so the risk is visible. Added a §9 XSS-payload test row. Tests: API re-export + behaviour; a plugin-author-shaped subclass whose render methods follow the contract → safe on BOTH the HTML and Markdown paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Transcript content containing raw HTML — e.g. a session where the user
asks Claude to "write an E2E test that types
<img src=x onerror=alert(1)><script>alert(1)</script>into the field" —rendered those tags live into the HTML output. Opening the transcript
fired the payloads (alert popups on every mention), and a malicious
transcript could run arbitrary script.
Root cause: the HTML renderer kept two markdown renderers — escape=True
for user content (#119) and escape=False for assistant/tool/web content,
on the assumption that the latter is trusted. It is not: the assistant
routinely echoes arbitrary user/file/web input verbatim, and WebFetch
results are raw web content. The Markdown output renderer already
neutralises raw HTML from every source (_protect_html_tags); the HTML
path was the lone, more dangerous, inconsistency.
Fix: flip the shared render_markdown renderer to escape=True. This covers
all affected paths at once (assistant prose, thinking, system messages,
teammate bodies, Task/WebSearch/WebFetch results, plans). Markdown,
code fences, Pygments highlighting and SHA links are unaffected; unsafe
link/image schemes (javascript:, data:) were already neutralised by
mistune. No snapshot churn — benign content renders identically.
Tests (verified to go red on the pre-fix code):
formatter end-to-end; javascript:/data: URL-scheme guard.
file in real Chromium and asserts no alert() dialog fires and no
content-supplied nodes materialise in the DOM.
Also documents the "all transcript content is untrusted" escaping
contract for tool-renderer authors in dev-docs.
Fixes #244
Summary by CodeRabbit