Escape raw HTML in assistant/tool content to fix XSS by daaain · Pull Request #245 · daaain/claude-code-log

daaain · 2026-06-27T19:36:01Z

Transcript content containing raw HTML — e.g. a session where the user
asks Claude to "write an E2E test that types
<img src=x onerror=alert(1)><script>alert(1)</script> into the field" —
rendered those tags live into the HTML output. Opening the transcript
fired the payloads (alert popups on every mention), and a malicious
transcript could run arbitrary script.

Root cause: the HTML renderer kept two markdown renderers — escape=True
for user content (#119) and escape=False for assistant/tool/web content,
on the assumption that the latter is trusted. It is not: the assistant
routinely echoes arbitrary user/file/web input verbatim, and WebFetch
results are raw web content. The Markdown output renderer already
neutralises raw HTML from every source (_protect_html_tags); the HTML
path was the lone, more dangerous, inconsistency.

Fix: flip the shared render_markdown renderer to escape=True. This covers
all affected paths at once (assistant prose, thinking, system messages,
teammate bodies, Task/WebSearch/WebFetch results, plans). Markdown,
code fences, Pygments highlighting and SHA links are unaffected; unsafe
link/image schemes (javascript:, data:) were already neutralised by
mistune. No snapshot churn — benign content renders identically.

Tests (verified to go red on the pre-fix code):

test_markdown_rendering.py: shared renderer escapes raw HTML; assistant
formatter end-to-end; javascript:/data: URL-scheme guard.
test_xss_browser.py: empirical Playwright test — opens the generated
file in real Chromium and asserts no alert() dialog fires and no
content-supplied nodes materialise in the DOM.

Also documents the "all transcript content is untrusted" escaping
contract for tool-renderer authors in dev-docs.

Fixes #244

Summary by CodeRabbit

Bug Fixes
- Escape raw HTML in shared transcript Markdown so it renders as text (not live DOM).
- Harden HTML “message title”/tool header rendering by escaping transcript-derived labels, including system level.
- Neutralize raw HTML-like content in Markdown heading titles; project/project section headings are now protected.
Documentation
- Added guidance that all transcript-derived content is untrusted, with safe Markdown/HTML rendering paths.
Tests
- Added unit tests and Playwright regressions covering HTML, Markdown, titles, and unsafe URL/payload handling.
Chores
- Updated local Python/tooling configuration.

Transcript content containing raw HTML — e.g. a session where the user asks Claude to "write an E2E test that types <img src=x onerror=alert(1)><script>alert(1)</script> into the field" — rendered those tags live into the HTML output. Opening the transcript fired the payloads (alert popups on every mention), and a malicious transcript could run arbitrary script. Root cause: the HTML renderer kept two markdown renderers — escape=True for user content (#119) and escape=False for assistant/tool/web content, on the assumption that the latter is trusted. It is not: the assistant routinely echoes arbitrary user/file/web input verbatim, and WebFetch results are raw web content. The Markdown output renderer already neutralises raw HTML from every source (_protect_html_tags); the HTML path was the lone, more dangerous, inconsistency. Fix: flip the shared render_markdown renderer to escape=True. This covers all affected paths at once (assistant prose, thinking, system messages, teammate bodies, Task/WebSearch/WebFetch results, plans). Markdown, code fences, Pygments highlighting and SHA links are unaffected; unsafe link/image schemes (javascript:, data:) were already neutralised by mistune. No snapshot churn — benign content renders identically. Tests (verified to go red on the pre-fix code): - test_markdown_rendering.py: shared renderer escapes raw HTML; assistant formatter end-to-end; javascript:/data: URL-scheme guard. - test_xss_browser.py: empirical Playwright test — opens the generated file in real Chromium and asserts no alert() dialog fires and no content-supplied nodes materialise in the DOM. Also documents the "all transcript content is untrusted" escaping contract for tool-renderer authors in dev-docs.

Adds uv_venv_auto and pins the project venv path so mise/uv share the same .venv.

coderabbitai · 2026-06-27T19:36:15Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Escapes transcript-derived HTML in markdown and title rendering, adds XSS regression tests for markdown and browser flows, updates escaping guidance, and configures local Python environment settings.

Changes

XSS Hardening and Regression Coverage

Layer / File(s)	Summary
Escape markdown output `claude_code_log/html/utils.py`, `claude_code_log/markdown/renderer.py`, `test/test_markdown_rendering.py`	Shared markdown rendering now escapes raw HTML, markdown headings sanitize titles containing `<`, and tests cover escaped HTML, dangerous URL schemes, and assistant text output.
Escape transcript-derived titles `claude_code_log/html/renderer.py`, `test/test_xss_titles.py`	HTML title generation now escapes transcript-derived fields for tool-use, hook attachment, workflow phase, workflow agent, and system messages, with regression tests covering those title sinks and the markdown heading path.
Browser XSS regression suite `test/test_xss_browser.py`	Adds a transcript fixture with payloads across assistant, tool input, tool-result, and title-related content, plus browser assertions for no dialogs and no injected DOM nodes.
Escaping guidance and local Python settings `dev-docs/implementing-a-tool-renderer.md`, `mise.toml`	Adds escaping guidance for transcript content and configures Python venv auto-source behavior and the local `.venv` path.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

daaain/claude-code-log#165: Also modifies claude_code_log/markdown/renderer.py heading rendering on the same markdown output path.
daaain/claude-code-log#189: Also touches claude_code_log/html/renderer.py tool-title rendering at the same HTML title hook.
daaain/claude-code-log#204: Also updates the shared Markdown rendering pipeline and related escaping behavior in claude_code_log/html/utils.py.

Poem

🐇 I nibbled the script tags, one by one,
and tucked their sharp edges away from the sun.
Now transcripts stay fluffy, and safe to unfold,
with no popup surprises and no HTML gold.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	mise.toml adds Python and venv configuration that is unrelated to the XSS fix.	Move the mise.toml tooling/runtime changes to a separate PR unless they are required for this fix.
Docstring Coverage	⚠️ Warning	Docstring coverage is 51.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is concise and accurately summarizes the main XSS-hardening change.
Linked Issues check	✅ Passed	The PR addresses `#244` by escaping transcript-derived HTML and adding regressions for live tags, scripts, and unsafe URLs.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/xss-issue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@claude_code_log/html/utils.py`:
- Line 467: The helper documentation is now out of sync with the shared
assistant/tool renderer behavior. Update the `_markdown_collapsible` docstring
so it no longer says assistant/tool output uses `escape=False`; instead,
describe the current `escape=True` behavior and keep the XSS contract consistent
with the renderer change in `escape=True` for assistant/tool content.

In `@dev-docs/implementing-a-tool-renderer.md`:
- Around line 180-203: The escaping guidance has a variable-name mismatch in the
example reference, which breaks the cross-reference for readers. Update the
mention in the “Building HTML with f-strings/format” bullet to match the actual
input formatter symbol, using the same escaped query variable name defined
earlier, and keep the reference consistent with the escaping example in this
section.

In `@test/test_xss_browser.py`:
- Line 132: The file navigation in the browser test is manually constructing a
file:// URL from a Path, which can break on Windows and misses proper encoding.
Update the page.goto call in the test_xss_browser flow to use html_path.as_uri()
instead, and apply the same fix to the other file-based navigation in the same
test if present.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b528d9e-a648-49f5-9e6a-54b2e26346fa

📥 Commits

Reviewing files that changed from the base of the PR and between 4bd633f and b3bda89.

📒 Files selected for processing (5)

claude_code_log/html/utils.py
dev-docs/implementing-a-tool-renderer.md
mise.toml
test/test_markdown_rendering.py
test/test_xss_browser.py

- _markdown_collapsible docstring no longer describes the removed escape=False/escape=True split; both render fns now escape (XSS). - dev-docs: escape_query -> escaped_query to match the input formatter. - test_xss_browser: page.goto(html_path.as_uri()) instead of manually building a file:// URL (correct encoding, Windows-safe).

daaain · 2026-06-27T20:12:44Z

@cboos I thought we already fixed this and judging by the comment this might break something? Do you remember?

cboos · 2026-06-28T09:38:09Z

@cboos I thought we already fixed this and judging by the comment this might break something? Do you remember?

If you refer to the comment "Don't escape HTML since we want to render markdown properly", then the ball is on your court as for the why ;-) See 6079ea7. I think the change is good, and the user asks Claude to "write an E2E test that types <img src=x onerror=alert(1)><script>alert(1)</script> into the field" scenario is a convincing example of why we'd need to do this. The WebSearch/WebFetch is another, although I also was under the impression that we had already handled it?

cboos · 2026-06-28T17:38:21Z

(Claude) Security review: the core fix is correct but incomplete — there is a second, parallel class of live-HTML sink that this change does not cover. Posting findings now; we're actively WIP on the remaining fixes (not just flagging).

What this PR gets right ✅

Flipping the shared render_markdown renderer to escape=True correctly secures the message-body path ({{ html_content | safe }}, content routed through mistune): raw <script>/<img onerror>/<svg onload> escape to entities, and unsafe link/image schemes (javascript:, data:, vbscript:) are rewritten to #harmful-link. Verified with payloads.
No regressions: markdown, code fences, Pygments highlighting, SHA links, lists and links all still render; no double-escaping; snapshots unchanged; full unit suite green.
The Playwright test is genuine — reverting just the escape=True flag turns both test_xss_browser.py cases RED (alert fires, content nodes materialize), green on the fix.
WebSearch / WebFetch are fully covered. Audited every field with payloads: result link titles, URLs and summary, and the WebFetch body all flow through render_markdown_collapsible (escape=True) — HTML-escaped and scheme-sanitized; the input query/URL go through _tool_title → escape_html; the metadata badge is numeric. No web field reaches an unescaped sink.

What's still exposed ❌ — the title path (`{{ message_title | safe }}`)

There are two independent |safe DOM sinks, not one. Titles are set straight from rendered_title with no central escaping, so each title_* method must escape its own interpolation — and four of them interpolate transcript-derived text raw. A crafted transcript opened in a browser still fires alert():

Sink	Where	Field	Status
Generic / MCP / unknown tool name	base `title_ToolUseMessage` → `return content.tool_name`	`tool_use.name`	Confirmed end-to-end — `{"type":"tool_use","name":"<img src=x onerror=alert(1)>"}` renders live in the header
Hook attachment	`f"Hook · {label}"`	`attachment.hookName`	Confirmed end-to-end
Workflow phase	`f"Phase: {content.title}"`	snapshot phase title	Confirmed at title layer (same `
Workflow agent	`f"Agent {content.label}"`	snapshot agent label	Confirmed at title layer

The first is the most serious: tool names are directly attacker-controlled, and every tool without a specialized title method (MCP mcp__*, ToolSearch, custom/unknown) hits the unescaped fallback. (Specialized tools — Bash/Edit/Task/etc. — escape correctly via _tool_title, which is why this stayed hidden. Session headers and nav previews are not sinks; I checked.)

Root cause of the gap: the regression test scopes its payloads to "channels that flow through the shared Markdown renderer" — i.e. the fix's own scope, not the threat's — so it passes while blind to titles. And the new escaping contract is correct (it mandates escape_html() for f-string interpolation) but the existing title methods were never audited against the rule it documents.

Recommended fixes (WIP)

Per-field escape_html() in the four title methods (content.tool_name fallback, hook label, phase title, agent label) — per-field, not central, since titles legitimately emit structural HTML (e.g. the collapsed-transcript marker span).
Add a malicious-tool_name payload to test_xss_browser.py (simplest regression pin; catches the confirmed-exploitable case).
Fix the stale docstring at claude_code_log/html/utils.py:513 — _get_user_markdown_renderer still says "Assistant content uses escape=False deliberately", which contradicts this change.

Verdict: changes needed before this closes #244 as complete. The markdown-body half is solid and well-tested; the title path needs the same treatment.

The body fix switched the shared Markdown renderer to escape=True, but the header title path is separate: the template renders `{{ message_title | safe }}` with no central escaping (titles legitimately emit structural HTML, e.g. the spawn-collapsed marker span), so each title_* method must escape its own interpolated field. Four did not — all transcript-derived and attacker-influenced: - generic / mcp__* / custom tool name (title_ToolUseMessage fallback) — the most severe: a tool_use with name `<img src=x onerror=…>` rendered live in the header. Specialized tools were already escaped via _tool_title. - hook name (title_HookAttachmentMessage) - workflow phase title (title_WorkflowPhaseMessage) - workflow agent label (title_WorkflowAgentMessage) These four live on the shared base Renderer (also used by the Markdown renderer, which must not receive HTML-entity-escaped titles), so the escaping is added as HtmlRenderer overrides — mirroring how _tool_title already escapes the tool name for specialized tools. Also fix the now-stale `_get_user_markdown_renderer` docstring that still claimed assistant content uses escape=False (the shared renderer escapes now). Tests: test_xss_titles.py pins all four sinks (RED before, GREEN after); test_xss_browser.py gains a title-path payload (a generic tool_use whose name is the payload) — without the fix it fires script-xss/img-xss dialogs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@claude_code_log/html/renderer.py`:
- Around line 952-954: The title selection in renderer.py’s _dispatch_title path
is treating empty strings as missing values, so intentional empty specialized
titles fall back to escape_html(content.tool_name). Update the conditional
around self._dispatch_title(content.input, message) to distinguish None from an
empty string, preserving any handled empty title while still using the tool-name
fallback only when _dispatch_title returns None.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6e587d0-f55d-424f-bb23-a89745e3df71

📥 Commits

Reviewing files that changed from the base of the PR and between ad03cb9 and eacfb88.

📒 Files selected for processing (4)

claude_code_log/html/renderer.py
claude_code_log/html/utils.py
test/test_xss_browser.py
test/test_xss_titles.py

🚧 Files skipped from review as they are similar to previous changes (2)

claude_code_log/html/utils.py
test/test_xss_browser.py

cboos · 2026-06-28T18:06:25Z

(Claude) Re-review update. The latest push escapes the four title sinks from my earlier comment — verified end-to-end (generic/MCP tool names, hook attachments, workflow phase/agent labels now render escaped; specialized tools still escape correctly via _tool_title; no regressions; tests genuine). Two items remain before this fully closes #244:

A fifth title sink I missed in my first pass (my error, not the author's): title_SystemMessage → f"System {level.title()}". The level field is free-text from the transcript (transcript.level or "info"), not an enum — a system entry with level: "<img src=x onerror=alert(1)>" still renders live in the header (.title() case-folds but the tag survives; HTML attributes are case-insensitive). Same class, same fix: an HTML-side override that escapes the field (escape after .title(), so the entities aren't mangled).
The Markdown output path has the same title gap (caught during this round): markdown/renderer.py emits headings as f"{'#'*level} {title}" with no protection, so ### <img src=x onerror=...> lands raw in the .md. Lower severity than the browser-opened HTML (it depends on the external Markdown viewer), but it makes the PR's "Markdown output neutralises raw HTML from every source" claim untrue. Recommending it be fixed here via _protect_html_tags on the title (the Markdown-appropriate neutralizer, applied at the one format-neutral title site).

Both are being addressed now; expecting one more push.

…tput (XSS) Round 2 of the title-path XSS hardening: - 5th HTML sink: title_SystemMessage rendered `System {level.title()}` raw. `level` is FREE-TEXT from the transcript (factory: `level or "info"`), not an enum, so a system entry with level `<img src=x onerror=…>` rendered live in the header. Add an HtmlRenderer override that escapes — title-casing the raw level FIRST, then escaping (escaping first lets `.title()` capitalize the entity prefixes, `<` → `&Lt;`, breaking them). - Markdown output: the format-neutral heading site emitted `# {title}` raw, so the same title fields (generic tool name, hook/phase/agent label, system level) reached the .md unescaped — making the PR's "Markdown neutralises raw HTML from every source" guarantee false. Neutralise the title there with the markdown-appropriate `_protect_html_tags` (not escape_html). Gate on a literal `<`: the mistune round-trip re-normalises markdown escaping (`\*\*` → `\**`), so titles with no tag must pass through untouched — avoids corrupting legit markdown titles while still closing the tag-injection path. Tests: test_xss_titles.py gains the System-level HTML case and a Markdown-path case (both RED before, GREEN after); the specialized-tool and no-churn snapshot guards confirm no collateral damage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Round 3 — close the Markdown "neutralises raw HTML from every source" guarantee honestly. The per-message heading was gated last round, but the page / project-index / session page headings still emitted transcript- reachable text raw: a crafted cwd flows through get_project_display_name (Path(cwd).name) into the `## {display_name}` index heading, and page titles derive from session summaries. Factor a single `_safe_md_heading(text)` helper (the `<`-gated `_protect_html_tags`) and route EVERY `#`/`##` heading through it — the per-message site plus the four page/project headings — so neutralisation is one auditable structural gate rather than a per-site convention that drifts when a new heading site is added. For the project link form, only the display_name fragment is neutralised so the link target is preserved. The HTML path is unchanged (its per-field HtmlRenderer title overrides are a separate, correct escaping path; the page <title>/<h1> autoescape in the template). The `<`-gate keeps tag-free headings byte-identical — no snapshot churn. Test: a crafted-cwd → display_name → index heading regression (exercises the real path, not a synthetic title arg), plus the existing per-message and tool_name markdown cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cboos · 2026-06-29T12:02:57Z

(a few more changes coming this evening)

The heading gate closed the `#`/`##` heading sinks, but the Markdown output has a broader class of inline link-label/list surfaces that interpolate transcript-derived text raw — so "neutralise raw HTML from every source" wasn't yet true. Rename `_safe_md_heading` → `safe_markdown_inline` (now module-public) and route EVERY Markdown interpolation surface through it: the inline surfaces (expand-paths tree label, WebSearch result link title, TOC label, index session-link label) plus the headings it already covered. In `[label](url)` forms only the label fragment is neutralised — the destination is preserved. One auditable structural gate instead of a per-site convention that drifts; the `<`-guard keeps tag-free text byte-identical (no snapshot churn). Tests: a `safe_markdown_inline` gate unit + per-surface end-to-end regressions (WebSearch link, project index heading + session link, expand-paths label). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A plugin author told to "protect for XSS" could not succeed from the API or docs: the escaping helpers weren't on the stable plugin surface, and plugins.md had no security guidance. A plugin interpolating transcript data into title()/format_html/format_markdown reproduced the exact sinks this issue closed in the core. - API: re-export `escape_html` (format_html f-strings + a plugin's own title() escaping) and `safe_markdown_inline` (transcript text built into Markdown source) from `claude_code_log.plugins` — added to the PEP-562 lazy re-export + `__all__`, alongside `render_markdown[_collapsible]`. - Docs: plugins.md §4.2 "Security-conscious rendering" — untrusted-every-source rule, a helper-per-context table covering both output paths, a sink-class self-audit checklist, the one-structural-gate principle, and a cross-link to implementing-a-tool-renderer.md. Annotated the §4 method table (title / format_html / format_markdown rows) and changed the title()/format_markdown example from constants to interpolated-and-escaped so the risk is visible. Added a §9 XSS-payload test row. Tests: API re-export + behaviour; a plugin-author-shaped subclass whose render methods follow the contract → safe on BOTH the HTML and Markdown paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

daaain added 2 commits June 27, 2026 20:33

Configure mise to use the local .venv for uv

b3bda89

Adds uv_venv_auto and pins the project venv path so mise/uv share the same .venv.

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread claude_code_log/html/utils.py

Comment thread dev-docs/implementing-a-tool-renderer.md

Comment thread test/test_xss_browser.py Outdated

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

Comment thread claude_code_log/html/renderer.py

cboos and others added 2 commits June 28, 2026 20:14

cboos and others added 2 commits June 29, 2026 19:00

cboos merged commit 8c974bf into main Jun 30, 2026
17 checks passed

Conversation

daaain commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

daaain commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cboos commented Jun 28, 2026

Uh oh!

cboos commented Jun 28, 2026

What this PR gets right ✅

What's still exposed ❌ — the title path ({{ message_title | safe }})

Recommended fixes (WIP)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cboos commented Jun 28, 2026

Uh oh!

cboos commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

daaain commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

daaain commented Jun 27, 2026 •

edited

Loading

What's still exposed ❌ — the title path (`{{ message_title | safe }}`)