
Speed up HTML escaping by copying safe spans in bulk #502

Open
deliro wants to merge 1 commit into lambda-fairy:main from deliro:optimize-html-escaping

deliro commented Jun 19, 2026


Why

`escape_to_string` is on the hot path for every dynamic value rendered at runtime. It currently pushes the input one byte at a time; for the common, non-special case it does so via `unsafe { output.as_mut_vec().push(b) }`. This PR rewrites it to scan for the next character that needs escaping and copy the whole preceding run of safe bytes with a single `push_str`.
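
A minimal sketch of the scan-and-copy loop described above. It assumes the escaped set is `&`, `<`, `>` and `"` (the PR says "four escaped characters" without listing them here), and it illustrates the technique rather than reproducing the exact code in this PR:

```rust
/// Span-copying HTML escaper (sketch). Assumed escaped set: `&`, `<`, `>`, `"`.
pub fn escape_to_string(input: &str, output: &mut String) {
    // Byte index where the current run of safe (unescaped) bytes begins.
    let mut start = 0;
    for (i, b) in input.bytes().enumerate() {
        let entity = match b {
            b'&' => "&amp;",
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'"' => "&quot;",
            // Safe byte: keep extending the current run.
            _ => continue,
        };
        // Copy the whole safe run in one call. `i` indexes an ASCII byte,
        // so it is always a valid char boundary and slicing cannot panic.
        output.push_str(&input[start..i]);
        output.push_str(entity);
        start = i + 1;
    }
    // Flush the safe tail after the last special character (or the whole
    // input if there were none).
    output.push_str(&input[start..]);
}
```

Because every copy goes through `push_str` on a `&str` slice, no `unsafe` is needed, and an input with no specials costs a single `push_str` after the scan.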

Side benefits:

  • Drops the `unsafe` block; the new code is entirely safe.
  • All four escaped characters are single-byte ASCII (< 0x80). Every byte of a multi-byte UTF-8 sequence, lead or continuation, is ≥ 0x80, so a special character can never fall inside one, and slicing at its index always lands on a character boundary.
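
The boundary argument can be checked mechanically. The two helpers below are hypothetical (not part of maud) and again assume the escaped set `&`, `<`, `>`, `"`: the first confirms that every special byte in a string sits on a char boundary, the second that a multi-byte encoding contains only bytes ≥ 0x80.

```rust
/// True if every escaped byte in `s` sits on a char boundary,
/// i.e. `&s[..i]` / `&s[i..]` at that index cannot panic.
pub fn specials_on_char_boundaries(s: &str) -> bool {
    s.bytes()
        .enumerate()
        .filter(|&(_, b)| matches!(b, b'&' | b'<' | b'>' | b'"'))
        .all(|(i, _)| s.is_char_boundary(i))
}

/// True if `c` is either single-byte ASCII or encodes to bytes that all
/// have the high bit set (so none can equal an ASCII special).
pub fn multibyte_bytes_are_non_ascii(c: char) -> bool {
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf);
    encoded.len() == 1 || encoded.bytes().all(|b| b >= 0x80)
}
```

Running `multibyte_bytes_are_non_ascii` over every valid `char` (each value in `0..=0x10FFFF` that `char::from_u32` accepts) is cheap enough to do exhaustively in a test.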

Behaviour is unchanged (same byte set, same output). Both copies of the function (`maud/src/escape.rs` and `maud_macros/src/escape.rs`) are kept in sync, as the header comment requires. I also added edge-case tests: empty input, no specials, all specials, adjacent/boundary specials, multi-byte UTF-8, and appending to a non-empty buffer.
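
One way to express those edge cases is a table-driven check that any escaper with the `escape_to_string(&str, &mut String)` signature must pass. This is a sketch, not the PR's actual test code: `check_escaper` and `escape_reference` are hypothetical names, the escaped set is assumed, and `escape_reference` is a safe per-character stand-in for the old behaviour.

```rust
/// Safe per-character reference escaper (old behaviour, without `unsafe`).
/// Assumed escaped set: `&`, `<`, `>`, `"`.
pub fn escape_reference(input: &str, output: &mut String) {
    for c in input.chars() {
        match c {
            '&' => output.push_str("&amp;"),
            '<' => output.push_str("&lt;"),
            '>' => output.push_str("&gt;"),
            '"' => output.push_str("&quot;"),
            _ => output.push(c),
        }
    }
}

/// Runs the edge cases against `escape`; panics on the first mismatch.
pub fn check_escaper(escape: fn(&str, &mut String)) {
    let cases: &[(&str, &str)] = &[
        ("", ""),                                             // empty input
        ("plain text", "plain text"),                         // no specials
        ("&<>\"", "&amp;&lt;&gt;&quot;"),                     // all specials
        ("<<a>>", "&lt;&lt;a&gt;&gt;"),                       // adjacent + boundary specials
        ("café <b>日本</b>", "café &lt;b&gt;日本&lt;/b&gt;"), // multi-byte UTF-8
    ];
    for &(input, expected) in cases {
        let mut out = String::new();
        escape(input, &mut out);
        assert_eq!(out, expected, "input: {input:?}");
    }
    // Appending must preserve what is already in the buffer.
    let mut out = String::from("prefix:");
    escape("a&b", &mut out);
    assert_eq!(out, "prefix:a&amp;b");
}
```

Running the same suite against both the reference and the new span-copying function gives a quick differential check that the output is unchanged.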

Benchmarks

`cargo +nightly bench -p maud`, Apple M-series, 3 runs each, median:

| Benchmark | Before | After | Change |
|---|---|---|---|
| `render_long_text` (long user prose through the escaper) | ~325 ns | ~235 ns | −28% |
| `render_template` (short splices) | ~102 ns | ~104 ns | within noise |
| `render_complicated_template` | ~677 ns | ~688 ns | within noise |

The win shows up on text-heavy output (article bodies, comments, descriptions). Templates that only splice short strings are unaffected, since static markup is escaped at compile time and never reaches this function at runtime. A new render_long_text benchmark is included to cover this workload.
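
The numbers above come from the crate's nightly `cargo bench` harness. For a rough check on stable Rust, a hand-rolled timer along these lines can be used. `median_time_per_call` is a hypothetical helper, its results are machine-dependent, and it is no substitute for the real benchmark:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Median over `runs` samples of the mean time per call of `escape` on
/// `input`, each sample averaging `iters` calls. Hypothetical helper.
pub fn median_time_per_call(
    escape: fn(&str, &mut String),
    input: &str,
    runs: usize,
    iters: u32,
) -> Duration {
    assert!(runs > 0 && iters > 0, "runs and iters must be non-zero");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iters {
                let mut out = String::with_capacity(input.len());
                // `black_box` stops the optimizer from hoisting or
                // discarding the work being measured.
                escape(black_box(input), &mut out);
                black_box(out);
            }
            start.elapsed() / iters
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}
```

With `runs = 3` this takes the middle of three samples, mirroring the "3 runs each, median" methodology stated above.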

Note: replacing the `for` loop with iterator combinators does not enable autovectorization here. The loop body has a side effect (`push_str`) and carries an offset from one iteration to the next, and both block LLVM's loop vectorizer (verified in the disassembly: identical scalar codegen). True SIMD would need an explicit byte-set search (e.g. `memchr`, `jetscii` or `core::arch`), which conflicts with the crate's `no_std` + stable + minimal-deps constraints, so it is intentionally left out.
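
To make the note concrete, here is what a combinator version of the same loop might look like. It is a hypothetical variant written for illustration, not code from this PR, and the escaped set is assumed. The `fold` accumulator is the loop-carried offset and the `push_str` calls are the side effects the note refers to; the codegen of this particular sketch has not been inspected here.

```rust
/// Maps a byte to its HTML entity, or `None` if it can be copied as-is.
/// Assumed escaped set: `&`, `<`, `>`, `"`.
fn escape_byte(b: u8) -> Option<&'static str> {
    match b {
        b'&' => Some("&amp;"),
        b'<' => Some("&lt;"),
        b'>' => Some("&gt;"),
        b'"' => Some("&quot;"),
        _ => None,
    }
}

/// Span-copying escaper written with iterator combinators (sketch).
pub fn escape_to_string_combinators(input: &str, output: &mut String) {
    let end = input
        .bytes()
        .enumerate()
        // Keep only the positions of bytes that need escaping.
        .filter_map(|(i, b)| escape_byte(b).map(|entity| (i, entity)))
        // `start` is the loop-carried offset: where the current safe run begins.
        .fold(0, |start, (i, entity)| {
            output.push_str(&input[start..i]); // side effect: copy the safe run
            output.push_str(entity);
            i + 1
        });
    // Flush the safe tail after the last special byte.
    output.push_str(&input[end..]);
}
```

Each step still depends on the previous offset and writes to `output`, so the combinators only change the surface syntax, not the shape of the loop.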

Copy one contiguous run of safe bytes per `push_str` instead of
pushing each byte individually, and drop the `unsafe` `as_mut_vec()`
write in the process. Behaviour is unchanged: the byte set and output
are identical, and are now covered by additional edge-case tests (empty
input, adjacent specials, multi-byte UTF-8).

Neutral on templates that only splice short strings (static markup is
escaped at compile time and never hits this path); noticeably faster
when long dynamic text is escaped at runtime.
