perf(expression): fast-path identifier characters and short-circuit keyword scans #223
…eyword scans

Speed up expression parsing by skipping work that provably cannot match:

- Add an identifier/number-character fast path to the expression loop. Such a character is never whitespace, never a terminator (no `shouldTerminate` implementation matches a word character), and is not one of the switch's cases, so it can short-circuit the termination checks and the switch dispatch entirely and just advance the position.
- Bail out of the unary/binary operator keyword scans immediately when the surrounding character cannot start or end a keyword (every keyword is lowercase ASCII letters).

No behavior change: the full test suite passes and parser output is identical.
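The identifier fast path described in this commit message can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code from the PR: `isWordCode`, `scanExpression`, and the `terminators` string are hypothetical stand-ins for the parser's loop and its `shouldTerminate` checks, and the sketch assumes that no terminator is a word character.

```typescript
// Hypothetical helper: true for A-Z, a-z, 0-9, `$`, and `_`.
function isWordCode(code: number): boolean {
  return (
    (code >= 97 && code <= 122) || // a-z
    (code >= 65 && code <= 90) || // A-Z
    (code >= 48 && code <= 57) || // 0-9
    code === 36 || // $
    code === 95 // _
  );
}

// Toy expression loop: returns the index of the first terminator found
// outside a quoted string, or src.length if there is none.
// Assumption: `terminators` contains no word characters.
function scanExpression(src: string, start: number, terminators: string): number {
  let pos = start;
  while (pos < src.length) {
    const code = src.charCodeAt(pos);

    // Fast path: a word character cannot terminate and has no switch case,
    // so skip both the termination check and the dispatch.
    if (isWordCode(code)) {
      pos++;
      continue;
    }

    // Slow path: termination check, then per-character dispatch.
    if (terminators.includes(src.charAt(pos))) return pos;

    switch (code) {
      case 34: // double quote
      case 39: { // single quote
        // Skip to the matching quote (escape sequences ignored for brevity).
        const close = src.indexOf(src.charAt(pos), pos + 1);
        pos = close === -1 ? src.length : close + 1;
        break;
      }
      default:
        pos++;
    }
  }
  return pos;
}
```

The guard is only behavior-preserving because both slow-path steps are known to be no-ops for word characters. If a terminator could ever be a word character, the guard would skip past it.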
🦋 Changeset detected. Latest commit: 543d209. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #223 | +/- |
|---|---|---|---|
| Coverage | 99.97% | 99.95% | -0.03% |
| Files | 34 | 34 | |
| Lines | 4204 | 4223 | +19 |
| Branches | 776 | 780 | +4 |
| Hits | 4203 | 4221 | +18 |
| Misses | 1 | 2 | +1 |
…ord scans

Port htmljs-parser's EXPRESSION fast paths to the external scanner:

- Add an identifier/number-character fast path to the expression loop. Such a character is never whitespace, never a terminator (no `should_terminate` case matches a word character), and is not one of the switch's cases, so it short-circuits the termination checks (including `should_terminate`'s eager lookahead) and the switch dispatch and just advances.
- Bail out of the unary/binary operator keyword scans immediately when the surrounding character cannot start or end a keyword (every keyword is lowercase ASCII letters).

No behavior change: the full fixture-comparison suite passes. ~6-9% faster on expression-heavy input in an A/B build.

Mirrors marko-js/htmljs-parser#223 (commit 3c95d7f).
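The keyword-scan bail-out can be illustrated with a small TypeScript sketch. Everything here is an assumption made for illustration: the `KEYWORDS` list is a made-up subset, and `keywordAhead`/`keywordBehind` are hypothetical simplifications rather than the parser's or the scanner's actual lookahead and lookbehind functions.

```typescript
// Illustrative subset only; the real keyword lists may differ.
// Longer keywords come first so "instanceof" wins over "in".
const KEYWORDS = ["instanceof", "typeof", "void", "in"];

function isLowerAsciiLetter(code: number): boolean {
  // charCodeAt returns NaN out of range, which fails both comparisons.
  return code >= 97 && code <= 122;
}

// Returns the keyword starting at `pos`, if any.
function keywordAhead(src: string, pos: number): string | undefined {
  // Bail out: every keyword starts with a lowercase ASCII letter.
  if (!isLowerAsciiLetter(src.charCodeAt(pos))) return undefined;
  for (const kw of KEYWORDS) {
    if (src.startsWith(kw, pos)) return kw;
  }
  return undefined;
}

// Returns the keyword ending just before `pos`, if any.
function keywordBehind(src: string, pos: number): string | undefined {
  // Bail out: every keyword ends with a lowercase ASCII letter.
  if (!isLowerAsciiLetter(src.charCodeAt(pos - 1))) return undefined;
  for (const kw of KEYWORDS) {
    if (src.endsWith(kw, pos)) return kw;
  }
  return undefined;
}
```

The sketch omits the word-boundary checks a real scan would need (for example, `keywordAhead("index", 0)` reports `in`). It only shows how a single character test avoids iterating the keyword list when no keyword can match.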
Summary
Speeds up `EXPRESSION` parsing, the hottest state in the parser (~25% of parse time), by skipping work that provably cannot match. No behavior change.

Changes (all in `src/states/EXPRESSION.ts`)
Identifier/number-character fast path. Inside the per-character loop, a word character (`A-Z a-z 0-9 $ _`) is never whitespace, is never a terminator (no `shouldTerminate` implementation matches a word character), and is not one of the `switch`'s cases, so it always falls through to `default: pos++`. A single guard clause now short-circuits the termination checks and the switch dispatch for the bulk of expression content.

Operator keyword-scan short-circuit. `lookBehindForOperator`/`lookAheadForOperator` looped over the unary/binary keyword lists even when the surrounding character could not possibly start or end a keyword. Since every keyword is lowercase ASCII letters, they now bail out immediately when the relevant character is not `a-z`.

Correctness
Every `shouldTerminate` implementation terminates only on punctuation (verified across all implementations), and word characters are not handled by the expression `switch`.

Performance
Measured with a process-isolated A/B harness (each variant in its own process to avoid JIT cross-talk, alternating order, sign test over many rounds) on a corpus of real Marko fixtures:
… `main`, with the identifier fast path alone contributing +3.8% (faster in 20/20 rounds) on top of the keyword-scan change.
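For context on the "faster in 20/20 rounds" figure, a one-sided sign test can be sketched as below. The harness's own implementation is not shown in this PR, so `signTestPValue` is a hypothetical helper; it assumes independent rounds, ties already discarded, and a null hypothesis that each variant wins a round with probability 1/2.

```typescript
// One-sided sign test: probability of seeing at least `wins` wins in
// `rounds` rounds if each round were a fair coin flip.
function signTestPValue(wins: number, rounds: number): number {
  let coeff = 1; // binomial coefficient C(rounds, 0)
  let tail = 0;
  for (let i = 0; i <= rounds; i++) {
    if (i >= wins) tail += coeff;
    // C(rounds, i + 1) = C(rounds, i) * (rounds - i) / (i + 1)
    coeff = (coeff * (rounds - i)) / (i + 1);
  }
  return tail / 2 ** rounds;
}

// 20 wins out of 20 rounds gives 1 / 2^20, roughly 9.5e-7.
console.log(signTestPValue(20, 20));
```

The sign test uses only the direction of each round's difference, not its size, which makes it robust to outlier rounds at the cost of some statistical power.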
Generated by Claude Code