
perf(expression): fast-path identifier characters and short-circuit keyword scans#223

Merged
DylanPiercey merged 1 commit into
mainfrom
claude/expression-fast-path
Jun 20, 2026
Conversation

@DylanPiercey

Contributor

Summary

Speeds up EXPRESSION parsing — the hottest state in the parser (~25% of parse time) — by skipping work that provably cannot match. No behavior change.

Changes (all in src/states/EXPRESSION.ts)

  1. Identifier/number-character fast path. Inside the per-character loop, a word character (A-Z a-z 0-9 $ _) is never whitespace, is never a terminator (no shouldTerminate implementation matches a word character), and is not one of the switch's cases — it always falls through to default: pos++. A single guard clause now short-circuits the termination checks and the switch dispatch for the bulk of expression content:

    if (isWordCode(code)) {
      this.pos++;
      continue;
    }
  2. Operator keyword-scan short-circuit. lookBehindForOperator / lookAheadForOperator looped over the unary/binary keyword lists even when the surrounding character could not possibly start or end a keyword. Since every keyword is lowercase ASCII letters, they now bail out immediately when the relevant character is not a-z.
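The identifier fast path from item 1 can be sketched in isolation as follows. This is a minimal illustration, not the code in src/states/EXPRESSION.ts: the body of `isWordCode`, the `scanExpression` function, and the single-callback `shouldTerminate` signature are all assumptions made for the example.

```typescript
// Assumed implementation: true for A-Z, a-z, 0-9, `$`, and `_`.
function isWordCode(code: number): boolean {
  return (
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) || // a-z
    (code >= 48 && code <= 57) || // 0-9
    code === 36 || // $
    code === 95 // _
  );
}

// Hypothetical stand-in for the expression loop. It scans until the
// terminator callback matches and counts how often the fast path fired.
function scanExpression(
  data: string,
  shouldTerminate: (code: number) => boolean,
): { end: number; fastPathHits: number } {
  let pos = 0;
  let fastPathHits = 0;
  while (pos < data.length) {
    const code = data.charCodeAt(pos);
    if (isWordCode(code)) {
      // A word character is never whitespace, a terminator, or a switch
      // case, so the remaining checks can be skipped.
      pos++;
      fastPathHits++;
      continue;
    }
    if (shouldTerminate(code)) break;
    switch (code) {
      // The real state handles quotes, brackets, comments, etc. here.
      default:
        pos++;
    }
  }
  return { end: pos, fastPathHits };
}

// Terminate on `;` (char code 59), purely for demonstration.
const fastPathResult = scanExpression("foo_1 + $bar;", (code) => code === 59);
```

In this input, nine of the twelve characters before the terminator are word characters, so most iterations never reach the termination check or the switch.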

Correctness

  • No behavior change — the full test suite passes, and parser output (a checksum over every emitted range across a 1027-file corpus) is byte-for-byte identical.
  • The fast path is safe because every shouldTerminate implementation terminates only on punctuation (verified across all implementations), and the expression switch has no case for a word character.
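One way to compare parser output across two builds is to fold every emitted range into a single number and compare the numbers. The sketch below assumes a hypothetical `Range` shape and a simple polynomial rolling hash; the checksum actually used for this PR is not shown in the description and may differ.

```typescript
// Hypothetical shape of an emitted range: start and end offsets.
interface Range {
  start: number;
  end: number;
}

// Folds every range's offsets into one unsigned 32-bit checksum.
// `hash * 31 + value` stays below 2^53 for source offsets, so the
// arithmetic is exact before `>>> 0` truncates it to 32 bits.
function checksumRanges(ranges: readonly Range[]): number {
  let hash = 17;
  for (const { start, end } of ranges) {
    for (const value of [start, end]) {
      hash = (hash * 31 + value) >>> 0;
    }
  }
  return hash;
}

const baseline = checksumRanges([{ start: 0, end: 3 }, { start: 4, end: 9 }]);
const candidate = checksumRanges([{ start: 0, end: 3 }, { start: 4, end: 9 }]);
const shifted = checksumRanges([{ start: 0, end: 3 }, { start: 4, end: 10 }]);
```

Equal checksums for `baseline` and `candidate` indicate the same ranges in the same order, while a one-offset difference in the final range, as in `shifted`, changes the result.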

Performance

Measured with a process-isolated A/B harness (each variant in its own process to avoid JIT cross-talk, alternating order, sign test over many rounds) on a corpus of real Marko fixtures:

  • Steady-state throughput: ~6% faster vs main, with the identifier fast path alone contributing +3.8% (faster in 20/20 rounds) on top of the keyword-scan change.
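For context on the "20/20 rounds" figure, a two-sided sign test asks how likely a split at least that lopsided would be if both variants were equally fast. The helper below is a sketch under that assumption; `signTestPValue` is a hypothetical name and not part of the A/B harness.

```typescript
// Two-sided sign test: probability of a win/loss split at least this
// lopsided when each round is a fair coin flip.
function signTestPValue(wins: number, rounds: number): number {
  const k = Math.max(wins, rounds - wins);
  let tail = 0;
  let coeff = 1; // binomial(rounds, 0)
  for (let i = 0; i <= rounds; i++) {
    if (i >= k) tail += coeff; // add binomial(rounds, i) to the upper tail
    coeff = (coeff * (rounds - i)) / (i + 1); // binomial(rounds, i + 1)
  }
  // Double the one-sided tail and cap at 1.
  return Math.min(1, (2 * tail) / Math.pow(2, rounds));
}

const pAllWins = signTestPValue(20, 20); // 2 / 2^20, roughly 1.9e-6
const pEvenSplit = signTestPValue(10, 20); // capped at 1
```

A variant that wins all 20 rounds yields a p-value of about 1.9e-6, so a sweep of that kind is very unlikely to be noise.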

🤖 Generated with Claude Code


…eyword scans

Speed up expression parsing by skipping work that provably cannot match:

- Add an identifier/number-character fast path to the expression loop. Such a
  character is never whitespace, never a terminator (no `shouldTerminate`
  implementation matches a word character), and is not one of the switch's
  cases, so it can short-circuit the termination checks and the switch dispatch
  entirely and just advance the position.
- Bail out of the unary/binary operator keyword scans immediately when the
  surrounding character cannot start or end a keyword (every keyword is
  lowercase ASCII letters).

No behavior change: the full test suite passes and parser output is identical.
@changeset-bot

changeset-bot Bot commented Jun 20, 2026


🦋 Changeset detected

Latest commit: 543d209

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| htmljs-parser | Patch |


@codecov

codecov Bot commented Jun 20, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.95%. Comparing base (221d3b7) to head (543d209).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
- Coverage   99.97%   99.95%   -0.03%     
==========================================
  Files          34       34              
  Lines        4204     4223      +19     
  Branches      776      780       +4     
==========================================
+ Hits         4203     4221      +18     
- Misses          1        2       +1     


@coderabbitai

coderabbitai Bot commented Jun 20, 2026



No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e77a1323-a1b5-4d9a-89fe-09d106903e79

📥 Commits

Reviewing files that changed from the base of the PR and between 221d3b7 and 543d209.

📒 Files selected for processing (2)
  • .changeset/fast-expressions-skip-scans.md
  • src/states/EXPRESSION.ts

Walkthrough

Three fast-path optimizations are added to src/states/EXPRESSION.ts. In the main expression parse loop, when the current character satisfies isWordCode, this.pos is incremented and the loop continues immediately, bypassing termination checks and the switch dispatch. In lookBehindForOperator, a check on the character preceding pos returns -1 early if it is not a lowercase letter (a-z), skipping the unary keyword scan. In lookAheadForOperator, the same check is applied to the character at pos before the binary keyword scan. A changeset file documents these changes.
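The two keyword-scan guards described above can be sketched as follows. The keyword lists, function names, and return conventions here are illustrative assumptions; the real lookBehindForOperator and lookAheadForOperator do additional work (such as word-boundary handling) that this sketch omits.

```typescript
// Hypothetical subsets of the keyword lists, for illustration only.
const unaryKeywords = ["typeof", "void", "new", "await"];
const binaryKeywords = ["instanceof", "in", "as"];

function isLowerAlpha(code: number): boolean {
  return code >= 97 && code <= 122; // a-z
}

// Returns the position just past a binary keyword starting at `pos`, or -1.
function lookAheadForKeyword(data: string, pos: number): number {
  // Every keyword starts with a-z, so anything else cannot match.
  if (!isLowerAlpha(data.charCodeAt(pos))) return -1;
  for (const keyword of binaryKeywords) {
    if (data.slice(pos, pos + keyword.length) === keyword) {
      return pos + keyword.length;
    }
  }
  return -1;
}

// Returns the start of a unary keyword ending just before `pos`, or -1.
function lookBehindForKeyword(data: string, pos: number): number {
  // Every keyword ends with a-z, so anything else cannot match.
  if (pos === 0 || !isLowerAlpha(data.charCodeAt(pos - 1))) return -1;
  for (const keyword of unaryKeywords) {
    const start = pos - keyword.length;
    if (start >= 0 && data.slice(start, pos) === keyword) return start;
  }
  return -1;
}

const aheadHit = lookAheadForKeyword("a instanceof B", 2);
const aheadMiss = lookAheadForKeyword("a + b", 2);
const behindHit = lookBehindForKeyword("typeof x", 6);
const behindMiss = lookBehindForKeyword("1 + x", 3);
```

The guard costs one character-code comparison, whereas the loop it skips performs a string comparison per keyword, which is why bailing out early pays off when the neighbouring character is punctuation or a digit.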

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: performance improvements through fast-path identifier character handling and short-circuit keyword scans in expression parsing. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing clear context about the performance optimization, specific implementation details, correctness guarantees, and measured performance improvements. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@DylanPiercey DylanPiercey merged commit 3c95d7f into main Jun 20, 2026
10 of 11 checks passed
@DylanPiercey DylanPiercey deleted the claude/expression-fast-path branch June 20, 2026 21:14
@github-actions github-actions Bot mentioned this pull request Jun 20, 2026
DylanPiercey added a commit to marko-js/tree-sitter that referenced this pull request Jun 26, 2026
…ord scans

Port htmljs-parser's EXPRESSION fast paths to the external scanner:

- Add an identifier/number-character fast path to the expression loop. Such a
  character is never whitespace, never a terminator (no should_terminate case
  matches a word character), and is not one of the switch's cases, so it
  short-circuits the termination checks (including should_terminate's eager
  lookahead) and the switch dispatch and just advances.
- Bail out of the unary/binary operator keyword scans immediately when the
  surrounding character cannot start or end a keyword (every keyword is
  lowercase ASCII letters).

No behavior change: the full fixture-comparison suite passes. ~6-9% faster on
expression-heavy input in an A/B build.

Mirrors marko-js/htmljs-parser#223 (commit 3c95d7f).