Support Unicode identifiers in Handlebars expressions #114

Merged
daaain merged 2 commits into master from feature/unicode-identifiers on Jun 29, 2026
daaain (Owner) commented Jun 29, 2026

Handlebars allows variable, helper, partial and block names in any language, but the grammar's identifier character classes were limited to ASCII (a-zA-Z0-9), so non-Latin names lost highlighting.

Replace the ASCII ranges with Oniguruma Unicode property classes \p{L} (any letter) and \p{N} (any number) in the Handlebars-specific rules: block_helper, end_block, partial_and_var, attribute name/value, layout (!<) and else_token. HTML-structural rules (tag names, entities, generic attributes) keep their ASCII ranges per the HTML spec.
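To make the effect of the widening concrete, here is a minimal sketch in JavaScript. It uses the JavaScript regex engine rather than Oniguruma, and the two patterns are simplified, hypothetical stand-ins for illustration; they are not the grammar's actual rules. In JavaScript, `\p{L}` and `\p{N}` require the `u` flag.

```javascript
// Hypothetical identifier patterns for illustration only; the real grammar
// rules are more involved and are evaluated by Oniguruma, not by JavaScript.
const asciiIdent = /^[a-zA-Z0-9_]+/;
const unicodeIdent = /^[\p{L}\p{N}_]+/u; // the `u` flag enables \p{...} in JS

// Return the identifier-like prefix of `text`, or null if there is none.
function leadingIdentifier(pattern, text) {
  const match = pattern.exec(text);
  return match ? match[0] : null;
}

// One sample per script family mentioned in the PR description.
const samples = ['title', 'имя', '変数', 'اسم', 'pr\u00e9nom'];

for (const name of samples) {
  console.log(name, {
    ascii: leadingIdentifier(asciiIdent, name),
    unicode: leadingIdentifier(unicodeIdent, name),
  });
}
```

With the ASCII class, the Cyrillic, CJK and Arabic samples do not match at all, and `prénom` is cut off at `pr`. With the Unicode property classes, each sample matches in full.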

This supersedes PR #90, which only added Cyrillic to a subset of rules (and missed the closing-tag rule); the review on that PR asked for full-language support instead.

Closes #90. Adds test/unicode.test.js covering Cyrillic, CJK, Arabic and Latin-with-diacritics across variables, blocks, partials, hashes and else-if.

Summary by CodeRabbit

  • New Features

    • Handlebars syntax highlighting now supports Unicode letters and digits in identifiers, helper names, partials, block tags, and attribute keys/values.
    • Improved recognition for non-ASCII forms of else if, block endings, and layout-style extends syntax.
  • Tests

    • Added automated coverage to verify correct tokenization/highlighting for Unicode identifiers across multiple scripts (including variables, helpers, parameters, partials, and attribute data).

@daaain daaain mentioned this pull request Jun 29, 2026
coderabbitai Bot commented Jun 29, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99c65b5e-c6d1-41f3-ac04-a012e6eb6a2f

📥 Commits

Reviewing files that changed from the base of the PR and between 84d8275 and 9920a4d.

📒 Files selected for processing (1)
  • test/unicode.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/unicode.test.js

📝 Walkthrough

All three Handlebars grammar formats replace ASCII-only identifier classes with Unicode property escapes in matching rules for blocks, else clauses, extends syntax, attributes, and inline variables/partials. A new test file checks Unicode scoping across several script systems.

Changes

Unicode Identifier Support

| Layer / File(s) | Summary |
| --- | --- |
| Unicode regex widening across all grammar formats: `grammars/Handlebars.json`, `grammars/Handlebars.sublime-syntax`, `grammars/Handlebars.tmLanguage` | Seven regex patterns in each grammar file replace ASCII character classes with `\p{L}`/`\p{N}` Unicode property escapes for the `block_helper`, `else_token`, `end_block`, `extends`, `handlebars_attribute_name`, `handlebars_attribute_value`, and `partial_and_var` rules. The `.tmLanguage` file also adds `\b` boundaries around attribute name/value matches. |
| Unicode grammar test suite: `test/unicode.test.js` | New `node:test` coverage adds an `assertScope` helper and cases for Cyrillic, CJK, Arabic, and diacritic Latin identifiers across variable, block open/close, else-if, partial, extends, and hash attribute positions. |
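
The `assertScope` helper itself is not shown in this conversation. As a rough sketch of the kind of check such a helper might perform, the code below looks up the scopes of the tokens covering a substring. The function names, the token shape (`startIndex`, `endIndex`, `scopes`) and the `placeholder.scope` name are all assumptions for illustration, and the tokens are written by hand rather than produced by a real tokenizer.

```javascript
// Hypothetical helper: collect the scopes of every token overlapping `needle`.
// The token shape is an assumption, not taken from test/unicode.test.js.
function scopesFor(line, tokens, needle) {
  const start = line.indexOf(needle);
  if (start === -1) throw new Error(`"${needle}" not found in "${line}"`);
  const end = start + needle.length;
  return tokens
    .filter((t) => t.startIndex < end && t.endIndex > start)
    .flatMap((t) => t.scopes);
}

// True if any token covering `needle` carries `scope`.
function hasScope(line, tokens, needle, scope) {
  return scopesFor(line, tokens, needle).includes(scope);
}

// Hand-written tokens standing in for tokenizer output on this line.
// 'placeholder.scope' is a made-up name for the parts we do not assert on.
const line = '{{foo имя=значение}}';
const tokens = [
  { startIndex: 0, endIndex: 6, scopes: ['placeholder.scope'] },
  { startIndex: 6, endIndex: 9, scopes: ['entity.other.attribute-name.handlebars'] },
  { startIndex: 9, endIndex: 10, scopes: ['placeholder.scope'] },
  { startIndex: 10, endIndex: 18, scopes: ['entity.other.attribute-value.handlebars'] },
  { startIndex: 18, endIndex: 20, scopes: ['placeholder.scope'] },
];

console.log(hasScope(line, tokens, 'имя', 'entity.other.attribute-name.handlebars'));
console.log(hasScope(line, tokens, 'значение', 'entity.other.attribute-value.handlebars'));
```

A real test would obtain `tokens` by running the grammar over the line, then assert on the result instead of logging it.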

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop, hop—new letters join the line,
From Cyrillic stars to scripts that shine.
Blocks and helpers now read with grace,
Unicode dances through the grammar space.
A rabbit nods: “All tongues may play!”

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: widening Handlebars identifiers to support Unicode. |
| Linked Issues check | ✅ Passed | The PR fulfills #90 by adding Unicode support for Handlebars names and related parsing across the highlighted rules. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on Unicode identifier support and matching tests, with no obvious unrelated additions. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
test/unicode.test.js (1)

25-69: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a Unicode {{!< ...}} regression test.

The suite covers most widened Handlebars rules, but it never asserts the extends pattern that changed in all three grammar files. That leaves the {{!< макет}} path unprotected.

Suggested test
 test('non-ASCII hash key and value', async () => {
   const src = '{{foo имя=значение}}';
   await assertScope(src, 'имя', 'entity.other.attribute-name.handlebars');
   await assertScope(src, 'значение', 'entity.other.attribute-value.handlebars');
 });
+
+test('layout extends with a non-ASCII name', async () => {
+  await assertScope('{{!< макет}}', 'макет', 'support.class.handlebars');
+});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unicode.test.js` around lines 25 - 69, Add a regression test in the
unicode test suite for the Handlebars extends form handled by the grammar’s
`extends` pattern, since `{{!< ...}}` is not currently covered. Use
`assertScope` with a non-ASCII template name such as `{{!< макет}}` and verify
the relevant token scope on the Unicode name so the widened rule stays protected
across the grammar files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 683be924-dd7f-4e12-9ef6-21e36dee496a

📥 Commits

Reviewing files that changed from the base of the PR and between adc200e and 84d8275.

📒 Files selected for processing (4)
  • grammars/Handlebars.json
  • grammars/Handlebars.sublime-syntax
  • grammars/Handlebars.tmLanguage
  • test/unicode.test.js

The extends rule was widened to \p{L}\p{N} alongside the other
identifier rules but had no Unicode coverage; only the ASCII case in
embedding.test.js guarded it. Add a test with a non-ASCII template name
so the widened rule stays protected.
@daaain daaain merged commit a5aa65d into master Jun 29, 2026
3 checks passed