diff --git a/README.md b/README.md index 89e7ab8..8fa4ffd 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,7 @@ A TextMate grammar is a pile of regexes guessing at a language's structure. It's Take `typeof x < y`. A regex highlighter has to guess whether `<` opens a generic argument list or is a less-than comparison — and it guesses wrong somewhere, forever. A **parser** doesn't guess; the grammar already decides. Monogram inverts the dependency: -1. **Write the grammar, then prove it.** The grammar is executable — Monogram runs it as a recursive-descent + [Pratt](https://en.wikipedia.org/wiki/Operator-precedence_parser) (operator-precedence) parser over the TypeScript conformance suite, measured *bidirectionally*: it must **accept** every input `tsc` accepts **and reject** every input it rejects. +1. **Write the grammar, then prove it.** The grammar is executable — Monogram runs it as a recursive-descent + [Pratt](https://en.wikipedia.org/wiki/Operator-precedence_parser) (operator-precedence) parser over the TypeScript conformance suite, measured *bidirectionally*: it **accepts** what `tsc` accepts and **rejects** what `tsc` rejects — with `tsc` the [oracle, not the definition](#correctness-the-productions-not-tsc), the two diverging only where `tsc` itself does. 2. **Derive the highlighters from that proven grammar**, never hand-write them. The TextMate, tree-sitter, and Monarch outputs are all generated from the one parser-validated definition, so their correctness is underwritten by the conformance run, not by regex tuning. @@ -49,6 +49,21 @@ Two numbers answer two different questions — read them together, not against e So the two aren't in tension: a near-tie in the broad table can sit right next to a lopsided ledger — the broad average dilutes the difference with easy tokens, while the ledger zooms in on the hard cases it buries. 
+### Correctness: the productions, not `tsc` + +The conformance run measures Monogram against `tsc`, but `tsc` is the **oracle, not the definition**. What the grammar models is the language's **syntactic productions** — and the parser produces a [CST](#what-you-get), which is *pre-semantic*: whether an expression is a valid assignment target, or a `using` binding is an identifier rather than a pattern, is a **static-semantic** rule. That belongs to a CST *consumer* — the CST→AST lowering, or a validator that walks the tree — not to the parser. The parser's one job is to accept exactly the strings the productions derive. + +This matters because `tsc`'s *parser* is not the same thing as the language. It draws its own parse-vs-check line, and on a handful of inputs it diverges from the grammar — and from the other engines (V8, Babel) — in **both** directions. Driving Monogram's accept/reject to *exactly* `tsc` would mean reproducing those quirks; instead it follows the productions: + +| Input | Monogram | `tsc` parser | V8 / Babel | Why | +|---|---|:--:|:--:|---| +| `obj?.#field` | accept | reject | accept | A private member in an optional chain is valid current ECMAScript — V8 and Babel both accept it; `tsc`'s parser is the lone rejecter. | +| `let v: void.x` | reject | accept | reject | A qualified type name's root is an `IdentifierReference`; `void` is a keyword type, so no production qualifies it. (`undefined.x` *is* valid — `undefined` is identifier-rooted.) | +| `using {a} = b` | reject | accept | reject | A `using` binding is a `BindingIdentifier` (`BindingList[~Pattern]`); the object pattern has no production. `using [a] = b` *is* valid — there `using` is an identifier and `[a]` is an element access. | +| `++ -x` | accept | reject | reject | `++ UnaryExpression` derives it; "operand must be a simple target" is a static-semantic early error, which the parser leaves to a consumer. 
| + +`tsc` rejecting the first and accepting the next two (its parser doesn't enforce those productions until the checker) is exactly why "match `tsc`" can't *be* the definition of correct — only the measurement oracle. + ### Broad agreement vs the official grammar **Parser** (Monogram vs the official parser, [`test/src-coverage.ts`](test/src-coverage.ts)) — **agree** = the same accept/reject verdict on each corpus file (for HTML, full **parse-tree equality** via parse5); **covered** = how much of the official parser's own branches the corpus exercises, so read `agree` as "on the covered portion." (For the non-HTML grammars `agree` is accept/reject; their parse-*tree* correctness is exercised by the Highlighter axis, whose roles are read off the tree.) **Highlighter** (Monogram's derived TextMate grammar vs the official one, [`test/scope-gap.ts`](test/scope-gap.ts)) — both graded against the parser's per-token roles, the [vscode#203212](https://github.com/microsoft/vscode/issues/203212) comparison. @@ -227,12 +242,29 @@ The **only-Monogram** wins above are all disambiguations that are *TextMate-expr "TextMate can't express X" is not a guess or an assertion; it is a claim to be **proven from the model**. TextMate is a line-oriented matcher whose only cross-line memory is a finite stack of scope contexts, so a proof exhibits an X whose correct highlighting provably needs memory that model lacks — unbounded lookback to a token that is not an enclosing context. A failed *attempt* to derive a pattern is not such a proof: a cleverer pattern may exist, and most "impossible for TextMate" folklore is exactly this error — the multiline / nested-generic cases turn out TM-expressible once a parser supplies the pattern, which is why the derived grammar gets them right. Where a construct provably exceeds the model, Monogram's **tree-sitter** target — a real parser over the whole tree — resolves it. 
+### Total parsing under edits — measured against tsc and tree-sitter + +The handle API (`createParser()`) is **total**: every text yields a tree plus `cst.errors`, with tsc-grade diagnostics (`expected ',' or ']'` where every listed token is *provably* still accepted at that position, `to match this '('` related info, zero-width `$missing` nodes that keep a call's shape when its `)` is missing). Two structural guarantees back it: + +- **The valid path is byte-identical to the strict parser** — recovery runs only after a strict pass has rejected, so error tolerance costs valid input nothing, by construction. +- **Every edited re-parse is byte-identical to a fresh parse** of the same text — tree *and* errors, broken states included, held exact by generative edit scripts across all seven grammars in CI (`test/incremental-grammars.ts`). + +One 9 MB TypeScript document, identical single-character edit scripts (`test/head-to-head.ts`, node v24, Apple silicon; ✎ = per keystroke, median): + +| engine | fresh parse | valid ✎ | breaking ✎ | while-broken ✎ | fixing ✎ | +|---|---:|---:|---:|---:|---:| +| **Monogram** | **167 ms** | 0.37 ms | 12 ms | **0.22 ms** | 2.2 ms | +| tsc `updateSourceFile` | 207 ms | 35 ms | 12.0 ms | 11.9 ms | 11.9 ms | +| tree-sitter (official) | 430 ms | **0.18 ms** | **0.29 ms** | 0.30 ms | **0.22 ms** | + +Monogram beats or matches tsc on every phase (valid typing ~100×, while-broken ~50×, a tie at 12 ms on the breaking edit) and beats tree-sitter on the fresh parse and on while-broken keystrokes; tree-sitter leads on the valid keystroke (0.18 ms vs 0.37 ms) and on the two **transition** edits (break/fix). Profiling attributes the transition costs almost entirely to the bench's 4.5 MB cursor jump: token-column offsets are EOF-relative-biased so that local typing never rewrites the suffix (that is what makes the valid keystroke 0.37 ms), and the bias boundary moves with the cursor — a far jump pays once, proportional to the jump distance, then repeated break/fix transitions at that position settle to **~1.6–2 ms** (the parser passes measure under 1 ms of that). 
+ ## What you get From one grammar definition (a small TypeScript combinator API), five outputs are **fully functional**: - **A lexer** — tokenizes source straight from the grammar's token definitions; usable on its own (`createLexer(grammar).tokenize`). -- **A CST parser** — recursive descent + Pratt precedence on top of the lexer, producing a **CST** (concrete syntax tree): every token is a node, including punctuation and keywords — roughly 2× an AST's nodes, by design, which is exactly what the highlighter and lossless source reconstruction need. +- **A CST parser** — recursive descent + Pratt precedence on top of the lexer, producing a **CST** (concrete syntax tree): every token is a node, including punctuation and keywords — roughly 2× an AST's nodes, by design, which is exactly what the highlighter and lossless source reconstruction need. A CST is *pre-semantic* (it models the productions, not static semantics — see [Correctness](#correctness-the-productions-not-tsc)). - **A TextMate grammar** — a `.tmLanguage.json` for VS Code / Sublime syntax highlighting, derived from the same rules, including derived **JSDoc-body** and **regex-internal** sub-grammars. (TextMate *scopes* are the dot-separated labels — `entity.name.function`, `keyword.control` — that a theme maps to colors.) - **A VS Code language configuration** — `language-configuration.json` (comments, bracket pairs, auto-close/surround, folding) derived from the same tokens. - **CST node types** — a TypeScript discriminated union (keyed by rule) for typed tree consumers. diff --git a/ROADMAP.md b/ROADMAP.md index c8f8673..80f0664 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -26,6 +26,9 @@ Three parser-grounded layers (in `test/`), each comparing against the language's ## What's next +- **Parser-acceptance long tail vs tsc** (measured by `test/recovery-conformance.ts`: recall 61.2%, 108 conformance files we parse-accept that tsc's parser rejects). 
The remainder is fully enumerated, two buckets: + - **`[Await]`/`[Yield]` parameter contexts** (31 files): `await`/`yield` must be reserved *inside* async/generator bodies and parameter lists, identifiers elsewhere. Needs a context-threading mechanism in the engine — the same shape as `exclude('in', …)` for the no-`in` context, but suppressing identifier *texts* over a subtree. Designed direction, not yet built. + - **Per-shape strictness** (77 files, each class small and named): declaration-modifier ordering (`public @dec method`), private names outside classes (`const #foo`), strict-mode octal literals (`001`), member declarations with `var` (`class C { var x }`), paren-less `new` arguments (`new C0 32`), reserved words in dotted namespace tails, template-literal module names, `extends void`, `super` tagged templates. Each wants the same treatment that landed for `case`/`class`/statement keywords: fix, then prove FN=0 with the accept/reject flip-scan against the corpus. - **More vscode#203212 bundles** — low-effort first (ini, diff, git config, xml); the large ones (ruby, perl, c/c++, groovy) each need an instrumentable official parser (WASM / native-coverage) + a corpus. - **Field labels** in the grammar DSL → richer named-field AST types. - **Highlighter long tail** — the few remaining per-language divergences are documented (in the PR) as either the shared TextMate-vs-parser ceiling or proven architectural floors; where a construct provably exceeds the TextMate model, the derived **tree-sitter** target (a real whole-tree parser) resolves it. diff --git a/TOTAL-PARSING.md b/TOTAL-PARSING.md new file mode 100644 index 0000000..9583a1e --- /dev/null +++ b/TOTAL-PARSING.md @@ -0,0 +1,232 @@ +# Total parsing: the formal spine + +How the handle API (`createParser()`) parses *every* text into a tree plus +`cst.errors` while keeping two byte-identity guarantees no mainstream engine +makes, and why each piece is sound. 
The implementation lives in +`src/emit-parser.ts` (emitted runtime) and is held exact by the gates listed at +the end. + +## The contract + +For every input text and every edit sequence: + +1. **Totality** — `parse`/`edit` never throw on input. Every text yields a root + and a (possibly empty) `errors` list. Only API misuse throws. +2. **Strict-path identity** — a text the strict grammar accepts parses + byte-identically to the strict module-level parser, with `errors = []`. + Error tolerance costs valid input *nothing*, by construction (below), not by + testing. +3. **Edit/fresh identity** — after any edit, tree *and* errors are + byte-identical to a fresh parse of the same text — broken states included. + +## Two passes, strict first + +`parse`/`edit` run the **strict** parser first. Only when it rejects does the +text re-run with `recovering = true`. Guarantee 2 is therefore structural: the +valid path never executes a single recovery branch. The recovering run is where +everything below lives. + +## The bar discipline + +A naive "recover at any failure" breaks both identities: PEG longest-match +exploration *fails constantly* on valid arms, so an always-on recovery rescues +losing arms and perturbs valid shapes; and an incremental run that reuses old +rows explores *less* than a fresh run, so any failure-count-dependent decision +desynchronizes the two. + +Recovery instead fires only at positions a strict pass has *proven* to fail: + +- Each recovering **attempt** runs strictly except at an ordered list of + **bars** (token indices). A recovery action is allowed only inside a bar's + window (below). +- An attempt that fails *past* its bars aborts and appends a new bar at the + attempt's farthest-fail watermark (`maxPos`), monotonically increasing. +- Attempt k runs under the first k bars; the loop is capped (32), then degrades + to a deterministic free-fire pass (`recoverFree`) and, past even that, to a + zero-width `$error` root. Never a crash. 
+ +**Determinism theorem.** The bar list is a pure function of the token stream: +bar k+1 is the strict-modulo-bars farthest-fail of a deterministic parse under +bars 1..k. Hence fresh and incremental recovering parses derive byte-identical +bar lists, which is the keystone of guarantee 3. This forces every ingredient +below to be *adoption-invariant*: nothing about reuse may change any watermark +or any fire decision. + +## Recovery actions, all position-pure + +Every action's fire condition is a pure function of `(position, bar list)` — +no counters, no budgets, no global parse state. (A budgeted design was tried +and failed exactly here: bar₂'s decisions depended on bar₁'s spending, which an +adopted region replays differently.) + +- **Skip absorption** — at a repetition whose element fails with + `recoverArmed(from, reach)` (∃ bar in `[from, reach]` with `reach ≤ bar+2`, + where `reach` is the *failing element's frame-local* probe watermark, not the + global one — a frontier parked on a far bar must not arm unrelated loops), + absorb tokens to the loop's FIRST set / threaded closer / EOF into an + `$error` row. Leaves keep text-tiling; the diagnostic quotes the first + absorbed token. +- **Missing-token synthesis** (`missTok`) — a *required* literal/token matcher + failing at `missAt(pos)` (∃ bar in `[pos, pos+2]`) materializes a zero-width + `$missing` row instead of failing: the construct completes (a call keeps its + Call shape with `)` marked missing) and the diagnostic reads `expected ')'`. +- **Missing-nonterminal synthesis** (`missRule`) — the same at a required rule + reference's fail exit: `expected Expr`. +- **Commitment semantics** — synthesis is suppressed inside *uncommitted* + probes: `not()` and separator probes (`probing`), and optional groups that + have not consumed past their entry (`probeBase`). 
Once an optional consumes a + real token it is committed and synthesizes like required content (`const a = + ;` synthesizes the initializer; a bare `const a` does not invent one). This + is tsc's required-only semantics, derived rather than hand-coded. + +## Three structural theorems the gates forced + +Each of these was surfaced as an `edit ≠ fresh` divergence by the generative +cross-grammar gate, then closed structurally — not patched per-case. + +**T1 — Zero-width success is a synthesis-only artifact.** A strict parser can +never succeed at width zero inside a loop (it would not terminate), so *every* +loop must discard zero-width elements: plain repetitions break on +`pos === before`, hooked repetitions discard and re-arm, left-recursion +continuations and Pratt LEDs refuse zero-width wraps. Without this, synthesis +inside a loop spins unboundedly. + +**T2 — Same-position re-entry is a real cycle class.** Zero-width synthesis +(and, under recovering, the opened dispatch guards) lets a rule re-enter +itself at the same position through paths no grammar check can rule out. +`recRunning` maps each in-flight `(rule, position)` frame to an entry serial; +re-entry fails with PEG cycle semantics. The refinement that matters for reuse: +a cycle refusal that leans on a frame entered *before* the current one makes +the current frame's result a function of its **ancestor stack**, not of the +text — such results are *tainted* (memo-stamped own-generation-only, taint +propagating to whoever reuses them). Internal cycles (both ends inside the +frame) replay from the window text alone and do not taint. 
+ +**T3 — The bar protocol's inputs must be adoption-invariant.** Bar k+1 is +derived from a watermark, so watermarks must be *exact* and *reuse-stable*: +`frameMax` is a frame-local advance watermark (reset at rule entry, folded to +the parent at exit) that makes every stored extent the frame's true probe +reach; memo jumps and adoptions re-raise it to the stored extent, so a reused +subtree contributes the same watermark the parse that built it did. + +## The window-replay theorem + +Define a frame's **window** as `[start, start + ext + 2]` over token indices, +where `ext` is its exact probe extent (T3) and `+2` covers the stop-token and +SECOND-token dispatch reads. + +**Theorem.** Every recovery decision being position-pure, a frame's behavior — +result, probe extent, internal fires and synthesis included — is completely +determined by its window's *text* and its window's *bars*, modulo the +external-cycle dependence of T2. + +Corollaries, each carrying one optimization: + +- **Recovering adoption** (`barsWindowEq`): an old-tree row whose window sees + the same (shifted) bars the build run saw there replays identically — even + rows *containing* `$error`/`$missing` (an error region is exactly what stays + stable across far edits). Broken-state keystrokes go incremental. +- **Cross-attempt memo survival**: attempts within one sequence parse the same + stream under a monotonically growing bar list, so a memo entry whose window + is **bar-free** behaved strictly (no synthesis, no arming; opened dispatch + guards add only non-consuming probes) and is a pure function of window text — + valid in every later attempt. Tainted entries (T2) are excluded; this + exclusion is precisely what the first survival attempt missed and the gates + rejected. Survival is edit-side only: the fresh path's attempt loop resets + the arena per attempt, so earlier attempts' rows are clobbered there. 
+- **Recovering surgery**: a splice whose damage and re-parsed span sit clear of + every bar window *commutes with every recovery decision* — kept rows replay + at shifted positions, and the fresh parse behaves strictly across the span, + exactly like the strict re-parse the surgery runs. Attempt k's bars are a + prefix of the final list, so one check against the final list covers every + attempt. The spliced tree keeps its bar list, suffix bars shifted. + +Taint is tracked on rows as well as memo entries: a tainted frame's row +carries `rowRM` bit 2, propagated structurally like error containment, and +recovering adoption / run extension refuse it — a context-dependent result is +never reused outside the parse that computed it. + +## Lexer resync under depth shifts + +The windowed re-lex adopts the old token suffix at the first aligned token +where the old suffix's lexing is reproducible from observable state. Two +sufficient conditions (both require empty template stacks on both sides — an +interpolation entry's brace counter is mutable state no record captures — and +a candidate token that carries no cross-token lexer flag its adopted successor +reads): + +- **Equal-depth**: neither lex dipped below the candidate's paren depth since + the divergence point (damage start; before it, identical bytes from an + identical anchor state give identical stacks). Every open entry is then + common to both lexes: the stacks are content-equal, and every future pop + behaves identically. O(1), the common case. +- **Shifted-depth**: the old suffix never pops an entry open at the candidate + (its recorded depth column never dips below the candidate's depth; + pop-on-empty counts as −1). No open entry's head-ness is ever read again, so + stack *contents* are irrelevant and the depths may differ by an arbitrary + shift δ — the splice re-bases the adopted depth records by δ, restoring true + absolute depths (`(`-head bits are local facts of their own neighbors and + stay valid). 
This is what makes a paren-balance-changing edit O(window) + instead of a relex-to-EOF. The dominant candidate depth is 0 (statement + boundaries), where the condition collapses to "no pop-on-empty beyond the + candidate" — answered O(1) from an ascending doc-level list of pop-on-empty + token indices (almost always empty) instead of an O(suffix) min-build; only + depth > 0 candidates build the suffix minimum, lazily once per edit. + +## Diagnostics are data, derived from the tree + +`cst.errors` is rebuilt at settle from structured lexer entries plus the +`$error`/`$missing` rows found by descending the structurally-propagated +`rowRM` spine — never collected during parsing. That is what makes adoption +safe for diagnostics: an adopted error region re-derives byte-identical +messages from the current token columns. Two derived enrichments: + +- **Viable sets** — for a required literal in a seq, the companion literals + *provably still accepted* when it fails: repetitions before it are always + re-enterable (their nullable-prefix-reachable literals stay viable); + nullable one-shot items are crossed but contribute nothing, since they may + already have consumed. `expected ',' or ']'` never names an impossible + continuation — a static FIRST union would (after `[1, 2` an expression is + not viable), and tsc under-reports the same position as `')' expected`. +- **Paired openers** — for each literal, intersect the sets of preceding + literals across all its seq occurrences; a unique survivor is its structural + opener (`)`←`(`, `]`←`[`, `while`←`do` — derived, no bracket list), attached + as `related` info pointing at the opener leaf among the `$missing`'s earlier + siblings. 
+ +## Measured (9 MB TypeScript, single-character edits, median) + +| phase | Monogram | tsc `updateSourceFile` | tree-sitter | +|---|---:|---:|---:| +| fresh parse | **167 ms** | 207 ms | 430 ms | +| valid keystroke | 0.37 ms | 35 ms | **0.18 ms** | +| breaking edit | 12 ms | 12.0 ms | **0.29 ms** | +| while-broken keystroke | **0.22 ms** | 11.9 ms | 0.30 ms | +| fixing edit | 2.2 ms | 11.9 ms | **0.22 ms** | + +(`test/head-to-head.ts`.) The transition rows measure a first-touch 4.5 MB +cursor jump: token offsets are EOF-relative-biased so local typing never +rewrites the suffix (the 0.37 ms valid keystroke), and the bias boundary +moves with the cursor — a far jump pays once, proportional to the distance. +Repeated break/fix transitions at one position settle to ~1.6–2 ms, of +which the strict-fail pass is 0.23 ms and the recovery attempts 0.46 ms; +the raw 7-column suffix memmove measures 0.07 ms, so the residual is spread +bookkeeping, not a storage floor. + +Error-report agreement with tsc's parser on the conformance files it rejects +(`test/recovery-conformance.ts`, ±8 chars): recall 59.1%, precision 82.4%, +first-error agreement 57.5%. + +## The gates that hold all of this exact + +- `test/incremental-grammars.ts` — generative inputs × seeded edits × all 7 + grammars: every step's tree+errors byte-equal to fresh, self-consistent + spans, no throws (672 steps). +- `test/incremental-verify.ts`, `test/multi-doc.ts` — real-file edit scripts + and interleaved documents under the same byte-equality. +- `test/recovery.ts` — strict-path identity on valid texts, totality and + determinism on an invalid corpus, a char-by-char typing session, and + exact-match diagnostic pins (synthesis quality must not silently regress to + absorption). +- `test/emit-parser-verify.ts` / `test/emit-lexer-verify.ts` — emitted runtime + ≡ interpreter on the corpus, token streams and error messages included. 
diff --git a/javascript.monarch.json b/javascript.monarch.json index 135a6a4..c016142 100644 --- a/javascript.monarch.json +++ b/javascript.monarch.json @@ -388,7 +388,7 @@ } ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", { "token": "number", "switchTo": "@value" @@ -448,6 +448,10 @@ "token": "operator", "switchTo": "@root" }, + "target": { + "token": "keyword", + "switchTo": "@root" + }, "class": { "token": "keyword", "switchTo": "@root" @@ -765,7 +769,7 @@ "include": "@exprBody" }, [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "token": "regexp", "switchTo": "@value" @@ -802,7 +806,7 @@ "number" ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", "number" ], [ @@ -826,6 +830,7 @@ "instanceof": "operator", "in": "keyword", "new": "operator", + "target": "keyword", "class": "keyword", "extends": "keyword", "async": "keyword", @@ -893,7 +898,7 @@ } ], [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + 
"/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", "regexp" ], [ diff --git a/javascript.tmLanguage.json b/javascript.tmLanguage.json index 8970209..dcc94e7 100644 --- a/javascript.tmLanguage.json +++ b/javascript.tmLanguage.json @@ -39,9 +39,6 @@ { "include": "#object-method-key" }, - { - "include": "#new-expr" - }, { "include": "#arrow-function-params" }, @@ -99,9 +96,6 @@ { "include": "#scope-keyword-control-loop" }, - { - "include": "#scope-keyword-control-loop-of" - }, { "include": "#scope-keyword-control-flow" }, @@ -126,6 +120,9 @@ { "include": "#scope-keyword-control-from-from" }, + { + "include": "#scope-keyword-other" + }, { "include": "#scope-storage-type-class" }, @@ -138,9 +135,6 @@ { "include": "#scope-storage-modifier-accessibility" }, - { - "include": "#scope-keyword-other" - }, { "include": "#scope-storage-type-function" }, @@ -199,10 +193,10 @@ "include": "#punctuation-comma" }, { - "include": "#scope-punctuation-accessor-optional" + "include": "#scope-punctuation-bracket-square" }, { - "include": "#scope-punctuation-bracket-square" + "include": "#scope-punctuation-accessor-optional" }, { "include": "#scope-punctuation-bracket-curly" @@ -229,7 +223,7 @@ "repository": { "regex-literal-prefix-ops": { "name": "string.regexp.js", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\baccessor)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", 
+ "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "keyword.operator.logical.prefix.js" @@ -241,7 +235,7 @@ "name": "punctuation.definition.string.begin.regexp.js" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.js" @@ -981,7 +975,7 @@ }, "number": { "name": "constant.numeric.decimal.js", - "match": "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" + "match": "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" }, "template": { "name": "string.quoted.other.template.js", @@ -1681,12 +1675,12 @@ "name": "keyword.operator.expression.js" }, "scope-keyword-control-loop": { - "match": "\\b(in|for|while|do|break|continue)\\b", + "match": "\\b(in|for|while|do|break|continue|of)\\b", "name": "keyword.control.loop.js" }, - "scope-keyword-control-loop-of": { - "match": "\\b(of)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[]|\\s*$|\\s*[({\\[\"`/])", - "name": "keyword.control.loop.js" + "scope-keyword-other": { + "match": "\\b(target|meta|as)\\b", + "name": "keyword.other.js" }, "scope-storage-type-class": { "match": "\\b(class)\\b", @@ -1697,11 +1691,11 @@ "name": "keyword.other.extends.js" }, "scope-storage-modifier": { - "match": "\\b(async)\\b", + "match": "\\b(async|accessor)\\b", "name": "storage.modifier.js" }, "scope-storage-modifier-accessibility": { - "match": 
"\\b(static|accessor)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", + "match": "\\b(static)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", "name": "storage.modifier.js" }, "scope-keyword-control-flow": { @@ -1712,10 +1706,6 @@ "match": "\\b(import)\\b", "name": "keyword.control.import.js" }, - "scope-keyword-other": { - "match": "\\b(meta|as)\\b", - "name": "keyword.other.js" - }, "scope-storage-type-function": { "match": "\\b(function)\\b", "name": "storage.type.function.js" @@ -1906,14 +1896,14 @@ "match": "\\(|\\)", "name": "punctuation.bracket.round.js" }, - "scope-punctuation-accessor-optional": { - "match": "\\?\\.", - "name": "punctuation.accessor.optional.js" - }, "scope-punctuation-bracket-square": { "match": "\\[|\\]", "name": "punctuation.bracket.square.js" }, + "scope-punctuation-accessor-optional": { + "match": "\\?\\.", + "name": "punctuation.accessor.optional.js" + }, "scope-punctuation-bracket-curly": { "match": "\\{|\\}", "name": "punctuation.bracket.curly.js" @@ -1931,9 +1921,13 @@ "name": "keyword.control.flow.js" }, "expr-scope-keyword-other": { - "match": "\\b(meta)\\b", + "match": "\\b(target|meta)\\b", "name": "keyword.other.js" }, + "expr-scope-storage-modifier": { + "match": "\\b(async)\\b", + "name": "storage.modifier.js" + }, "expression": { "patterns": [ { @@ -1975,9 +1969,6 @@ { "include": "#object-method-key" }, - { - "include": "#new-expr" - }, { "include": "#arrow-function-params" }, @@ -2030,16 +2021,16 @@ "include": "#scope-keyword-control-from-from" }, { - "include": "#scope-storage-type-class" + "include": "#expr-scope-keyword-other" }, { - "include": "#scope-keyword-other-extends" + "include": "#scope-storage-type-class" }, { - "include": "#scope-storage-modifier" + "include": "#scope-keyword-other-extends" }, { - "include": "#expr-scope-keyword-other" + "include": "#expr-scope-storage-modifier" }, { "include": "#scope-storage-type-function" @@ -2093,10 +2084,10 @@ "include": "#punctuation-comma" }, { - "include": 
"#scope-punctuation-accessor-optional" + "include": "#scope-punctuation-bracket-square" }, { - "include": "#scope-punctuation-bracket-square" + "include": "#scope-punctuation-accessor-optional" }, { "include": "#scope-punctuation-bracket-curly" @@ -2177,7 +2168,7 @@ }, "regex": { "name": "string.regexp.js", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\baccessor)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "comment.block.js" @@ -2186,7 +2177,7 @@ "name": "punctuation.definition.string.begin.regexp.js" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.js" @@ -2201,22 +2192,6 @@ } ] }, - "new-expr": { - "name": "meta.new-expr.js", - "begin": "\\b(new)\\b", - "beginCaptures": { - "1": { - "name": "keyword.operator.expression.js" - } - }, - "end": "(?=[()}\\],=;])", - "patterns": [ - { - "match": "(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", - "name": "entity.name.function.js" - 
} - ] - }, "parameter-name": { "match": "(?<=[,(])\\s*(\\.\\.\\.)?\\s*((?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*)(?=\\s*[,)=])", "captures": { diff --git a/javascript.ts b/javascript.ts index d1b5b4f..9514d5f 100644 --- a/javascript.ts +++ b/javascript.ts @@ -26,12 +26,27 @@ import { token, rule, defineGrammar, - left, right, none, noUnaryLhs, + left, right, none, noUnaryLhs, lhsTarget, prefixTarget, postfixTarget, op, prefix, postfix, sameLine, - sep, opt, many, many1, alt, exclude, not, + sep, opt, many, many1, alt, exclude, not, reservableNot, capExpr, + awaitCtx, yieldCtx, asyncGenCtx, resetCtx, altPattern, optPattern, seq, oneOf, noneOf, range, anyChar, star, plus, repeat, notFollowedBy, start, } from './src/api.ts'; +// Build the four async×generator arms of a `function` form, routing each arm's params +// and body to its [Await]/[Yield] family: plain resets to none, generator -> yield, +// async -> await, async-generator -> both. `nameParts` is spread in after `function` +// (and `*` for the generator arms); `body` is the function body element. Param/Block +// resolve at thunk-eval time (defined below), so this is safe to call inside a rule(). +function fnArms(nameParts, body) { + return [ + ['function', ...nameParts, '(', sep(Param, ','), ')', resetCtx(body)], + ['function', '*', ...nameParts, '(', sep(yieldCtx(Param), ','), ')', yieldCtx(body)], + ['async', 'function', ...nameParts, '(', sep(awaitCtx(Param), ','), ')', awaitCtx(body)], + ['async', 'function', '*', ...nameParts, '(', sep(asyncGenCtx(Param), ','), ')', asyncGenCtx(body)], + ]; +} + // ── Tokens ── // IdentifierName, ASCII + `\u`-escape forms. 
The `\uXXXX` / `\u{cp}` alternatives let an @@ -74,7 +89,13 @@ const BigInt_ = token(seq(digits, 'n', numericTailGuard), { scope: 'constan // `[0-9]`/`\d` prefix, so without this the token would lose its `constant.numeric` scope. const fracTail = seq('.', star(digit), star(seq('_', plus(digit)))); const expTail = seq(oneOf('e', 'E'), optPattern(oneOf('+', '-')), digits); -const Number_ = token(seq(altPattern(seq(digits, optPattern(fracTail)), seq('.', digits)), optPattern(expTail), numericTailGuard), { scope: 'constant.numeric.decimal' }); +// A decimal integer part is a single `0` or a `[1-9]`-led run — a leading `0` followed by +// a digit (legacy octal `0123`, leading-zero decimal `09`) is NOT a decimal literal: with +// intPart='0', the trailing digit trips numericTailGuard so the token fails to match and +// the lexer rejects it (tsc's scanner behavior). fracTail/expTail/BigInt keep `digits` +// (leading zeros legal there: `0.012`, `1e007`, `0n`). +const intPart = altPattern('0', seq(range('1', '9'), star(digit), star(seq('_', plus(digit))))); +const Number_ = token(seq(altPattern(seq(intPart, optPattern(fracTail)), seq('.', digits)), optPattern(expTail), numericTailGuard), { scope: 'constant.numeric.decimal' }); // A well-formed JS escape, used in the string-body pattern below. 
`\u`/`\x` must // match their strict forms — a `\u{cp}` with cp ≤ 0x10FFFF, a 4-hex `\uXXXX`, or a // 2-hex `\xXX` — while `\` + any *other* char (\n, \\, \q non-escape, line @@ -108,7 +129,13 @@ const Template = token(seq('`', star(altPattern(noneOf('`', '\\', '$'), seq( }); const regexEscape = seq('\\', noneOf(lineTerminator)); const regexClassBody = star(altPattern(noneOf(']', '\\', '\n'), regexEscape)); -const Regex_ = token(seq('/', plus(altPattern(noneOf('/', '\\', '[', '\n'), regexEscape, seq('[', regexClassBody, ']'))), '/', star(oneOf('g', 'i', 'm', 's', 'u', 'y', 'd', 'v'))), { +// RegularExpressionChar; the FIRST char additionally excludes `*` (RegularExpressionFirstChar) +// so `/*` is never a regex start — it is a block-comment open, and an unterminated `/* … /` +// is a lexical error, NOT a regex literal. (A `*` anywhere after the first char stays legal: +// `/a*/`.) Body is one-or-more total, so `//` remains a LineComment as before. +const regexChar = altPattern(noneOf('/', '\\', '[', '\n'), regexEscape, seq('[', regexClassBody, ']')); +const regexFirstChar = altPattern(noneOf('/', '\\', '[', '*', '\n'), regexEscape, seq('[', regexClassBody, ']')); +const Regex_ = token(seq('/', regexFirstChar, star(regexChar), '/', star(identPart)), { // flags: maximal-munch any IdentifierPart run (tsc lexes flags leniently; validity is a checker rule) regex: true, regexContext: { divisionAfterTypes: ['Ident', 'Number', 'String', 'Template', 'BigInt'], @@ -162,6 +189,12 @@ export { // (let/static/implements/yield/await/…) — those ARE valid identifiers in some // context a CFG can't detect (sloppy mode, non-generator/non-async), so forbidding // them here would reject valid code (`var let = 1`, `function f(yield) {}`). 
+// NOT reservable: tsc's PARSER accepts await/yield (and let/static/…) as binding +// identifiers even inside an async/generator body — the "reserved word" rule there is +// a CHECKER diagnostic, not a parse error (`async function f(){ let await = 1 }`, +// `function* g(){ function yield(){} }` both parse). The [Await]/[Yield] reservation +// that IS a parse error lives at expression position (notReservedExpr), where `await` +// must be the operator and so needs an operand. export const notReserved = not(alt( 'break', 'case', 'catch', 'class', 'const', 'continue', 'debugger', 'default', 'delete', 'do', 'else', 'enum', 'export', 'extends', 'false', 'finally', 'for', @@ -176,14 +209,23 @@ export const notReserved = not(alt( // `null`, …), and TS's own error-recovery tolerates several reserved words sliding into // the bare-identifier fallback inside otherwise-valid files (e.g. `export default …`, // undeclared `for (x in …)`, `class … extends (e)`, a decorator before `export`). The -// words below have NO such role: they are the prefix operators `void`/`typeof`/`delete` -// (which must take an operand) plus the `catch`/`throw` keywords and `enum`. Forbidding -// the bare-identifier fallback for exactly these rejects `catch(x){}` with no `try`, -// `void ;`/`typeof ;`/`delete ;` (operatorless prefix op), and `throw ;` — while leaving -// every valid expression (and TS's recovery cases) untouched. Verified: widening this -// set to other reserved words regresses valid code; these five are the FN-safe maximum. 
-export const notReservedExpr = not(alt( - 'catch', 'delete', 'enum', 'throw', 'typeof', 'void', +// words below have NO such role: the prefix operators `void`/`typeof`/`delete` (which +// must take an operand), the `catch`/`throw` keywords, `enum`, `case` (a bare `case` +// expression let `case 1 y();` inside a switch parse as three statements), and +// `class` (a valid class expression always out-matches the bare-identifier fallback, +// so forbidding the fallback only rejects broken classes — `class extends D ;` with +// no body parsed as three statements). Forbidding the bare-identifier fallback for +// exactly these rejects `catch(x){}` with no `try`, `void ;`/`typeof ;`/`delete ;` +// (operandless prefix op), `throw ;`, a colon-less `case`, and a body-less `class` +// — while leaving every valid expression (and TS's recovery cases) untouched. +// Verified by a zero-flip accept/reject scan over the conformance corpus; widening +// further regresses: `extends` is load-bearing for tsc's tolerated heritage shapes +// (`interface I extends { }` reads `{` as the body, `extends A extends B`, +// `extends Foo?.Bar` — all parse-accepted by tsc through the identifier fallback). +export const notReservedExpr = reservableNot(alt( + 'break', 'case', 'catch', 'class', 'continue', 'debugger', 'delete', 'do', + 'else', 'enum', 'finally', 'for', 'if', 'return', 'switch', 'throw', 'try', + 'typeof', 'void', 'while', 'with', )); // ── Precedence ladder (shared ECMAScript operator precedence) ── @@ -199,8 +241,11 @@ export const jsLedPrecs = [ ]; export const ecmaPrec = [ - right('=', '+=', '-=', '*=', '/=', '%=', '**=', '<<=', '>>=', '>>>=', '&=', '|=', '^='), - right('??=', '||=', '&&='), + // Assignment operators require a LeftHandSideExpression target (ECMAScript + // AssignmentTargetType): `-x = 1`, `++x = 1`, `x++ = 1` are syntax errors; `x = 1`, + // `x.y = 1`, `(x++) = 1` (a parenthesized cover) are fine.
+ right(lhsTarget('=', '+=', '-=', '*=', '/=', '%=', '**=', '<<=', '>>=', '>>>=', '&=', '|=', '^=')), + right(lhsTarget('??=', '||=', '&&=')), left('??'), left('||'), left('&&'), @@ -214,8 +259,19 @@ export const ecmaPrec = [ left('*', '/', '%'), right(noUnaryLhs('**')), // `-x ** y` is a syntax error: a unary-prefix expr can't be a `**` LHS right(prefix('!', '~', '+', '-', 'typeof', 'void', 'delete', 'await', 'yield')), + // prefix `++`/`--` (update prefixes): the spec operand is a UnaryExpression + // (`UpdateExpression : ++ UnaryExpression`), so `++-x`, `++ ++x`, `++await x`, `++delete a.b` + // are all PRODUCTION-DERIVABLE — the CST producer accepts them and emits the concrete tree. + // "operand is not a simple assignment target" is a Static-Semantics early error (the same + // class as `(a+b)=c`), which is identified downstream when an AST `UpdateExpression` + // (operand: SimpleAssignmentTarget) fails to lower — NOT here. So this stays a plain prefix. right(prefix('++', '--')), - left(postfix('++', '--')), + // postfix `++`/`--` operand IS a LeftHandSideExpression in the grammar + // (`UpdateExpression : LeftHandSideExpression [no LT] ++`), so `++x++`, `x++ ++` are genuine + // PRODUCTION-violations (operand `++x`/`x++` is an UpdateExpression, not a LHS) — no parse + // tree exists, so the CST producer correctly rejects them. (Asymmetric with the prefix above + // by the grammar's own slot types: prefix operand = UnaryExpression, postfix operand = LHS.) + left(postfixTarget('++', '--')), ]; // ── Decorators ── @@ -229,13 +285,18 @@ const DecoratorExpr = rule($ => [ // ── Expressions ── const Prop = rule($ => { - const method = ['(', sep(Param, ','), ')', Block]; // ( … ) { … } + // ( … ) { … }, params+body routed to a [Await]/[Yield] family (see memTail); the + // MemberName stays outside it (a computed key inherits the enclosing context). 
+ const propTail = (ctx) => ['(', sep(ctx(Param), ','), ')', ctx(Block)]; return [ ['...', Expr], // spread - // accessor (get/set) - [alt('get', 'set'), MemberName, '(', opt(sep(Param, ',')), ')', Block], - // method: async?/generator?, any member name (incl `#x`, computed `[e]`), then ( … ) { … } - [opt('async'), opt('*'), MemberName, ...method], + // accessor (get/set) — get/set bodies are plain (reset) + [alt('get', 'set'), MemberName, '(', opt(sep(resetCtx(Param), ',')), ')', resetCtx(Block)], + // method, 4-way split on async × generator (each routes params+body to its family) + ['async', '*', MemberName, ...propTail(asyncGenCtx)], + ['async', MemberName, ...propTail(awaitCtx)], + ['*', MemberName, ...propTail(yieldCtx)], + [MemberName, ...propTail(resetCtx)], // value property — any member name incl computed `[e]: v` (MemberName covers `[Expr]`) [MemberName, ':', Expr], ['[', Expr, many(',', Expr), ']', ':', Expr], // computed comma list (lenient) @@ -270,8 +331,11 @@ const Expr = rule($ => [ // (both are one token) goes to the first-listed alternative, so listing the literals // first makes `this`/`true`/… arrive as $keyword leaves — the tree records what the // word IS instead of the bare-identifier fallback winning the tie and stamping Ident. - 'true', 'false', 'null', 'undefined', 'this', 'super', - [notReservedExpr, Ident], + 'true', 'false', 'null', 'undefined', 'this', + // `super` is a CONSTRAINED primary (mirrors tsc's parseSuperExpression): MUST be + // immediately followed by a call `(args)`, member `.name`/`.#priv`, or element `[expr]`. 
+ ['super', alt(['(', sep($, ','), ')'], ['.', alt(Ident, PrivateField)], ['[', $, ']'])], + [not('super'), not('new'), notReservedExpr, Ident], Number_, String_, Template, @@ -282,31 +346,55 @@ const Expr = rule($ => [ ['...', $], [$, '(', sep($, ','), ')'], [$, '.', alt(Ident, PrivateField)], - // optional chaining: ?.x | ?.#x | ?.(args) | ?.[i] | ?.`…` + // optional chaining: ?.x | ?.#x | ?.(args) | ?.[i] | ?.`…`. A private member `?.#x` IS + // valid current ECMAScript (V8 + Babel accept it; tsc's lone parse rejection is a bug being + // removed in TS#60263) — so PrivateField stays. The CST producer models the syntax; it does + // not adjudicate tsc-only restrictions. [$, '?.', alt(Ident, PrivateField, ['(', sep($, ','), ')'], ['[', $, ']'], Template)], [$, '[', $, ']'], [$, '?', $, ':', $], [$, 'instanceof', $], [$, 'in', $], [$, Template], - // new T | new T(args) - ['new', NewTarget, opt('(', sep($, ','), ')')], - ['new', 'class', Ident, opt('extends', ClassHeritage), '{', many(ClassMember), '}', opt('(', sep($, ','), ')')], - ['new', 'class', opt('extends', ClassHeritage), '{', many(ClassMember), '}', opt('(', sep($, ','), ')')], + // `new.target` meta-property — the only `new` form not followed by a target; matched by a + // dedicated arm (NOT the bare identifier nud, which excludes `new`) so a failed `new T` arm + // can't slide `new` in as an Ident (`new Foo()` → the comparison `(new < T) > Foo()`). + ['new', '.', 'target'], + // new T | new T(args). An optional chain may NOT follow a bare `new` (no Arguments): a + // NewExpression is not a valid `?.` base, so `new a?.b` / `new class{}?.x` have no parse tree + // (tsc + V8 + babel all reject). `not('?.')` guards the no-call exit; `new a()?.b` chains via + // the outer `?.` LED unchanged. 
+ ['new', not('<'), NewTarget, alt(['(', sep($, ','), ')'], not('?.'))], + ['new', 'class', Ident, opt('extends', ClassHeritage), '{', many(ClassMember), '}', alt(['(', sep($, ','), ')'], not('?.'))], + ['new', 'class', opt('extends', ClassHeritage), '{', many(ClassMember), '}', alt(['(', sep($, ','), ')'], not('?.'))], ['[', many(opt($), ','), opt($), ']'], ['{', sep(Prop, ','), '}'], - [opt('async'), '(', sep(Param, ','), ')', '=>', alt($, Block)], + // Arrow functions, async/non-async SPLIT so the [Await] grammar parameter can route + // each arm's params + body to the right rule family (await-yield-fork.ts): an async + // arrow's params and body are await-context (`async (a = await) =>` rejects — await + // needs an operand), a plain arrow's body resets to none. + // capExpr('?'): an ArrowFunction is the LOWEST-precedence AssignmentExpression — it can be + // neither the operand of a binary/logical/conditional operator nor an assignment target, so + // each arm is capped BELOW the conditional `?`: it parses only at an assignment-or-looser + // minBp and, once parsed, admits no led (`() => {} || a` rejects, NOT `(() => {}) || a`). A + // `||`/`?:` INSIDE an expression body (`() => a || b`) is unaffected — parsed by the body `$`. + // The body is `alt(Block, $)` (Block FIRST) = the spec's ConciseBody `[lookahead ≠ {] + // AssignmentExpression | { FunctionBody }`: `() => {}` is a block body, not an object literal + // that greedily absorbs a trailing `|| a` / `.x`. + capExpr('?', 'async', '(', sep(awaitCtx(Param), ','), ')', '=>', awaitCtx(alt(Block, $))), + capExpr('?', '(', sep(Param, ','), ')', '=>', resetCtx(alt(Block, $))), // async arrow with a BARE parameter: `async err => …` (ES2017). `async` and the // parameter must share a line (`async\nx => …` is `async;` then a plain arrow — // the spec's [no LineTerminator here] between async and the binding identifier). 
- ['async', sameLine, Ident, '=>', alt($, Block)], - [Ident, '=>', alt($, Block)], + capExpr('?', 'async', sameLine, awaitCtx(notReservedExpr, Ident), '=>', awaitCtx(alt(Block, $))), + capExpr('?', notReservedExpr, Ident, '=>', resetCtx(alt(Block, $))), ['yield', alt(['*', $], [opt($)])], // yield e | yield* e (delegate) | yield ['(', $, many(',', $), ')'], ['import', alt(['(', $, ')'], ['.', 'meta'])], PrivateField, HexNumber, OctalNumber, BinaryNumber, BigInt_, - [opt('async'), 'function', opt('*'), opt(Ident), '(', sep(Param, ','), ')', Block], + // function expression, 4-way split on async × generator (see fnArms). + ...fnArms([opt(Ident)], Block), // named vs anonymous kept separate (greedy opt(Ident) would eat a leading // `extends`); decorator dimension collapsed via opt(DecoratorExpr). [opt(DecoratorExpr), 'class', Ident, many('extends', sep(alt([not('extends'), ClassHeritage]), ',')), '{', many(ClassMember), '}'], @@ -376,16 +464,23 @@ const ForHead = rule($ => { return [ // declared head: `let/const/var/using/await using ` then C-style or in/of. // ForBinding gives a no-`in` initializer so `for (var a = 1 in xs)` parses. - [alt('let', 'const', 'var', 'using', ['await', 'using']), sep(ForBinding, ','), alt( + // `for (using of of …)` has no parse tree (the using-DECL reading is suppressed by the + // spec `[lookahead != using of]` and `using` as an identifier then fails); guard the exact + // triple only, so `for (using of ;…)` and `for (await using of of …)` stay valid. 
+ [not(['using', 'of', 'of']), alt('let', 'const', 'var', 'using', ['await', 'using']), sep(ForBinding, ','), alt( cTail, - [alt('in', 'of'), Expr], + // the for-in OBJECT is a full Expression (comma included: `for (a in b, c)`); + // for-of takes an AssignmentExpression - no comma (tsc rejects `for (x of a, b)`) + ['in', Expr, many(',', Expr)], + ['of', Expr], )], [opt(Expr, many(',', Expr)), ...cTail], // C-style, no declaration: `for (i=0; …; …)` / `for (;;)` // for-in/of, no declaration: `for (x of xs)`. The target Expr parses in a no-`in` // context (same exclude as binding initializers): the `in` belongs to the for-head, // not to an in-LED inside the target — without it `for (key in obj)` swallowed the // `in`, the arm failed, and the statement fell back to a CALL parse `for(...)`. - [exclude('in', Expr), alt('in', 'of'), Expr], + [exclude('in', Expr), 'in', Expr, many(',', Expr)], + [exclude('in', Expr), 'of', Expr], ]; }); @@ -411,7 +506,7 @@ const Stmt = rule($ => [ ['break', opt(sameLine, notReserved, Ident), opt(';')], ['continue', opt(sameLine, notReserved, Ident), opt(';')], ['try', Block, opt('catch', opt('(', alt(Param, BindingPattern), ')'), Block), opt('finally', Block)], - [Ident, ':', $], + [notReserved, Ident, ':', $], ';', ['debugger', opt(';')], ['with', '(', Expr, ')', $], @@ -427,7 +522,7 @@ const Stmt = rule($ => [ // (extends-expression heritage, bare `;` class elements, decorator placements), so // 31 tsc-valid corpus files still rely on the class-EXPRESSION fallback — widen the // declaration arm first, then guard. - [not(alt('function', 'class', ['async', 'function'])), Expr, many(',', Expr), opt(';')], + [not(alt('function', 'class', ['async', 'function'], ['let', '['])), Expr, many(',', Expr), opt(';')], ]); // ── Declarations ── @@ -448,28 +543,47 @@ const MemberName = rule($ => [ // member's shared `modifiers …` prefix isn't re-parsed per alternative. 
Inner // alt() is first-match, so branches are ordered specific-before-general // (generator/accessor before the MemberName method/field split). -const Modifier = alt('static', 'accessor', 'async'); -const callTail = ['(', sep(Param, ','), ')', opt(Block), opt(';')] as const; +// modifier only when NOT followed by name-making tokens (see typescript.ts) +// `async` is NOT a generic member modifier here: it leads the async/async-generator +// method arms below (which give the body its [Await] context), so the modifier soup +// must not swallow it into a plain method (the class analog of the Decl modifier-prefix +// fix). `static`/`accessor` stay generic modifiers. +const Modifier = alt([alt('static', 'accessor'), not(alt('(', '=', '{', '}'))]); +// Class member ( params ) body, with params+body routed to a [Await]/[Yield] family: +// plain methods reset (a method body has its OWN, non-inherited context — the spec's +// implicit function boundary), generators yield, async await, async-generators both. +// The MemberName stays OUTSIDE the family: a computed key `[e]` is evaluated in the +// ENCLOSING context, so it must inherit, not reset. 
+const memTail = (ctx) => ['(', sep(ctx(Param), ','), ')', opt(ctx(Block)), opt(';')]; const ClassMember = rule($ => [ ';', // SemicolonClassElement: `class C { ; }` - DecoratorExpr, - ['constructor', '(', sep(Param, ','), ')', Block, opt(';')], - ['static', Block], + ['constructor', '(', sep(resetCtx(Param), ','), ')', resetCtx(Block), opt(';')], + [many(DecoratorExpr), many(Modifier), 'static', awaitCtx(Block)], // static block body is [+Await] (await reserved); decorators/modifiers parse (SEMANTIC errors) + // decorators PREFIX a member, before any modifier (see typescript.ts) [ + many(DecoratorExpr), many(Modifier), alt( - ['*', MemberName, ...callTail], // generator method - [alt('get', 'set'), MemberName, '(', opt(sep(Param, ',')), ')', opt(Block), opt(';')], // accessor + // `async` is order-free among modifiers (tsc parses any order), so it carries + // its own inner modifier run and an async member's body is [+Await]/[+Await,+Yield]. + ['async', many(Modifier), '*', MemberName, ...memTail(asyncGenCtx)], // async generator method + ['async', many(Modifier), alt('get', 'set'), MemberName, '(', opt(sep(awaitCtx(Param), ',')), ')', opt(awaitCtx(Block)), opt(';')], // async accessor (semantic error; parses) + ['async', many(Modifier), 'static', awaitCtx(Block)], // `async static { }` (semantic error; parses) + ['async', many(Modifier), MemberName, ...memTail(awaitCtx)], // async method + ['*', MemberName, ...memTail(yieldCtx)], // generator method + [alt('get', 'set'), MemberName, '(', opt(sep(resetCtx(Param), ',')), ')', opt(resetCtx(Block)), opt(';')], // accessor [MemberName, alt( - [...callTail], // method (requires `(`) - [opt('=', Expr), opt(';')], // field (all-optional → catch-all) + [...memTail(resetCtx)], // method (requires `(`) + // field catch-all; a ';'-less field must not be followed by a same-line + // decorator (see typescript.ts) + [opt('=', resetCtx(Expr)), alt([';'], [not(sameLine)], [not(not('}'))])], )], ), ], // Fallbacks for a member 
NAMED like a modifier (`static = 1`, `get = 1`, `async() {}`): // many(Modifier) would eat the name, so the member kind alt fails and we land here. - [MemberName, opt('=', Expr), opt(';')], - [MemberName, '(', sep(Param, ','), ')', opt(Block), opt(';')], + [MemberName, opt('=', resetCtx(Expr)), alt([';'], [not(sameLine)], [not(not('}'))])], + [MemberName, '(', sep(resetCtx(Param), ','), ')', opt(resetCtx(Block)), opt(';')], ]); const ImportSpecifier = rule($ => [ @@ -498,14 +612,14 @@ const Decl = rule($ => [ // leading `function` is preferred as a declaration over an IIFE expression- // statement: Program tries Decl before Stmt, so `function f(){}\n()=>{}` parses // as a declaration + arrow rather than longest-matching `function f(){}()` (IIFE). - [opt('async'), 'function', opt('*'), Ident, '(', sep(Param, ','), ')', Block], + ...fnArms([Ident], Block), // class decl: optional decorators. gen-tm expands the opt()/many() to recover // the `class Ident … { … }` shape for highlighting. [many(DecoratorExpr), 'class', Ident, many('extends', sep(alt([not('extends'), ClassHeritage]), ',')), '{', many(ClassMember), '}'], ['export', alt($, Stmt)], [many1(DecoratorExpr), $], // decorators before export/default/etc. 
['export', 'default', alt( - [opt('async'), 'function', opt('*'), opt(Ident), '(', sep(Param, ','), ')', Block], // function + ...fnArms([opt(Ident)], Block), // function [Expr, opt(';')], // catch-all: export default )], ['export', '*', alt(['from', String_, opt(';')], ['as', Ident, 'from', String_, opt(';')])], diff --git a/javascriptreact.monarch.json b/javascriptreact.monarch.json index ae0c000..23d323f 100644 --- a/javascriptreact.monarch.json +++ b/javascriptreact.monarch.json @@ -388,7 +388,7 @@ } ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", { "token": "number", "switchTo": "@value" @@ -462,6 +462,10 @@ "token": "operator", "switchTo": "@root" }, + "target": { + "token": "keyword", + "switchTo": "@root" + }, "class": { "token": "keyword", "switchTo": "@root" @@ -779,7 +783,7 @@ "include": "@exprBody" }, [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "token": "regexp", "switchTo": "@value" @@ -816,7 +820,7 @@ "number" ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", "number" ], [ @@ -848,6 +852,7 @@ "instanceof": "operator", "in": "keyword", "new": "operator", + "target": "keyword", "class": "keyword", "extends": "keyword", "async": "keyword", @@ -915,7 +920,7 @@ } 
], [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", "regexp" ], [ diff --git a/javascriptreact.tmLanguage.json b/javascriptreact.tmLanguage.json index 9818823..5bbd2af 100644 --- a/javascriptreact.tmLanguage.json +++ b/javascriptreact.tmLanguage.json @@ -48,9 +48,6 @@ { "include": "#object-method-key" }, - { - "include": "#new-expr" - }, { "include": "#arrow-function-params" }, @@ -108,9 +105,6 @@ { "include": "#scope-keyword-control-loop" }, - { - "include": "#scope-keyword-control-loop-of" - }, { "include": "#scope-keyword-control-flow" }, @@ -135,6 +129,9 @@ { "include": "#scope-keyword-control-from-from" }, + { + "include": "#scope-keyword-other" + }, { "include": "#scope-storage-type-class" }, @@ -147,9 +144,6 @@ { "include": "#scope-storage-modifier-accessibility" }, - { - "include": "#scope-keyword-other" - }, { "include": "#scope-storage-type-function" }, @@ -208,10 +202,10 @@ "include": "#punctuation-comma" }, { - "include": "#scope-punctuation-accessor-optional" + "include": "#scope-punctuation-bracket-square" }, { - "include": "#scope-punctuation-bracket-square" + "include": "#scope-punctuation-accessor-optional" }, { "include": "#scope-punctuation-bracket-curly" @@ -708,7 +702,7 @@ }, "regex-literal-prefix-ops": { "name": "string.regexp.js.jsx", - "begin": 
"(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\baccessor)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "keyword.operator.logical.prefix.js.jsx" @@ -720,7 +714,7 @@ "name": "punctuation.definition.string.begin.regexp.js.jsx" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.js.jsx" @@ -1460,7 +1454,7 @@ }, "number": { "name": "constant.numeric.decimal.js.jsx", - "match": "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" + "match": "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" }, "template": { "name": "string.quoted.other.template.js.jsx", @@ -2160,12 +2154,12 @@ "name": "keyword.operator.expression.js.jsx" }, "scope-keyword-control-loop": { - "match": "\\b(in|for|while|do|break|continue)\\b", + "match": "\\b(in|for|while|do|break|continue|of)\\b", "name": "keyword.control.loop.js.jsx" }, - 
"scope-keyword-control-loop-of": { - "match": "\\b(of)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[]|\\s*$|\\s*[({\\[\"`/])", - "name": "keyword.control.loop.js.jsx" + "scope-keyword-other": { + "match": "\\b(target|meta|as)\\b", + "name": "keyword.other.js.jsx" }, "scope-storage-type-class": { "match": "\\b(class)\\b", @@ -2176,11 +2170,11 @@ "name": "keyword.other.extends.js.jsx" }, "scope-storage-modifier": { - "match": "\\b(async)\\b", + "match": "\\b(async|accessor)\\b", "name": "storage.modifier.js.jsx" }, "scope-storage-modifier-accessibility": { - "match": "\\b(static|accessor)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", + "match": "\\b(static)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", "name": "storage.modifier.js.jsx" }, "scope-keyword-control-flow": { @@ -2191,10 +2185,6 @@ "match": "\\b(import)\\b", "name": "keyword.control.import.js.jsx" }, - "scope-keyword-other": { - "match": "\\b(meta|as)\\b", - "name": "keyword.other.js.jsx" - }, "scope-storage-type-function": { "match": "\\b(function)\\b", "name": "storage.type.function.js.jsx" @@ -2385,14 +2375,14 @@ "match": "\\(|\\)", "name": "punctuation.bracket.round.js.jsx" }, - "scope-punctuation-accessor-optional": { - "match": "\\?\\.", - "name": "punctuation.accessor.optional.js.jsx" - }, "scope-punctuation-bracket-square": { "match": "\\[|\\]", "name": "punctuation.bracket.square.js.jsx" }, + "scope-punctuation-accessor-optional": { + "match": "\\?\\.", + "name": "punctuation.accessor.optional.js.jsx" + }, "scope-punctuation-bracket-curly": { "match": "\\{|\\}", "name": "punctuation.bracket.curly.js.jsx" @@ -2410,9 +2400,13 @@ "name": "keyword.control.flow.js.jsx" }, "expr-scope-keyword-other": { - "match": "\\b(meta)\\b", + "match": "\\b(target|meta)\\b", "name": "keyword.other.js.jsx" }, + "expr-scope-storage-modifier": { + "match": "\\b(async)\\b", + "name": "storage.modifier.js.jsx" + }, "expression": { "patterns": [ { @@ -2463,9 +2457,6 @@ { "include": "#object-method-key" }, - { - 
"include": "#new-expr" - }, { "include": "#arrow-function-params" }, @@ -2518,16 +2509,16 @@ "include": "#scope-keyword-control-from-from" }, { - "include": "#scope-storage-type-class" + "include": "#expr-scope-keyword-other" }, { - "include": "#scope-keyword-other-extends" + "include": "#scope-storage-type-class" }, { - "include": "#scope-storage-modifier" + "include": "#scope-keyword-other-extends" }, { - "include": "#expr-scope-keyword-other" + "include": "#expr-scope-storage-modifier" }, { "include": "#scope-storage-type-function" @@ -2581,10 +2572,10 @@ "include": "#punctuation-comma" }, { - "include": "#scope-punctuation-accessor-optional" + "include": "#scope-punctuation-bracket-square" }, { - "include": "#scope-punctuation-bracket-square" + "include": "#scope-punctuation-accessor-optional" }, { "include": "#scope-punctuation-bracket-curly" @@ -2665,7 +2656,7 @@ }, "regex": { "name": "string.regexp.js.jsx", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\baccessor)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": "(?:(?<=[=|\\^&<>+\\-*%~(,.\\[?:{;])|(?<=\\binstanceof)|(?<=\\bin)|(?<=\\bextends)|(?<=\\byield)|(?<=\\bget)|(?<=\\bset)|(?<=\\basync)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bexport)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\bstatic)|(?<=\\btypeof)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "comment.block.js.jsx" @@ -2674,7 +2665,7 @@ "name": 
"punctuation.definition.string.begin.regexp.js.jsx" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.js.jsx" @@ -2689,22 +2680,6 @@ } ] }, - "new-expr": { - "name": "meta.new-expr.js.jsx", - "begin": "\\b(new)\\b", - "beginCaptures": { - "1": { - "name": "keyword.operator.expression.js.jsx" - } - }, - "end": "(?=[()}\\],=;])", - "patterns": [ - { - "match": "(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", - "name": "entity.name.function.js.jsx" - } - ] - }, "parameter-name": { "match": "(?<=[,(])\\s*(\\.\\.\\.)?\\s*((?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*)(?=\\s*[,)=])", "captures": { diff --git a/src/api.ts b/src/api.ts index b2ab873..e7ab5f1 100644 --- a/src/api.ts +++ b/src/api.ts @@ -94,6 +94,7 @@ interface OpMarker { readonly __kind: 'op' } interface SameLineMarker { readonly __kind: 'sameLine' } interface NoCommentMarker { readonly __kind: 'noCommentBefore' } interface NoMultilineFlowMarker { readonly __kind: 'noMultilineFlowBefore' } +interface NotLeftLeafMarker { readonly __kind: 'notLeftLeaf'; readonly words: string[] } interface PrefixSlot { readonly __kind: 'prefix'; (...ops: string[]): PrefixOps; @@ -102,11 +103,12 @@ interface PostfixSlot { readonly __kind: 'postfix'; (...ops: string[]): PostfixOps; } -interface PrefixOps { readonly __kind: 'prefix-ops'; ops: string[] } -interface PostfixOps { readonly __kind: 'postfix-ops'; ops: string[] } +interface PrefixOps { readonly __kind: 'prefix-ops'; ops: string[]; requireTarget?: boolean } +interface PostfixOps { readonly __kind: 'postfix-ops'; ops: string[]; requireTarget?: boolean } interface NoUnaryLhsOps { readonly __kind: 'no-unary-lhs-ops'; ops: string[] } 
+interface LhsTargetOps { readonly __kind: 'lhs-target-ops'; ops: string[] } -type Marker = OpMarker | PrefixSlot | PostfixSlot | SameLineMarker | NoCommentMarker | NoMultilineFlowMarker; +type Marker = OpMarker | PrefixSlot | PostfixSlot | SameLineMarker | NoCommentMarker | NoMultilineFlowMarker | NotLeftLeafMarker; export const op: OpMarker = { __kind: 'op' }; @@ -124,6 +126,18 @@ export const noCommentBefore: NoCommentMarker = { __kind: 'noCommentBefore' }; // rejected while a single-line one accepts (see RuleExpr 'noMultilineFlowBefore'). export const noMultilineFlowBefore: NoMultilineFlowMarker = { __kind: 'noMultilineFlowBefore' }; +// Zero-width LEFT-operand head-leaf guard for a Pratt LED arm. Place it at the HEAD of a LED +// alternative, before the self `$` (e.g. `[notLeftLeaf('void','null'), $, '.', Ident]`). The arm +// matches only when the LEFT node's OUTERMOST (head) leaf token TEXT is NOT one of `words`; when it +// IS, the arm is treated as not-matched (skipped) and the connector rebinds to nothing. Models TS's +// rule that a qualified type name's root is an IdentifierReference, so the keyword/literal types +// `void`/`null`/`true`/`false`/`this` are not `.`-qualifiable (`void.x` has no parse tree) while an +// identifier-rooted type (`A.B`, `undefined.x`, `number.x`) is. Mirrors the AssignmentTargetType gate +// (`lhsTarget`/`prefixTarget`), reading the SAME head leaf but predicated on TEXT membership. +export function notLeftLeaf(...words: string[]): NotLeftLeafMarker { + return { __kind: 'notLeftLeaf', words }; +} + export const prefix: PrefixSlot = Object.assign( (...ops: string[]): PrefixOps => ({ __kind: 'prefix-ops' as const, ops }), { __kind: 'prefix' as const }, @@ -141,6 +155,27 @@ export const postfix: PostfixSlot = Object.assign( // allows `-x ** y` and would not use this. The engine enforces it generically. 
export const noUnaryLhs = (...ops: string[]): NoUnaryLhsOps => ({ __kind: 'no-unary-lhs-ops' as const, ops }); +// Mark infix operators whose LEFT operand must be a valid ASSIGNMENT TARGET +// (a LeftHandSideExpression — identifier / member / element / call / paren / `this`), +// NOT a prefix-unary, prefix-update, or postfix-update expression. E.g. JS `=` and the +// compound assignments: `-x = 1`, `++x = 1`, `x++ = 1` are syntax errors, but `x = 1`, +// `x.y = 1`, `(x++) = 1` (a parenthesized cover) are fine. This is ECMAScript's +// AssignmentTargetType, enforced at PARSE time. A general, declarable property; the +// engine enforces it generically via the operand node's outermost form (head/tail leaf). +export const lhsTarget = (...ops: string[]): LhsTargetOps => ({ __kind: 'lhs-target-ops' as const, ops }); + +// Postfix operators whose OPERAND must be a valid assignment target (LHS), same shape +// rule as `lhsTarget` above — e.g. JS postfix `++`/`--`: `x++` is fine but `-x++` parses +// as `-(x++)`, and `++x++`, `x++ ++` are syntax errors (the operand `++x` / `x++` is not +// a LeftHandSideExpression). Distinct from `postfix(...)` (no operand-shape constraint). +export const postfixTarget = (...ops: string[]): PostfixOps => ({ __kind: 'postfix-ops' as const, ops, requireTarget: true }); + +// Prefix operators whose OPERAND must be a valid assignment target (LHS) — e.g. JS prefix +// `++`/`--` (the update prefixes): `++x`, `++x.y` are fine but `++-x`, `++ ++x`, `++x--` +// are syntax errors. Distinct from `prefix(...)` (the pure-unary `-`/`!`/`typeof`/… take +// ANY operand, including an update: `-x++`, `void ++x` are fine). 
+export const prefixTarget = (...ops: string[]): PrefixOps => ({ __kind: 'prefix-ops' as const, ops, requireTarget: true }); + // ── Combinators ── class SepNode { @@ -185,15 +220,58 @@ class ExcludeNode { readonly items: Element[]; constructor(connectors: string[], items: Element[]) { this.connectors = connectors; this.items = items; } } +class CtxNode { + // Mark the wrapped items as [Await]/[Yield] context (the ECMAScript grammar + // parameter): inside an async function/arrow/method body await is the AwaitExpression + // operator (no bare-identifier reading), and inside a generator body yield is the + // YieldExpression operator. The await-yield-fork build transform reads this marker to + // name-fork the body-reachable rule closure; every other consumer treats it as a + // transparent group. Wrap ONLY the async/generator arm's body+params; a nested + // non-async function/arrow/class body is simply left UNwrapped (context resets). + readonly __kind = 'ctx' as const; + readonly mode: 'await' | 'yield' | 'asyncgen' | 'reset'; + readonly items: Element[]; + constructor(mode: 'await' | 'yield' | 'asyncgen' | 'reset', items: Element[]) { this.mode = mode; this.items = items; } +} class NotNode { readonly __kind = 'not' as const; - // Zero-width negative lookahead over a single element (wrap a sequence in a - // group/alt if needed). Matches nothing; succeeds only when `item` can't match. - readonly item: Element; - constructor(item: Element) { this.item = item; } + // Zero-width negative lookahead over an element, or an array (a seq, like + // everywhere else in the rule DSL). Matches nothing; succeeds only when + // `item` can't match. `reservable` flags the bare-identifier reserved-word guard + // (notReservedExpr) so the await-yield-fork transform extends it per context family. 
+ readonly item: Element | Element[]; + readonly reservable: boolean; + constructor(item: Element | Element[], reservable = false) { this.item = item; this.reservable = reservable; } } -type Combinator = SepNode | OptNode | ManyNode | Many1Node | AltNode | ExcludeNode | NotNode; +class RelaxNode { + // A tree-sitter-only divergence: the PARSER (and every other generator) parses + // `strict`; gen-treesitter renders `relaxed`. Use when a parser-correct constraint is + // tree-sitter-GLR-hostile and the highlighter can safely over-accept the rare malformed + // form (see RuleExpr.group.tsRelaxed). Like ctx/exclude it lowers to a transparent group. + readonly __kind = 'relax' as const; + readonly strict: Element[]; + readonly relaxed: Element[]; + constructor(strict: Element[], relaxed: Element[]) { this.strict = strict; this.relaxed = relaxed; } +} + +class CapExprNode { + // Wrap a NUD alternative that is a complete assignment-level expression — an + // ArrowFunction, the LOWEST-precedence ECMAScript AssignmentExpression. `below` names + // the operator whose binding power is the cap: the alternative may be parsed only when + // the enclosing Pratt minBp is looser than `below`, and once parsed it admits NO led + // (`() => {} || a` is not `(() => {}) || a` — an arrow can be neither operand of + // `||`/`??`/`?:`/binary, nor an assignment target). Reuses the transparent `group` node: + // matched exactly like the bare alternative (no extra CST node), the cap is read only by + // the expression engine. A general property — any grammar with a lowest-precedence + // primary expression form can declare it; the engine enforces it generically. 
+ readonly __kind = 'cap-expr' as const; + readonly below: string; + readonly items: Element[]; + constructor(below: string, items: Element[]) { this.below = below; this.items = items; } +} + +type Combinator = SepNode | OptNode | ManyNode | Many1Node | AltNode | ExcludeNode | NotNode | CtxNode | RelaxNode | CapExprNode; export function sep(item: Element, delimiter: string): SepNode { return new SepNode(item, delimiter); @@ -222,11 +300,43 @@ export function exclude(connectors: string | string[], ...items: Element[]): Exc return new ExcludeNode(typeof connectors === 'string' ? [connectors] : connectors, items); } +// Parse `strict` (in the parser and all generators) but render `relaxed` for tree-sitter. +// For a parser-correct constraint that explodes / inflates the tree-sitter GLR table while +// the highlighter doesn't need it. Each side is a single element or an array (a seq). +export function tsRelax(strict: Element | Element[], relaxed: Element | Element[]): RelaxNode { + return new RelaxNode(Array.isArray(strict) ? strict : [strict], Array.isArray(relaxed) ? relaxed : [relaxed]); +} + +// Mark a NUD alternative as a complete assignment-level expression (an ArrowFunction — +// the lowest-precedence ECMAScript AssignmentExpression). `below` names the operator whose +// binding power caps it: the alternative parses only when the enclosing Pratt minBp is +// looser than `below`, and once parsed admits no led. See CapExprNode. +export function capExpr(below: string, ...items: Element[]): CapExprNode { + return new CapExprNode(below, items); +} + +// Mark items as await / yield / async-generator context (see CtxNode). Wrap an +// async arm's body and params in awaitCtx(...), a generator arm's in yieldCtx(...), +// an async-generator's in asyncGenCtx(...). 
+export function awaitCtx(...items: Element[]): CtxNode { return new CtxNode('await', items); } +export function yieldCtx(...items: Element[]): CtxNode { return new CtxNode('yield', items); } +export function asyncGenCtx(...items: Element[]): CtxNode { return new CtxNode('asyncgen', items); } +// Reset to NO await/yield context (a nested non-async/non-generator function/arrow/ +// method body, a class body, a computed property key, a field initializer). Wrapping a +// body in resetCtx() inside an already-forked family routes its refs back to the plain +// family — the boundary the fork transform stops at. +export function resetCtx(...items: Element[]): CtxNode { return new CtxNode('reset', items); } + // Zero-width negative lookahead: `not(x)` matches nothing and succeeds only when // `x` would NOT match here. -export function not(item: Element): NotNode { +export function not(item: Element | Element[]): NotNode { return new NotNode(item); } +// The bare-identifier reserved-word guard (notReservedExpr / notReserved): a `not` +// the await-yield-fork transform extends with await/yield inside those contexts. 
+export function reservableNot(item: Element | Element[]): NotNode { + return new NotNode(item, true); +} // ── Precedence ── @@ -236,7 +346,7 @@ interface PrecLevelDef { operators: PrecOperator[]; } -type OpSpec = string | PrefixOps | PostfixOps | NoUnaryLhsOps; +type OpSpec = string | PrefixOps | PostfixOps | NoUnaryLhsOps | LhsTargetOps; function buildPrecOps(ops: OpSpec[]): PrecOperator[] { const result: PrecOperator[] = []; @@ -244,9 +354,11 @@ function buildPrecOps(ops: OpSpec[]): PrecOperator[] { if (typeof o === 'string') { result.push({ value: o, position: 'infix' }); } else if (o.__kind === 'prefix-ops') { - for (const v of o.ops) result.push({ value: v, position: 'prefix' }); + for (const v of o.ops) result.push({ value: v, position: 'prefix', requireTarget: o.requireTarget }); } else if (o.__kind === 'postfix-ops') { - for (const v of o.ops) result.push({ value: v, position: 'postfix' }); + for (const v of o.ops) result.push({ value: v, position: 'postfix', requireTarget: o.requireTarget }); + } else if (o.__kind === 'lhs-target-ops') { + for (const v of o.ops) result.push({ value: v, position: 'infix', requireTarget: true }); } else { for (const v of o.ops) result.push({ value: v, position: 'infix', noUnaryLhs: true }); } @@ -311,6 +423,30 @@ function toRuleExpr(el: Element, names: Map): RuleExpr { : { type: 'seq' as const, items: el.items.map(i => toRuleExpr(i, names)) }; return { type: 'group', body, suppress: el.connectors }; } + if (el instanceof CtxNode) { + // Transparent group carrying the ctxMode marker; only the await-yield-fork + // transform reads ctxMode, everyone else recurses into body as a plain group. + const body = el.items.length === 1 + ? 
toRuleExpr(el.items[0], names) + : { type: 'seq' as const, items: el.items.map(i => toRuleExpr(i, names)) }; + return { type: 'group', body, ctxMode: el.mode }; + } + if (el instanceof RelaxNode) { + // Transparent group: every consumer reads `body` (strict); only gen-treesitter + // renders `tsRelaxed`. + const build = (items: Element[]): RuleExpr => items.length === 1 + ? toRuleExpr(items[0], names) + : { type: 'seq', items: items.map(i => toRuleExpr(i, names)) }; + return { type: 'group', body: build(el.strict), tsRelaxed: build(el.relaxed) }; + } + if (el instanceof CapExprNode) { + // Reuse the transparent `group` node (every walker recurses into `body`); `capBelow` + // is read only by the expression engine's Pratt core. + const body = el.items.length === 1 + ? toRuleExpr(el.items[0], names) + : { type: 'seq' as const, items: el.items.map(i => toRuleExpr(i, names)) }; + return { type: 'group', body, capBelow: el.below }; + } if (el instanceof AltNode) { // A branch may be a single element or a sequence (array → seq). return { @@ -326,7 +462,11 @@ function toRuleExpr(el: Element, names: Map): RuleExpr { }; } if (el instanceof NotNode) { - return { type: 'not', body: toRuleExpr(el.item, names) }; + // an array is a seq here like everywhere else in the rule DSL + const body = Array.isArray(el.item) + ? { type: 'seq' as const, items: el.item.map(i => toRuleExpr(i, names)) } + : toRuleExpr(el.item, names); + return el.reservable ? 
{ type: 'not', body, reservable: true } : { type: 'not', body }; } const marker = el as Marker; if (marker.__kind === 'op') return { type: 'op' }; @@ -335,6 +475,7 @@ function toRuleExpr(el: Element, names: Map): RuleExpr { if (marker.__kind === 'sameLine') return { type: 'sameLine' }; if (marker.__kind === 'noCommentBefore') return { type: 'noCommentBefore' }; if (marker.__kind === 'noMultilineFlowBefore') return { type: 'noMultilineFlowBefore' }; + if (marker.__kind === 'notLeftLeaf') return { type: 'notLeftLeaf', words: marker.words }; throw new Error(`Unknown element: ${JSON.stringify(el)}`); } diff --git a/src/await-yield-fork.ts b/src/await-yield-fork.ts new file mode 100644 index 0000000..52ac310 --- /dev/null +++ b/src/await-yield-fork.ts @@ -0,0 +1,173 @@ +// Build-time grammar transform implementing the ECMAScript [Await]/[Yield] grammar +// parameters by NAME-FORKING the body-reachable rule closure into context families. +// +// WHY a fork and not a runtime flag: Monogram's incremental adoption reuses a row iff +// its window (text + bars) replays identically — a row's parse must be a pure function +// of (window text, window bars) GIVEN ITS RULE. async/generator context flows from an +// ENCLOSING function OUTSIDE a row's window, so a runtime context flag read by core() +// but absent from the reuse key breaks that purity (a far `function`->`async function` +// edit, or even node surgery re-parsing a body statement with the ambient flag reset to +// its default, makes edit() diverge from a fresh parse). The fix that costs ZERO new +// reuse machinery: make the context part of the RULE IDENTITY. Every reuse predicate +// already keys on rowRule/rid (adoptSeek, runExtend, surgery's SURG_ELEM/RULE_FN_BY_ID), +// and the memo arrays are name-keyed, so an await-context Block is literally a different +// rule (Block$A) with its own rid and memo slot — a cross-family reuse is structurally +// UNREPRESENTABLE, not merely guarded. 
The window-replay theorem holds verbatim: the +// rule is part of the frame identity, never out-of-window text. +// +// HOW context boundaries are expressed: the grammar wraps each function/arrow/method/ +// class BODY (and an async arm's params) in a context marker — awaitCtx / yieldCtx / +// asyncGenCtx for the operator contexts, resetCtx for the bodies that reset to none +// (a nested non-async function, a class body, a computed key, a field initializer). +// The markers are transparent `group` nodes carrying `ctxMode`; only this transform +// reads them. The fork is driven ENTIRELY by the markers — the reset boundary (open +// question #3) is explicit, not inferred. +// +// Forks collapse to their BASE rule for every DERIVED artifact via RuleDecl.canon: the +// emitted parser keeps the distinct name for memo/adoption identity but reports `canon` +// as the green-node rule name (so trees stay byte-identical to the base grammar), and +// the AST / TM / tree-sitter / cst-match generators skip forks (a fork's structure and +// scope are its base's). +import type { CstGrammar, RuleDecl, RuleExpr } from './types.ts'; + +type Family = 'await' | 'yield' | 'asyncgen'; +const SUFFIX: Record<Family, string> = { await: '$A', yield: '$Y', asyncgen: '$AY' }; +const RESERVED: Record<Family, string[]> = { await: ['await'], yield: ['yield'], asyncgen: ['await', 'yield'] }; + + +export function withAwaitYield(grammar: CstGrammar): CstGrammar { + const byName = new Map(grammar.rules.map(r => [r.name, r])); + + // ── 1. Per-family closure: which rules need an $F clone. A rule S is in closure[F] + // if it is reachable, via in-family refs, from a subtree marked mode F — where a + // nested marker of mode M re-roots the walk into family M (or plain, for reset). ── + const closure: Record<Family, Set<string>> = { await: new Set(), yield: new Set(), asyncgen: new Set() }; + + // Walk `expr` collecting the rule refs reachable WITHOUT crossing a ctx marker, and + // recurse into nested markers under their own family. 
`intoFamily(name, F)` enrolls a + // rule into closure[F] and (first time) walks its body under F. + function walkExpr(expr: RuleExpr, fam: Family | null): void { + if (!expr || typeof expr !== 'object') return; + switch (expr.type) { + case 'ref': + if (fam && byName.has(expr.name)) intoFamily(expr.name, fam); + return; + case 'group': + if (expr.ctxMode && expr.ctxMode !== 'reset') { walkExpr(expr.body, expr.ctxMode); return; } + if (expr.ctxMode === 'reset') { walkExpr(expr.body, null); return; } // plain family: no clone needed + walkExpr(expr.body, fam); return; + case 'seq': case 'alt': expr.items.forEach(i => walkExpr(i, fam)); return; + case 'quantifier': walkExpr(expr.body, fam); return; + case 'not': walkExpr(expr.body, fam); return; + case 'sep': walkExpr(expr.element, fam); return; + default: return; // literal / zero-width markers + } + } + function intoFamily(name: string, fam: Family): void { + if (closure[fam].has(name)) return; + closure[fam].add(name); + const r = byName.get(name); + if (r) walkExpr(r.body, fam); // refs inside an enrolled rule stay in-family + } + // Seed: scan every BASE rule body for ctx markers (the function/arrow/method/class + // body roots) and walk their contents under the marked family. + for (const r of grammar.rules) walkExpr(r.body, null); + + // ── 2. Rewrite an expr for emission in family `fam` (null = plain/base): a ref to a + // rule in closure[fam] becomes the $F clone; a nested ctx marker switches family; + // a reset marker drops to plain; a GUARD_RULE ref takes the family-suffixed guard. ── + function rewrite(expr: RuleExpr, fam: Family | null): RuleExpr { + if (!expr || typeof expr !== 'object') return expr; + switch (expr.type) { + case 'ref': { + if (fam && closure[fam].has(expr.name)) return { type: 'ref', name: expr.name + SUFFIX[fam] }; + return expr; + } + case 'group': { + const inner = expr.ctxMode === 'reset' ? null : (expr.ctxMode ? 
expr.ctxMode : fam); + const body = rewrite(expr.body, inner); + // strip the ctxMode marker from the emitted grammar (it has done its routing + // job); keep `suppress` (no-in context) and `capBelow` (assignment-level cap), + // both still read by the parser engine. (tsRelaxed is gen-treesitter-only and the + // post-fork grammar is the PARSER's, which uses `body` — so it is correctly dropped.) + const g: RuleExpr = { type: 'group', body }; + if (expr.suppress !== undefined) g.suppress = expr.suppress; + if (expr.capBelow !== undefined) g.capBelow = expr.capBelow; + return g; + } + case 'seq': return { type: 'seq', items: expr.items.map(i => rewrite(i, fam)) }; + case 'alt': return { type: 'alt', items: expr.items.map(i => rewrite(i, fam)) }; + case 'quantifier': return { type: 'quantifier', body: rewrite(expr.body, fam), kind: expr.kind }; + case 'not': { + // the bare-identifier reserved-word guard: inside a context family, also + // forbid that family's keyword(s), so `await`/`yield` lose their identifier + // reading (await with no operand then rejects — the prefix op needs one). + const body = fam && expr.reservable ? addReserved(rewrite(expr.body, fam), RESERVED[fam]) : rewrite(expr.body, fam); + return expr.reservable ? { type: 'not', body, reservable: true } : { type: 'not', body }; + } + case 'sep': return { type: 'sep', element: rewrite(expr.element, fam), delimiter: expr.delimiter }; + default: return expr; + } + } + + // ── 3. The forked rules (inserted after the other base rules but BEFORE the entry rule, so + // every non-entry rid = rules.indexOf is unchanged and the entry rule stays last). ── + const forks: RuleDecl[] = []; + const families: Family[] = ['await', 'yield', 'asyncgen']; + for (const fam of families) { + const suf = SUFFIX[fam]; + for (const name of closure[fam]) { + const base = byName.get(name)!; + // rewrite reroutes in-family refs to $F and extends any reservable guard with + // the family's context keyword (see the 'not' case in rewrite()). 
+ forks.push({ name: name + suf, body: rewrite(base.body, fam), flags: [...base.flags], canon: name }); + } + } + + // ── 4. Rewrite the BASE rules in place: a base rule containing ctx markers must now + // reference the $F clones at those roots (materialize the routing). Refs OUTSIDE any + // marker stay plain. ── + const baseRewritten: RuleDecl[] = grammar.rules.map(r => ({ ...r, body: rewrite(r.body, null) })); + + // Insert the forks BEFORE the entry rule (the last rule — findEntryRule reads + // rules[length-1]) so the entry stays last. Existing rids shift only for the entry, + // which is looked up by position consistently everywhere; forks (body-internal + // rules) are never the entry. + if (forks.length === 0) return { ...grammar, rules: baseRewritten }; + const entry = baseRewritten[baseRewritten.length - 1]; + return { ...grammar, rules: [...baseRewritten.slice(0, -1), ...forks, entry] }; +} + +// Collapse the [Await]/[Yield] forks back to the base grammar for the DERIVED-artifact +// generators (AST types / TM scopes / tree-sitter rules): drop every fork rule and +// rewrite any reference to a fork (the base async arm's rerouted Block$A, etc.) back to +// its base name. The result is structurally the pre-fork grammar, so those generators +// emit byte-identically. Identity (returns the same object) when nothing is forked. +export function dropForks(grammar: CstGrammar): CstGrammar { + const canonOf = new Map(); + for (const r of grammar.rules) if (r.canon) canonOf.set(r.name, r.canon); + if (canonOf.size === 0) return grammar; + const reref = (e: RuleExpr): RuleExpr => { + if (!e || typeof e !== 'object') return e; + switch (e.type) { + case 'ref': return canonOf.has(e.name) ? { type: 'ref', name: canonOf.get(e.name)! } : e; + case 'group': return { type: 'group', body: reref(e.body), ...(e.suppress !== undefined ? { suppress: e.suppress } : {}), ...(e.capBelow !== undefined ? 
{ capBelow: e.capBelow } : {}) }; + case 'seq': return { type: 'seq', items: e.items.map(reref) }; + case 'alt': return { type: 'alt', items: e.items.map(reref) }; + case 'quantifier': return { type: 'quantifier', body: reref(e.body), kind: e.kind }; + case 'not': return { type: 'not', body: reref(e.body) }; + case 'sep': return { type: 'sep', element: reref(e.element), delimiter: e.delimiter }; + default: return e; + } + }; + return { ...grammar, rules: grammar.rules.filter(r => !r.canon).map(r => ({ ...r, body: reref(r.body) })) }; +} + +// Add `words` to the INNER body of a reservable guard's not(...): the body is the +// alt of forbidden literals (`alt('catch','class',…)`) or a single literal. Returns +// the extended alt; the caller wraps it back in the `not`. +function addReserved(inner: RuleExpr, words: string[]): RuleExpr { + const lits = words.map((w): RuleExpr => ({ type: 'literal', value: w })); + if (inner.type === 'alt') return { type: 'alt', items: [...inner.items, ...lits] }; + return { type: 'alt', items: [inner, ...lits] }; +} diff --git a/src/cli.ts b/src/cli.ts index cba2bc1..76d6615 100644 --- a/src/cli.ts +++ b/src/cli.ts @@ -133,6 +133,7 @@ function formatExpr(expr: RuleExpr): string { case 'sameLine': return 'sameLine'; case 'noCommentBefore': return 'noCommentBefore'; case 'noMultilineFlowBefore': return 'noMultilineFlowBefore'; + case 'notLeftLeaf': return `notLeftLeaf(${expr.words.map(w => `'${w}'`).join(', ')})`; case 'sep': return `sep(${formatExpr(expr.element)}, '${expr.delimiter}')`; } } diff --git a/src/emit-lexer.ts b/src/emit-lexer.ts index bf2ce1d..18d9c0d 100644 --- a/src/emit-lexer.ts +++ b/src/emit-lexer.ts @@ -108,6 +108,32 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(`// resync: suffix-zone equality makes a cut token's END mismatch the old one)`); emit(`const LEX_RETRY = { retry: true };`); emit(`let lexWindowMore = false;`); + emit(`let lexSrcBase = 0;`); + emit(`let lexDiagBase = 
0; // docLex floor for the current window (its own emissions sit above)`); + emit(`// Shifted-resync support: lexResyncPd is the paren-depth delta between the live`); + emit(`// stack and the old record at the adopted suffix's first token (the splice adds`); + emit(`// it to every adopted tkPd, restoring true absolute depths). altSuffMin[j] =`); + emit(`// min paren depth recorded over the old suffix [j, altN) (pop-on-empty = -1),`); + emit(`// built lazily once per edit (the caller nulls it when the alt stream changes).`); + emit(`let lexResyncPd = 0;`); + emit(`let altSuffMin = null;`); + emit(`let altSuffMinBuf = null;`); + emit(`// ')' pops that found an empty stack, in THIS lexCore call's token indices`); + emit(`let lexEmptyPops = [];`); + emit(`// Min OLD-stream paren depth over the tokens inside the damage itself (set by the`); + emit(`// caller before the window lex): the old-side trajectory min starts from here.`); + emit(`let wndOldMin0 = 0x7fffffff;`); + emit(`function buildAltSuffMin(lo) {`); + emit(` if (altSuffMinBuf === null || altSuffMinBuf.length < altN + 1) altSuffMinBuf = new Int32Array(altN + 1025);`); + emit(` altSuffMin = altSuffMinBuf;`); + emit(` altSuffMin[altN] = 0x7fffffff;`); + emit(` for (let j = altN - 1; j >= lo; j--) {`); + emit(` let d = altPd[j];`); + emit(` if (d === 0 && altK[j] === K_PUNCT && altT[j] === ${tOf(')')} && (j === 0 || altPd[j - 1] === 0)) d = -1;`); + emit(` const nx = altSuffMin[j + 1];`); + emit(` altSuffMin[j] = d < nx ? 
d : nx;`); + emit(` }`); + emit(`}`); emit(`const LX_UNI_IDENT = /[$_\\p{ID_Start}][$\\u200c\\u200d\\p{ID_Continue}]*/uy;`); emit(`const LX_UNI_CONT = /[$\\u200c\\u200d\\p{ID_Continue}]+/uy;`); emit(`const LX_UNI_FULL = /^[$_\\p{ID_Start}][$\\u200c\\u200d\\p{ID_Continue}]*/u;`); @@ -125,6 +151,7 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { !first || [...first.ascii].some(cc => kwFirstCcs.has(cc)); // keywords are ASCII-initial const kIdent = identTokenName ? kOf(identTokenName) : 0; const tRParen = tOf(')'); + const tLParen = tOf('('); emit(``); // ── Baked keyword recognizer over a SOURCE SPAN: t-intern with no slice and no hash. // Length window → first-charCode switch → per-keyword compare chains (shortest first); @@ -175,6 +202,7 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(`}`); if (templateToken) { emit(`function lexTplSpan(source, pos, validateEscapes) {`); + emit(` const tplFrom = pos;`); emit(` while (pos < source.length) {`); emit(` if (${startsWithExpr('source', 'pos', tplInterpOpen)}) return { endsWithInterp: true, end: pos + ${tplInterpOpen.length} };`); emit(` if (source.charCodeAt(pos) === 92) {`); @@ -182,7 +210,11 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(` if (validateEscapes) {`); emit(` LX_TPL_ESC.lastIndex = pos;`); emit(` const m = LX_TPL_ESC.exec(source);`); - emit(` if (!m) { if (lexWindowMore) throw LEX_RETRY; throw new Error('Invalid escape sequence in template at offset ' + pos); }`); + emit(` if (!m) {`); + emit(` if (lexWindowMore) throw LEX_RETRY;`); + emit(` if (recovering) { docLex.push({ offset: pos + lexSrcBase, end: pos + lexSrcBase + 1, kind: 1, ch: '' }); pos += 1; continue; }`); + emit(` throw new Error('Invalid escape sequence in template at offset ' + pos);`); + emit(` }`); emit(` pos += m[0].length;`); emit(` } else { pos += 2; }`); } else { @@ -194,6 +226,10 @@ export function emitLexer(grammar: 
CstGrammar, st: LexerSymtab): string | null { emit(` pos++;`); emit(` }`); emit(` if (lexWindowMore) throw LEX_RETRY;`); + emit(` if (recovering) {`); + emit(` docLex.push({ offset: tplFrom + lexSrcBase, end: source.length + lexSrcBase, kind: 2, ch: '' });`); + emit(` return { endsWithInterp: false, end: source.length };`); + emit(` }`); emit(` throw new Error('Unterminated template literal at offset ' + pos);`); emit(`}`); } @@ -223,6 +259,7 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(`function lexCore(source, startPos, pvK, pvT, wndPtr0, wndMinOff, wndDelta, wndCs, initParens, srcBase, hasMore) {`); emit(` if (srcBase === undefined) srcBase = 0;`); emit(` lexWindowMore = hasMore === true;`); + emit(` lexSrcBase = srcBase;`); emit(` const n = source.length;`); emit(` let pos = startPos;`); emit(` let pendingNl = false;`); @@ -233,12 +270,12 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(` const parenHeadStack = initParens !== undefined && initParens !== null ? initParens : [];`); emit(` let wndPtr = wndPtr0;`); emit(` let wndHit = -1;`); - emit(` // stack depths as of the last token fully BEFORE the damage: a resync point may`); - emit(` // sit at any depth as long as every bracket still open there was opened before`); - emit(` // the damage (the prefix agrees byte-for-byte, so those stack entries agree too;`); - emit(` // anything opened inside the damage could differ in control-head-ness).`); - emit(` let dmgDp = -1, dmgPd = -1;`); - emit(` let lastDp = templateStack.length, lastPd = parenHeadStack.length;`); + emit(` lexEmptyPops.length = 0;`); + emit(` // Trajectory minimums since the point the two lexes diverge (the damage start;`); + emit(` // before it, identical bytes from an identical anchor state give identical`); + emit(` // tokens and stack ops). An entry at depth <= BOTH mins was open at the`); + emit(` // divergence point in both lexes - i.e. 
it is the SAME entry.`); + emit(` let dmgMinOld = wndOldMin0, dmgMinNew = -1;`); emit(` function tkPush(k, t, off, end) {`); emit(` off += srcBase; end += srcBase;`); emit(` if (tokN === tkCap) growTok();`); @@ -250,17 +287,59 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(` pendingNl = false;`); emit(` pvK = k; pvT = t;`); emit(` tokN++;`); + emit(` // Resync: adopt the OLD suffix from this aligned token on. Sound iff the old`); + emit(` // suffix's lexing is reproducible from OBSERVABLE state alone. Always required:`); + emit(` // - both template stacks EMPTY (an entry's brace counter is mutable state no`); + emit(` // record captures - depth equality cannot prove counters equal);`); + emit(` // - the candidate carries no cross-token flag its adopted successor reads`); + emit(` // (postfix-ambiguous op / control keyword / '(' / ')' each make the NEXT`); + emit(` // token's lexing depend on tokens BEFORE the candidate, which the window`); + emit(` // may have re-derived differently than the old stream had them).`); + emit(` // Then either of two sufficient paren-stack conditions:`); + emit(` // - FAST: equal depth, never dipped below it since the divergence point on`); + emit(` // either side - every open entry is then pre-divergence-common, the stacks`); + emit(` // are content-EQUAL, and all future pops behave identically; or`); + emit(` // - SHIFTED: the old suffix never pops an entry that is open at the candidate`); + emit(` // (suffix min depth >= candidate depth, a pop-on-empty counted as -1): no`); + emit(` // open entry's head-ness is ever read again, so the contents are irrelevant`); + emit(` // and the depths may differ by an arbitrary shift - the caller re-bases the`); + emit(` // adopted tkPd column by lexResyncPd to the new truth.`); emit(` if (wndPtr >= 0) {`); - emit(` if (dmgPd < 0) {`); - emit(` if (off >= wndCs) { dmgDp = lastDp; dmgPd = lastPd; }`); - emit(` else { lastDp = tkDp[tokN - 1]; lastPd = 
tkPd[tokN - 1]; }`); - emit(` }`); - emit(` if (off >= wndMinOff && dmgPd >= 0`); - emit(` && templateStack.length <= dmgDp && parenHeadStack.length <= dmgPd) {`); - emit(` while (wndPtr < altN && (altOff[wndPtr] < 0 ? altOff[wndPtr] + srcLenP1 : altOff[wndPtr]) + wndDelta < off) wndPtr++;`); + emit(` const pd = tkPd[tokN - 1];`); + emit(` if (dmgMinNew < 0) { if (off >= wndCs) dmgMinNew = pd; }`); + emit(` else if (pd < dmgMinNew) dmgMinNew = pd;`); + emit(` if (off >= wndMinOff) {`); + emit(` while (wndPtr < altN && (altOff[wndPtr] < 0 ? altOff[wndPtr] + srcLenP1 : altOff[wndPtr]) + wndDelta < off) { if (altPd[wndPtr] < dmgMinOld) dmgMinOld = altPd[wndPtr]; wndPtr++; }`); emit(` if (wndPtr < altN && (altOff[wndPtr] < 0 ? altOff[wndPtr] + srcLenP1 : altOff[wndPtr]) + wndDelta === off && altK[wndPtr] === k && altT[wndPtr] === t`); - emit(` && (altEnd[wndPtr] < 0 ? altEnd[wndPtr] + srcLenP1 : altEnd[wndPtr]) + wndDelta === end && altDp[wndPtr] === templateStack.length && altPd[wndPtr] === parenHeadStack.length) {`); - emit(` wndHit = wndPtr;`); + emit(` && (altEnd[wndPtr] < 0 ? 
altEnd[wndPtr] + srcLenP1 : altEnd[wndPtr]) + wndDelta === end`); + emit(` // the candidate's LEADING-TRIVIA flags must match too: the gap before`); + emit(` // it may sit inside the edit (newline removed/added without moving any`); + emit(` // token bytes), and parsers read these flags (sameLine / commentBefore)`); + emit(` && altFl[wndPtr] === tkFl[tokN - 1]`); + emit(` && templateStack.length === 0 && altDp[wndPtr] === 0`); + emit(` && LX_PFXV[t] === 0 && LX_PARENKW[t] === 0`); + emit(` && !(k === K_PUNCT && (t === ${tLParen} || t === ${tRParen}))) {`); + emit(` const q = altPd[wndPtr];`); + emit(` if (q < dmgMinOld) dmgMinOld = q;`); + emit(` if (q === pd && pd <= dmgMinOld && pd <= dmgMinNew) {`); + emit(` wndHit = wndPtr;`); + emit(` lexResyncPd = 0;`); + emit(` } else {`); + emit(` // shifted: q = 0 needs only "no pop-on-empty beyond the candidate"`); + emit(` // (the doc-level list is ascending - one end check); q > 0 needs the`); + emit(` // full suffix minimum, built lazily once per edit`); + emit(` let okTail;`); + emit(` if (q === 0) {`); + emit(` okTail = docEmptyPops.length === 0 || docEmptyPops[docEmptyPops.length - 1] <= wndPtr;`); + emit(` } else {`); + emit(` if (altSuffMin === null) buildAltSuffMin(wndPtr0);`); + emit(` okTail = altSuffMin[wndPtr + 1] >= q;`); + emit(` }`); + emit(` if (okTail) {`); + emit(` wndHit = wndPtr;`); + emit(` lexResyncPd = pd - q;`); + emit(` }`); + emit(` }`); emit(` }`); emit(` }`); emit(` }`); @@ -277,7 +356,10 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(` return LX_DIVK[k] !== 0 || LX_DIVT[t] !== 0;`); emit(` }`); emit(` while (pos < n) {`); - emit(` if (wndHit >= 0) { tokN--; return wndHit; }`); + emit(` // resync retracts the duplicated token push — and any lexer diagnostics + // emitted FOR it (the old stream's persisted entry survives via the shift; + // keeping the window's copy too double-reports the same character)`); + emit(` if (wndHit >= 0) { tokN--; while 
(docLex.length > lexDiagBase && docLex[docLex.length - 1].offset >= tkOff[tokN]) docLex.length--; return wndHit; }`); emit(` const cc = source.charCodeAt(pos);`); emit(` // whitespace: ASCII \\s run by char loop; a non-ASCII candidate falls back to the regex`); emit(` if (cc === 32 || (cc >= 9 && cc <= 13)) {`); @@ -370,7 +452,11 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(`${ind} if (m !== null) {`); if (m.identLike) { const plen = (identPrefixByName.get(m.name) ?? '').length; - emit(`${ind} if (!lexIdentValid(m[0], ${plen})) { if (lexWindowMore) throw LEX_RETRY; throw new Error("Invalid identifier escape at offset " + pos + ": '" + m[0] + "'"); }`); + emit(`${ind} if (!lexIdentValid(m[0], ${plen})) {`); + emit(`${ind} if (lexWindowMore) throw LEX_RETRY;`); + emit(`${ind} if (!recovering) throw new Error("Invalid identifier escape at offset " + pos + ": '" + m[0] + "'");`); + emit(`${ind} docLex.push({ offset: pos + lexSrcBase, end: pos + lexSrcBase + m[0].length, kind: 3, ch: m[0] });`); + emit(`${ind} }`); } if (m.skip) { emit(`${ind} if (m[0].includes('\\n')) pendingNl = true;`); @@ -391,7 +477,8 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(`${ind} parenHeadStack.push(_ph);`); emit(`${ind} extraFl = _ph ? 8 : 0; }`); } else if (lit === ')') { - emit(`${ind}lastCloseWasParenHead = parenHeadStack.pop() ?? 
false;`); + emit(`${ind}if (parenHeadStack.length === 0) { lastCloseWasParenHead = false; lexEmptyPops.push(tokN); }`); + emit(`${ind}else lastCloseWasParenHead = parenHeadStack.pop();`); } if (regexCtx?.postfixAfterValueTexts?.includes(lit)) { emit(`${ind}lastBangWasPostfix = prevIsValue();`); @@ -515,9 +602,15 @@ export function emitLexer(grammar: CstGrammar, st: LexerSymtab): string | null { emit(` }`); } emit(` if (lexWindowMore) throw LEX_RETRY;`); + emit(` if (recovering) {`); + emit(` docLex.push({ offset: pos + srcBase, end: pos + srcBase + 1, kind: 0, ch: source[pos] });`); + emit(` tkPush(${st.KIND_NAMED_FALLBACK}, 0, pos, pos + 1);`); + emit(` pos += 1;`); + emit(` continue;`); + emit(` }`); emit(` throw new Error("Unexpected character at offset " + pos + ": '" + source[pos] + "'");`); emit(` }`); - emit(` if (wndHit >= 0) { tokN--; return wndHit; }`); + emit(` if (wndHit >= 0) { tokN--; while (docLex.length > lexDiagBase && docLex[docLex.length - 1].offset >= tkOff[tokN]) docLex.length--; return wndHit; }`); emit(` return hasMore ? 
-2 : -1;`); emit(`}`); emit(`// Windowed-relex restart anchor: the last token B ending at/before the damage`); diff --git a/src/emit-parser.ts b/src/emit-parser.ts index 4498f64..0168a0a 100644 --- a/src/emit-parser.ts +++ b/src/emit-parser.ts @@ -27,6 +27,7 @@ import type { CstGrammar, RuleExpr, RuleDecl, PrecLevel } from './types.ts'; import { isKeywordLiteral, collectLiterals } from './grammar-utils.ts'; import { emitLexer } from './emit-lexer.ts'; +import { withAwaitYield } from './await-yield-fork.ts'; // ── Static analysis (re-derived; mirrors gen-parser.ts exactly) ── @@ -35,6 +36,7 @@ interface OpInfo { rbp: number; assoc: 'left' | 'right' | 'none'; position: 'infix' | 'prefix' | 'postfix'; + requireTarget?: boolean; } type FirstTok = { lit: string } | { tok: string } | null; @@ -61,20 +63,26 @@ function analyze(grammar: CstGrammar) { const prefixOps = new Map(); const noUnaryLhsOps = new Set(); const postfixOpValues = new Set(); + // Infix/postfix ops whose operand must be a valid assignment target (LHS) — see + // PrecOperator.requireTarget. Keyed like noUnaryLhsOps for the byte-table dispatch. + const requireTargetOps = new Set(); for (let i = 0; i < grammar.precs.length; i++) { const level = grammar.precs[i]; const bp = (i + 1) * 2; for (const op of level.operators) { if (op.position === 'prefix') { - prefixOps.set(op.value, { lbp: 0, rbp: level.assoc === 'right' ? bp - 1 : bp, assoc: level.assoc, position: 'prefix' }); + prefixOps.set(op.value, { lbp: 0, rbp: level.assoc === 'right' ? 
bp - 1 : bp, assoc: level.assoc, position: 'prefix', requireTarget: op.requireTarget }); + if (op.requireTarget) requireTargetOps.add(op.value); } else if (op.position === 'postfix') { postfixOpValues.add(op.value); - opTable.set(op.value, { lbp: bp, rbp: 0, assoc: level.assoc, position: 'postfix' }); + opTable.set(op.value, { lbp: bp, rbp: 0, assoc: level.assoc, position: 'postfix', requireTarget: op.requireTarget }); + if (op.requireTarget) requireTargetOps.add(op.value); } else { const lbp = bp; const rbp = level.assoc === 'right' ? bp - 1 : bp; - opTable.set(op.value, { lbp, rbp, assoc: level.assoc, position: 'infix' }); + opTable.set(op.value, { lbp, rbp, assoc: level.assoc, position: 'infix', requireTarget: op.requireTarget }); if (op.noUnaryLhs) noUnaryLhsOps.add(op.value); + if (op.requireTarget) requireTargetOps.add(op.value); } } } @@ -91,6 +99,16 @@ function analyze(grammar: CstGrammar) { ledPrecByConnector.set(lp.connector, { lbp, rhsBp: lp.chainRhs ? lbp : null }); } + // Binary / relational / conditional connectors — the MIDDLE child of a `$ op $` (or + // alternative-form) LED. A node whose child[1] is one of these is a binary expression, + // NOT a LeftHandSideExpression, so it is not a valid assignment target (`a + b = c`, + // `a in b = c`, `a as T = b` are spec grammar errors). Ladder INFIX ops carry the + // operator as an operator-tag leaf; the alternative-form binary LEDs (`in`/`instanceof`/ + // `as`/`satisfies`/`?`) carry it as a keyword/punct leaf — both land at child[1]. + const binaryConnectors = new Set(); + for (const [v, info] of opTable) if (info.position === 'infix') binaryConnectors.add(v); + for (const k of ledPrecByConnector.keys()) binaryConnectors.add(k); + // Pratt rules. const prattRules = new Set(); for (const rule of grammar.rules) if (hasMarker(rule.body)) prattRules.add(rule.name); @@ -98,11 +116,17 @@ function analyze(grammar: CstGrammar) { function classifyAlts(rule: RuleDecl) { const alts = rule.body.type === 'alt' ? 
rule.body.items : [rule.body]; const nuds: RuleExpr[] = []; - const leds: { expr: RuleExpr; items: RuleExpr[] }[] = []; + const leds: { expr: RuleExpr; items: RuleExpr[]; notLeftLeaf?: string[] }[] = []; for (const alt of alts) { const items = alt.type === 'seq' ? alt.items : [alt]; - if (items[0]?.type === 'ref' && items[0].name === rule.name) leds.push({ expr: alt, items: items.slice(1) }); - else nuds.push(alt); + // A LED arm may carry a leading `notLeftLeaf(...)` head-leaf guard before the self `$` + // (`[notLeftLeaf('void',…), $, '.', Ident]`). Strip it into LED metadata; the self-ref is + // then the next item and `led.items` is everything after it — identical to a plain LED. + const guard = items[0]?.type === 'notLeftLeaf' ? items[0].words : undefined; + const head = guard ? 1 : 0; + if (items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name) { + leds.push({ expr: alt, items: items.slice(head + 1), notLeftLeaf: guard }); + } else nuds.push(alt); } return { nuds, leds }; } @@ -110,18 +134,26 @@ function analyze(grammar: CstGrammar) { const alts = rule.body.type === 'alt' ? rule.body.items : [rule.body]; const atoms: RuleExpr[] = []; const continuations: RuleExpr[][] = []; + const contNotLeftLeaf: (string[] | null)[] = []; for (const alt of alts) { const items = alt.type === 'seq' ? alt.items : [alt]; - if (items[0]?.type === 'ref' && items[0].name === rule.name) continuations.push(items.slice(1)); - else atoms.push(alt); + // A continuation may carry a leading `notLeftLeaf(...)` head-leaf guard before the self `$`. + // Strip it into per-continuation metadata; the self-ref is the next item. + const guard = items[0]?.type === 'notLeftLeaf' ? items[0].words : undefined; + const head = guard ? 1 : 0; + if (items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name) { + continuations.push(items.slice(head + 1)); + contNotLeftLeaf.push(guard ?? 
null); + } else atoms.push(alt); } - return { atoms, continuations }; + return { atoms, continuations, contNotLeftLeaf }; } function isLeftRecursive(rule: RuleDecl): boolean { const alts = rule.body.type === 'alt' ? rule.body.items : [rule.body]; return alts.some(alt => { const items = alt.type === 'seq' ? alt.items : [alt]; - return items[0]?.type === 'ref' && items[0].name === rule.name; + const head = items[0]?.type === 'notLeftLeaf' ? 1 : 0; + return items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name; }); } @@ -161,13 +193,14 @@ function analyze(grammar: CstGrammar) { // Access-tail + tail-closing LED classification (Pratt). // Returns, per Pratt rule, parallel arrays of flags aligned to the leds array. - const ledMeta = new Map(); + const ledMeta = new Map(); for (const [ruleName, { leds }] of prattClassified.entries()) { const accessTail: boolean[] = []; const tailClosing: boolean[] = []; const mixfix: (MixfixInfo | null)[] = []; const first: FirstTok[] = []; const prec: ({ lbp: number; rhsBp: number | null } | null)[] = []; + const notLeftLeaf: (string[] | null)[] = []; for (const led of leds) { const it = led.items; let isAccessTail = false, isTailClosing = false; @@ -191,8 +224,29 @@ function analyze(grammar: CstGrammar) { } } prec.push(lp); + notLeftLeaf.push(led.notLeftLeaf ?? null); } - ledMeta.set(ruleName, { accessTail, tailClosing, mixfix, first, prec }); + ledMeta.set(ruleName, { accessTail, tailClosing, mixfix, first, prec, notLeftLeaf }); + } + + // Capped-NUD classification (Pratt). A NUD alternative wrapped in a `cap`-group is a + // complete assignment-level expression (an ArrowFunction — the lowest-precedence + // AssignmentExpression): it parses only when minBp is LOOSER than the named connector's + // binding power (so it is refused as the operand of any tighter operator, e.g. + // `a || () => {}`), and once parsed it admits NO led (so `() => {} || a` leaves `|| a` + // unconsumed and the parse rejects). 
`cap[i]` is the binding-power threshold for nud i + // (null = uncapped). The connector's lbp resolves from the ladder or the ledPrec table. + const connectorLbp = (connector: string): number => { + const op = opTable.get(connector); + if (op) return op.lbp; + const lp = ledPrecByConnector.get(connector); + if (lp) return lp.lbp; + throw new Error(`capExpr: connector ${JSON.stringify(connector)} is not a ladder operator or ledPrec connector`); + }; + const nudCap = new Map(); + for (const [ruleName, { nuds }] of prattClassified.entries()) { + nudCap.set(ruleName, nuds.map(nud => + nud.type === 'group' && nud.capBelow !== undefined ? connectorLbp(nud.capBelow) : null)); } // Left-rec continuation mixfix. @@ -285,7 +339,7 @@ function analyze(grammar: CstGrammar) { if (kws) pending = pending ? new Set([...pending, ...kws]) : kws; continue; } - if (item.type === 'op' || item.type === 'postfix' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'op' || item.type === 'postfix' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; const f = exprFirst(item); if (f === null) return null; for (const k of f) { @@ -306,7 +360,7 @@ function analyze(grammar: CstGrammar) { return acc; } case 'quantifier': case 'group': return exprFirst(e.body); - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': return new Set(); + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return new Set(); case 'sep': return exprFirst(e.element); default: return null; } @@ -365,7 +419,7 @@ function analyze(grammar: CstGrammar) { const acc = new Set(); for (const item of e.items) { if (item.type === 'prefix') return null; - if (item.type === 'op' || item.type === 'postfix' || item.type === 'not' || item.type === 'sameLine' || item.type === 
'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'op' || item.type === 'postfix' || item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; const f = exprFirstPlain(item); if (f === null) return null; for (const k of f) acc.add(k); @@ -383,7 +437,7 @@ function analyze(grammar: CstGrammar) { return acc; } case 'quantifier': case 'group': return exprFirstPlain(e.body); - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': return new Set(); + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return new Set(); case 'sep': return exprFirstPlain(e.element); default: return null; } @@ -407,7 +461,7 @@ function analyze(grammar: CstGrammar) { const acc = new Set(); for (let i = j; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; if (item.type === 'op' || item.type === 'postfix') { for (const k of opKeys) acc.add(k); return acc; } if (item.type === 'prefix') { for (const k of prefixOps.keys()) acc.add(k); return acc; } const f = exprFirstPlain(item); @@ -420,7 +474,7 @@ function analyze(grammar: CstGrammar) { function suffixNullable(items: RuleExpr[], j: number): boolean { for (let i = j; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') 
continue; if (item.type === 'op' || item.type === 'prefix' || item.type === 'postfix') return false; if (!exprNullable(item)) return false; } @@ -438,7 +492,7 @@ function analyze(grammar: CstGrammar) { const items = e.items; for (let i = 0; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; let isec: Sec; let itemNullable: boolean; if (item.type === 'op' || item.type === 'postfix' || item.type === 'prefix') { @@ -490,7 +544,7 @@ function analyze(grammar: CstGrammar) { if (sec.len1) acc.add(e.delimiter); return { s: acc, len1: sec.len1 }; } - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return { s: new Set(), len1: false }; case 'op': case 'prefix': case 'postfix': return { s: new Set(), len1: true }; @@ -604,16 +658,18 @@ function analyze(grammar: CstGrammar) { // is >= NAMED_MIN (behaves as "a named token" for the keyword-by-text branch) yet // collides with NO real token-name kind (so matchToken(name) never false-matches it). 
const KIND_NAMED_FALLBACK = nextKind; + typeKind.set('$error', KIND_NAMED_FALLBACK); const symtab = { KIND_PUNCT, KIND_TEMPLATE_HEAD, KIND_NAMED_MIN, KIND_NAMED_FALLBACK, typeKind, kwLitKind, puLitKind, classifyKey, }; return { - grammar, tokenNames, opTable, prefixOps, noUnaryLhsOps, postfixOpValues, + grammar, tokenNames, opTable, prefixOps, noUnaryLhsOps, postfixOpValues, requireTargetOps, binaryConnectors, prattRules, leftRecSet, ruleByName, prattClassified, leftRecClassified, maxBp, templateTokenName, templateTokenNames, firstTokenOf, altDeepFirst, altNullable, - altSecond, ledMeta, contMeta, nullableRules, firstSets, symtab, qualKeys, + altSecond, ledMeta, contMeta, nudCap, nullableRules, firstSets, symtab, qualKeys, + exprFirst, exprNullable, }; } @@ -715,7 +771,7 @@ class Emitter { // The run-extension target of a repetition: when the body unwraps to a plain ref of // a rule that routes through parseRuleEntry (pratt / left-rec / spine), its rule id; // else -1 (the loop gets no extension hook — adoption stays element-by-element). - quantRunRuleId(body: RuleExpr): number { + quantRunInfo(body: RuleExpr): { rid: number; name: string } | null { const a = this.a; let expr = body; while (true) { @@ -726,10 +782,52 @@ class Emitter { } break; } - if (expr.type !== 'ref' || !a.ruleByName.has(expr.name)) return -1; + if (expr.type !== 'ref' || !a.ruleByName.has(expr.name)) return null; const name = expr.name; - if (!(a.prattRules.has(name) || a.leftRecSet.has(name) || this.spineSet().has(name))) return -1; - return a.grammar.rules.findIndex(r => r.name === name); + if (!(a.prattRules.has(name) || a.leftRecSet.has(name) || this.spineSet().has(name))) return null; + const rid = a.grammar.rules.findIndex(r => r.name === name); + return rid >= 0 ? { rid, name } : null; + } + quantRunRuleId(body: RuleExpr): number { + const info = this.quantRunInfo(body); + return info === null ? 
-1 : info.rid; + } + // Recovery hooks stay at SPINE-SHAPED repetitions (a plain rule ref or an + // alt of rule refs — statement/member lists): hooking expression-internal + // repetitions lets a bar-armed absorption fire inside longest-match arm probing, + // which distorts arm selection and cascades (measured: 273 errors for one broken + // identifier). An unhooked inner failure escalates to the nearest hooked list, + // which absorbs at statement granularity. + quantRecoverFirst(body: RuleExpr): Set | null { + const a = this.a; + const unwrap = (x: RuleExpr): RuleExpr => { + while (true) { + if (x.type === 'group' && !(x.suppress && x.suppress.length)) { x = x.body; continue; } + if (x.type === 'seq') { + const real = x.items.filter(it => it.type !== 'op' && it.type !== 'prefix' && it.type !== 'postfix'); + if (real.length === 1) { x = real[0]; continue; } + } + return x; + } + }; + const expr = unwrap(body); + const refFirst = (x: RuleExpr): Set | null => { + if (x.type !== 'ref' || !a.ruleByName.has(x.name)) return null; + if (a.nullableRules.has(x.name)) return null; + const fs = a.firstSets.get(x.name); + return fs && fs.size > 0 ? fs : null; + }; + if (expr.type === 'ref') return refFirst(expr); + if (expr.type === 'alt') { + const u = new Set(); + for (const item of expr.items) { + const fs = refFirst(unwrap(item)); + if (fs === null) return null; + for (const k of fs) u.add(k); + } + return u.size > 0 ? u : null; + } + return null; } /** @@ -809,7 +907,9 @@ class Emitter { const a = this.a; switch (expr.type) { case 'literal': { - return `if (!${this.matchLiteralCall(expr.value)}) { ${onFail} }`; + const vs = this.vsetNext; + this.vsetNext = 0; + return `if (!${this.matchLiteralCall(expr.value, vs)}) { ${onFail} }`; } case 'ref': { if (a.tokenNames.has(expr.name)) { @@ -832,9 +932,16 @@ class Emitter { // flattened inline too — its failure restores to the SAME save point (the whole // matcher fn's _save), exactly like matchSeq's single saved/restore. 
const parts: string[] = []; - for (const item of expr.items) { + for (let i = 0; i < expr.items.length; i++) { + const item = expr.items[i]; if (item.type === 'op' || item.type === 'prefix' || item.type === 'postfix') continue; + if (item.type === 'quantifier') { + const nx = expr.items[i + 1]; + this.quantFollowT = nx !== undefined && nx.type === 'literal' ? this.litT(nx.value) : -1; + } + if (item.type === 'literal') this.vsetNext = this.vsetFor(expr.items, i); parts.push(this.matchInto(item, onFail)); + this.quantFollowT = -1; } return parts.join('\n'); } @@ -851,7 +958,11 @@ class Emitter { return lines.join('\n'); } case 'quantifier': - return this.matchQuantifierInto(expr.body, expr.kind, onFail); + { + const closerT = this.quantFollowT; + this.quantFollowT = -1; + return this.matchQuantifierInto(expr.body, expr.kind, onFail, closerT); + } case 'group': { // A suppress-carrying group stages the LED-connector exclusion for the next // parseRule, then matches its body (same as matchExpr 'group'). @@ -870,7 +981,7 @@ class Emitter { } const save = this.id(), sn = this.id(), fn = this.matchFn(expr.body), m = this.id(); return [ - `{ const ${save} = pos; const ${sn} = scn; const ${m} = ${fn}(); pos = ${save}; scn = ${sn};`, + `{ const ${save} = pos; const ${sn} = scn; probing++; const ${m} = ${fn}(); probing--; pos = ${save}; scn = ${sn};`, ` if (${m}) { ${onFail} } }`, ].join('\n'); } @@ -880,6 +991,11 @@ class Emitter { return `if (!(pos < cap && (tkFl[pos] & 2) === 0)) { ${onFail} }`; case 'noMultilineFlowBefore': return `if (!(pos < cap && (tkFl[pos] & 4) === 0)) { ${onFail} }`; + case 'notLeftLeaf': + // The head-leaf LED gate is applied in the Pratt LED loop (not here); the marker is + // stripped from the LED arm's items, so it never reaches the matcher. As a leaf-position + // no-op it consumes nothing and succeeds (matches the empty string). 
+ return ``; case 'sep': return this.matchSepInto(expr.element, expr.delimiter, onFail); default: @@ -890,26 +1006,95 @@ class Emitter { // Quantifier: body is matched via a helper fn (pushes + boolean), so the loop here // uses `return`/`break` only against ITS OWN while — no nested-loop hazard. - private matchQuantifierInto(body: RuleExpr, kind: '*' | '+' | '?', onFail: string): string { + private quantFollowT = -1; + litT(value: string): number { return -1; } // bound by emitParser to the punct-literal table + + // ── Viable-set companions (diagnostics) ── + // For a REQUIRED literal C in a seq, the literals PROVABLY still accepted when + // C's matcher fails: walking backward from C, a repetition ('*'/'+') is always + // re-enterable so its nullable-prefix-reachable literals stay viable; nullable + // one-shot items ('?' optionals, nullable groups, sep, zero-width markers) are + // crossed but contribute nothing (they may already have consumed their match); + // the first non-nullable item stops the walk. "expected ',' or ']'" therefore + // never names an impossible continuation — unlike a static FIRST union, which + // after `[1, 2` would still claim an expression. Each distinct message gets one + // id, threaded through the matcher into the $missing row (settle decodes it). + private vsetNext = 0; + vsetMsgs: string[] = ['']; + private vsetIds = new Map(); + private nullPrefixLits(x: RuleExpr, acc: Set): boolean { // → nullable (crossable)? 
+ switch (x.type) { + case 'literal': acc.add(x.value); return false; + case 'seq': { for (const it of x.items) if (!this.nullPrefixLits(it, acc)) return false; return true; } + case 'group': return this.nullPrefixLits(x.body, acc); + case 'quantifier': { this.nullPrefixLits(x.body, acc); return x.kind !== '+'; } + case 'alt': { let all = true; for (const it of x.items) if (!this.nullPrefixLits(it, acc)) all = false; return all; } + case 'ref': return false; // conservative: treat rules as non-nullable + case 'sep': return true; + default: return true; // zero-width markers / Pratt position markers + } + } + private vsetFor(items: RuleExpr[], k: number): number { + const item = items[k]; + if (item.type !== 'literal') return 0; + const comp = new Set(); + for (let j = k - 1; j >= 0; j--) { + const pj = items[j]; + if (pj.type === 'op' || pj.type === 'prefix' || pj.type === 'postfix') continue; + if (pj.type === 'quantifier' && pj.kind !== '?') { this.nullPrefixLits(pj.body, comp); continue; } + if (pj.type === 'quantifier' || pj.type === 'sep' || pj.type === 'not' || pj.type === 'sameLine' || pj.type === 'noCommentBefore') continue; + if (pj.type === 'group' && this.nullPrefixLits(pj.body, new Set())) continue; + break; + } + comp.delete(item.value); + if (comp.size === 0) return 0; + const msg = [...comp, item.value].map(v => "'" + v + "'").join(' or '); + let id = this.vsetIds.get(msg); + if (id === undefined) { id = this.vsetMsgs.length; this.vsetMsgs.push(msg); this.vsetIds.set(msg, id); } + return id; + } + private matchQuantifierInto(body: RuleExpr, kind: '*' | '+' | '?', onFail: string, closerT = -1): string { const fn = this.matchFn(body); if (kind === '?') { - // Try once; on failure the helper restored pos/scn itself. - return `${fn}();`; + // Try once; on failure the helper restored pos/scn itself. 
The probe guard + // keeps synthesis out of UNCOMMITTED optional paths, tsc-style: before the + // group consumes a real token its failure is free (no synthesis); once it + // has consumed (pos > probeBase) the group is committed — 'const a = ;' + // must synthesize the initializer Expr, not drop the whole '= Expr' group. + return `{ const _pb = probeBase; probeBase = pos; ${fn}(); probeBase = _pb; }`; } // Run-extension: after an iteration whose element was ADOPTED from the old tree, // bulk-adopt its following old siblings (runExtend) instead of re-entering the // rule machinery once per element. Only loops over a parseRuleEntry-routed rule // get the hook, and runExtend re-checks rid + generation, so an inner rule's // adoption can never feed elements into an outer loop. - const runId = this.quantRunRuleId(body); + // + // The same loops are the RECOVERY sync points: in recovering mode (second pass, + // entered only after the strict parse rejected) a failing element absorbs tokens + // into an $error node up to the element's FIRST set / a closer / EOF and the + // loop continues — strict-mode behavior is byte-identical (the hook is gated on + // `recovering`, and a SUCCEEDING rule parses identically in both modes). + const runInfo = this.quantRunInfo(body); + const runId = runInfo === null ? -1 : runInfo.rid; const ext = runId >= 0 ? `\n if (adoptRunPos === pos) runExtend(${runId});` : ''; + const recFirst = this.quantRecoverFirst(body); + const csFn = recFirst !== null ? this.membershipFn(recFirst) : 'null'; + // The element's LEADING token is the loop's continuation decision — its + // failure is a normal list end, so synthesis is suppressed until the element + // commits (consumes past the iteration start): rep(seq(',', Expr)) must not + // mint a phantom ',' to keep the list going, but once the real ',' is there + // a missing Expr synthesizes (tsc list-element semantics). 
Same commitment + // device as the optional-probe guard, staged inline (hot loop — no closure). + const failFor = (beforeV: string, bsnV: string) => recFirst !== null + ? `const ${beforeV}_pb = probeBase; probeBase = pos; const ${beforeV}_fm = frameMax; frameMax = pos; const ${beforeV}_ok = ${fn}(); probeBase = ${beforeV}_pb; const ${beforeV}_re = frameMax; if (${beforeV}_fm > frameMax) frameMax = ${beforeV}_fm;\n if (!${beforeV}_ok) { if (!recovering || !recoverSkip(${csFn}, ${closerT}, ${beforeV}, ${beforeV}_re)) break; continue; }\n if (recovering && pos === ${beforeV}) { scn = ${bsnV}; if (!recoverSkip(${csFn}, ${closerT}, ${beforeV}, ${beforeV}_re)) break; continue; }` + : `const ${beforeV}_pb = probeBase; probeBase = pos; const ${beforeV}_ok = ${fn}(); probeBase = ${beforeV}_pb;\n if (!${beforeV}_ok) break;`; if (kind === '*') { const before = this.id(), bsn = this.id(); return [ `while (true) {`, ` const ${before} = pos; const ${bsn} = scn;`, - ` if (!${fn}()) break;`, - ` if (pos === ${before} && scn === ${bsn}) break;` + ext, + ` ${failFor(before, bsn)}`, + ` if (pos === ${before}) { scn = ${bsn}; break; }` + ext, `}`, ].join('\n'); } @@ -919,8 +1104,8 @@ class Emitter { `if (!${fn}()) { ${onFail} }`, `while (true) {`, ` const ${before} = pos; const ${bsn} = scn;`, - ` if (!${fn}()) break;`, - ` if (pos === ${before} && scn === ${bsn}) break;` + ext, + ` ${failFor(before, bsn)}`, + ` if (pos === ${before}) { scn = ${bsn}; break; }` + ext, `}`, ].join('\n'); } @@ -933,7 +1118,7 @@ class Emitter { return [ `if (${fn}()) {`, ` while (true) {`, - ` const _ds = pos; if (!${this.matchLiteralCall(delimiter)}) { pos = _ds; break; }`, + ` const _ds = pos; probing++; const _dm = ${this.matchLiteralCall(delimiter)}; probing--; if (!_dm) { pos = _ds; break; }`, ` if (!${fn}()) break;`, ` }`, `}`, @@ -950,7 +1135,11 @@ class Emitter { if (!fs || fs.size === 0) return ''; // ruleMightStart: true iff some key in fs matches peek(); guard = NOT that. 
The set // is baked as a per-set membership fn over two byte tables (see membershipFn). - return `!${this.membershipFn(fs)}(pos)`; + // Recovering runs skip the guard: at a bar the next token is exactly what CANNOT + // start the rule, and the missing-nonterminal hook lives inside parseRuleEntry — + // a pre-call rejection would silence it ('a, ;' must mint the Expr, not end the + // list). Strict pays one global read only when the guard would fail anyway. + return `(!${this.membershipFn(fs)}(pos) && !recovering)`; } // Deep per-alternative dispatch condition (mirrors gen-parser.ts altMightStart): the @@ -1194,10 +1383,13 @@ class Emitter { // ── Lever 1 emit helpers ── // Specialized literal matcher call: keyword → matchKwLit, punct → matchPuLit, each // with the value's baked int (so the runtime does int compares, not string work). - matchLiteralCall(value: string): string { + // vs > 0 = this call site's viable-set id (companion literals provably still + // accepted when the match fails — threaded into the synthesized $missing row). + matchLiteralCall(value: string, vs = 0): string { const d = this.a.symtab.classifyKey(value); - if (d.kind === 'kw') return `matchKwLit(${d.t})`; - if (d.kind === 'punct') return value === '>' ? `matchPuLitGT(${d.t})` : `matchPuLit(${d.t})`; + const va = vs > 0 ? `, ${vs}` : ''; + if (d.kind === 'kw') return `matchKwLit(${d.t}${va})`; + if (d.kind === 'punct') return value === '>' ? `matchPuLitGT(${d.t}${va})` : `matchPuLit(${d.t}${va})`; // A literal key that classifies as a token-name (a token name used as a literal): // unreachable for real grammars, but stay safe via the generic matchLiteral. return `matchLiteral(${J(value)})`; @@ -1212,8 +1404,15 @@ class Emitter { // ── Top-level emit ── export function emitParser(grammar: CstGrammar): string { + // [Await]/[Yield] context: name-fork the body-reachable rule closure into $A/$Y/$AY + // families (see await-yield-fork.ts). No-op for a grammar with no ctx markers. 
Done + // HERE (not at grammar export) so the forks exist ONLY in the parser's rule identity + // / memo / adoption space; the derived-artifact generators see the base grammar with + // the (transparent-group) markers and emit byte-identically. + grammar = withAwaitYield(grammar); const a = analyze(grammar); const e = new Emitter(a); + e.litT = (v: string) => a.symtab.puLitKind.get(v) ?? -1; const entry = findEntryRule(grammar); // Grammar-lite for the lexer: ONLY what createLexer reads (tokens, precs, the @@ -1312,7 +1511,60 @@ export function emitParser(grammar: CstGrammar): string { } e.emit(`const NOUNARY_T = Uint8Array.from([${nu.join(',')}]);`); } + // Ops whose operand must be a valid assignment target (LHS) — byte-table for the LED + // dispatch (a token's t equals an op value iff its t-int matches — vocabulary). + { + let tSize = 1; + for (const v of st.kwLitKind.values()) tSize = Math.max(tSize, v + 1); + for (const v of st.puLitKind.values()) tSize = Math.max(tSize, v + 1); + const rt = new Array(tSize).fill(0); + for (const v of a.requireTargetOps) { + const d = st.classifyKey(v); + if (d.kind !== 'tok' && d.t > 0) rt[d.t] = 1; + } + e.emit(`const REQTGT_T = Uint8Array.from([${rt.join(',')}]);`); + } e.emit(`const postfixOpValues = new Set(${J([...a.postfixOpValues])});`); + e.emit(`const binaryConnectors = new Set(${J([...a.binaryConnectors])});`); + // Assignment-target shape test (ECMAScript AssignmentTargetType): a node id is NOT a + // valid LHS target iff its outermost form is a prefix-op (prefix-unary OR prefix-update + // `++x`) — head kid is an operator-tag leaf in prefixOps — or a postfix-update (`x++`) — + // tail kid is an operator-tag leaf in postfixOpValues. A parenthesized cover / member / + // element / call / non-null tail has no operator-tag leaf at head or tail, so it passes. 
+ e.emit(`function _notTarget(lhs) {`); + e.emit(` const n = rowCount[lhs]; if (n === 0) return false;`); + e.emit(` const cs = rowStart[lhs];`); + e.emit(` const _h = kids[cs];`); + e.emit(` if (_h < 0 && ((~_h) & 3) === 2) {`); + e.emit(` const _ht = absTok[lhs] + ((~_h) >>> 2);`); + e.emit(` if (prefixOps.has(${e.soa ? 'docText(toff(_ht), tend(_ht))' : 'tkText[_ht]'})) return true;`); + e.emit(` }`); + e.emit(` const _t = kids[cs + n - 1];`); + e.emit(` if (_t < 0 && ((~_t) & 3) === 2) {`); + e.emit(` const _tt = absTok[lhs] + ((~_t) >>> 2);`); + e.emit(` if (postfixOpValues.has(${e.soa ? 'docText(toff(_tt), tend(_tt))' : 'tkText[_tt]'})) return true;`); + e.emit(` }`); + // a binary / relational / conditional expression (`a + b`, `a in b`, `a as T`, …) is not a + // LeftHandSideExpression: its MIDDLE child is a binary connector leaf. (Member `a.b` / + // element `a[b]` have a PUNCT leaf there, a parenthesized cover has a NODE child, so those + // pass — `(a + b) = c` via the cover is correctly accepted, like tsc.) + e.emit(` if (n >= 3) { const _m = kids[cs + 1]; if (_m < 0) { const _mt = absTok[lhs] + ((~_m) >>> 2); if (binaryConnectors.has(${e.soa ? 'docText(toff(_mt), tend(_mt))' : 'tkText[_mt]'})) return true; } }`); + e.emit(` return false;`); + e.emit(`}`); + // Head-leaf TEXT of a node: descend the LEFTMOST-child spine to the OUTERMOST leaf and return its + // token text (the SAME head-leaf the _notTarget gate reads, generalized to recurse through child + // nodes). Drives the notLeftLeaf LED gate: a node whose head leaf text is in the arm's word set + // (e.g. `void`/`null`/`this` for the type `.` qualification) is not a valid LEFT operand of the + // arm. A childless ($missing recovery) node returns '' (matches no word → the arm is not blocked). 
+ e.emit(`function _headLeafText(id) {`); + e.emit(` while (rowCount[id] > 0) {`); + e.emit(` const _hh = kids[rowStart[id]];`); + e.emit(` if (_hh >= 0) { id = _hh; continue; }`); + e.emit(` const _ht = absTok[id] + ((~_hh) >>> 2);`); + e.emit(` return ${e.soa ? 'docText(toff(_ht), tend(_ht))' : 'tkText[_ht]'};`); + e.emit(` }`); + e.emit(` return '';`); + e.emit(`}`); e.emit(`const tokenNames = new Set(${J([...a.tokenNames])});`); e.emit(`const templateTokenNames = new Set(${J([...a.templateTokenNames])});`); e.emit(`const templateTokenName = ${J(a.templateTokenName ?? null)};`); @@ -1320,8 +1572,24 @@ export function emitParser(grammar: CstGrammar): string { e.emit(`const ENTRY = ${J(entry)};`); // Rule-name table: rowRule stores the index; '$template' takes the slot after the // declared rules (parseTemplateExpr's synthetic node). - e.emit(`const RULE_NAMES = ${J([...grammar.rules.map(r => r.name), '$template'])};`); + e.emit(`const RULE_NAMES = ${J([...grammar.rules.map(r => r.name), '$template', '$error', '$missing'])};`); + // DISPLAY names: an [Await]/[Yield] fork (RuleDecl.canon set) keeps its distinct + // RULE_NAMES entry for memo/adoption rule identity, but REPORTS its base name as the + // node's rule name so trees stay byte-identical to the base grammar. Identical to + // RULE_NAMES when no rule is forked (the common case). + e.emit(`const RULE_DISPLAY = ${J([...grammar.rules.map(r => r.canon ?? r.name), '$template', '$error', '$missing'])};`); e.emit(`const RID_TEMPLATE = ${grammar.rules.length};`); + e.emit(`const RID_ERROR = ${grammar.rules.length + 1};`); + e.emit(`const RID_MISSING = ${grammar.rules.length + 2};`); + { + // literal-int → text (for "expected 'x'" diagnostics on $missing rows) + const inv: string[] = []; + for (const [txt, t] of a.symtab.kwLitKind) inv[t] = txt; + for (const [txt, t] of a.symtab.puLitKind) inv[t] = txt; + e.emit(`const LIT_NAMES = ${J(Array.from(inv, (x) => x ?? 
''))};`); + } + // (recovery sync closers are threaded per-loop from the enclosing seq — see + // quantFollowT; a global closer table froze top-level recovery at any ']'.) e.emit(`const prattRuleNames = new Set(${J([...a.prattRules])});`); // The expression rule the template-interpolation fallback (findExprRule) picks: // first pratt rule that isn't Type, in declaration order. Bake the resolved name. @@ -1527,6 +1795,12 @@ let rowKC = new Uint8Array(8192); // eagerly). rowNF = first kid index (absolute, like rowStart) that may hold an // end-relative value; batch parses never flip, so the decode branch never fires. let rowNF = new Int32Array(8192).fill(0x7fffffff); +// recovery-made bit: the row was memoized during a RECOVERING parse while recovery +// candidates were being created under it — its subtree may contain $error rows, so +// a STRICT pass must not adopt it (an adopted error region would let a strict pass +// 'succeed' over broken text and wipe its diagnostics). Recovering passes adopt +// these rows freely. +let rowRM = new Uint8Array(8192); function ktr(p, k) { const v = kidTokRel[k]; return v < 0 ? v + rowTokLen[p] + 1 : v; } function kcr(p, k) { const v = kidRel[k]; return v < 0 ? 
v + rowLen[p] + 1 : v; } // transient BUILD coordinates (absolute), valid for rows completed in the current @@ -1561,6 +1835,7 @@ function growRows() { const ok = new Uint8Array(rowCap); ok.set(rowOK); rowOK = ok; const kc = new Uint8Array(rowCap); kc.set(rowKC); rowKC = kc; const nf = new Int32Array(rowCap).fill(0x7fffffff); nf.set(rowNF.subarray(0, nodeN)); rowNF = nf; + const rm = new Uint8Array(rowCap); rm.set(rowRM.subarray(0, nodeN)); rowRM = rm; const ac = new Int32Array(rowCap); ac.set(absChar); absChar = ac; const at = new Int32Array(rowCap); at.set(absTok); absTok = at; } @@ -1615,10 +1890,26 @@ function finishNode(rid, mark) { } rowRule[id] = rid; rowLen[id] = myEnd - myOff; rowCount[id] = n; rowTokLen[id] = myTokEnd - myTok; - rowExt[id] = maxPos - myTok; + rowExt[id] = frameMax - myTok; rowOK[id] = 0; rowKC[id] = 0; rowNF[id] = 0x7fffffff; + rowRM[id] = 0; + // recovery-made propagation: STRUCTURAL, bitwise — bit 1: a kid is (or contains) + // an $error row; bit 2: a kid's result is context-tainted (the cycle sentinel) + // and must not be reused outside its own parse. Batch parses never enter this. + if (recovering) { + const ke = rowStart[id] + rowCount[id]; + let rm = 0; + for (let i2 = rowStart[id]; i2 < ke; i2++) { + const e2 = kids[i2]; + if (e2 >= 0) { + rm |= rowRM[e2] | (rowRule[e2] >= RID_ERROR ? 
1 : 0); + if (rm === 3) break; + } + } + rowRM[id] = rm; + } absChar[id] = myOff; absTok[id] = myTok; scn = mark; return id; @@ -1651,10 +1942,26 @@ function finishWrap(rid, lhsId, mark) { rowRule[id] = rid; rowLen[id] = myEnd - myOff; rowStart[id] = ks; rowCount[id] = n + 1; rowTokLen[id] = myTokEnd - myTok; - rowExt[id] = maxPos - myTok; + rowExt[id] = frameMax - myTok; rowOK[id] = 0; rowKC[id] = 0; rowNF[id] = 0x7fffffff; + rowRM[id] = 0; + // recovery-made propagation: STRUCTURAL, bitwise — bit 1: a kid is (or contains) + // an $error row; bit 2: a kid's result is context-tainted (the cycle sentinel) + // and must not be reused outside its own parse. Batch parses never enter this. + if (recovering) { + const ke = rowStart[id] + rowCount[id]; + let rm = 0; + for (let i2 = rowStart[id]; i2 < ke; i2++) { + const e2 = kids[i2]; + if (e2 >= 0) { + rm |= rowRM[e2] | (rowRule[e2] >= RID_ERROR ? 1 : 0); + if (rm === 3) break; + } + } + rowRM[id] = rm; + } absChar[id] = myOff; absTok[id] = myTok; scn = mark; return id; @@ -1663,6 +1970,19 @@ function finishWrap(rid, lhsId, mark) { // ── per-parse state (module-level closures, reset by parse()) ── let pos = 0; let maxPos = 0; +// Cap-propagation flag (capExpr): set true when a pratt call returns a CAPPED +// assignment-level expression (an ArrowFunction), so an enclosing operator LED can refuse +// to continue it (in a = ()=>{} || x the assignment RHS is a capped arrow, so the || must +// not attach to the assignment; it stays unconsumed and the parse rejects). Reset at each +// capped-rule pratt entry; read by the op LED right after parsing its RHS. +let _prattCapped = false; +// Frame-LOCAL advance watermark: reach of the CURRENT rule frame (reset to the +// frame's start at parseRuleEntry, folded back into the parent on exit). 
Keeps +// rowExt/memo watermarks EXACT — the global maxPos contaminates them with probes +// from earlier siblings, and recovery-bar minting (bar = strict-fail maxPos) must +// be identical between a fresh parse and an adoption re-run. frameMax <= maxPos +// always, so the hot advance pays one extra compare only at frontier breaches. +let frameMax = 0; let memoNode = []; let memoEnd = []; let memoExt = []; // per-entry lookahead extent (see parseRuleEntry) @@ -1691,31 +2011,32 @@ function offset() { // Keyword literal: the interpreter required tok.type !== '' && tokenNames.has(tok.type) // && tok.text === value. With interned kinds that is tok.k >= K_NAMED_MIN (a declared // token name; '' is PUNCT, templates are below NAMED_MIN) && tok.t === KW(value). -function matchKwLit(kw) { +function matchKwLit(kw, vs) { // A kw-range t can only come from a named token (template spans never intern to a // keyword), so the old k >= K_NAMED_MIN guard was redundant — one int compare. - if (pos >= cap || tkT[pos] !== kw) return false; + // vs (optional) = the call site's viable-set id, threaded into the $missing row. + if (pos >= cap || tkT[pos] !== kw) return recovering ? missTok(kw, vs) : false; scPush(~((pos << 2) | 1)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } // Punct literal: tok.type === '' && tok.text === value, with the gt-splice fallback. // tok.t === PU(value) is the exact-text fast path; the splice handles a longer // gt-led token matching the gt key. value/pu are baked by the caller. -function matchPuLit(pu) { +function matchPuLit(pu, vs) { // A pu-range t can only come from a punct token, so the old k === K_PUNCT guard was // redundant — one int compare. The '>'-split lives only in matchPuLitGT ('>' sites). - if (pos >= cap || tkT[pos] !== pu) return false; + if (pos >= cap || tkT[pos] !== pu) return recovering ? 
missTok(pu, vs) : false; scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } -function matchPuLitGT(pu) { +function matchPuLitGT(pu, vs) { if (pos >= cap) return false; const off = toff(pos); if (tkT[pos] === pu) { scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } // Split multi-'>' tokens: '>>', '>>>', '>>=', '>>>=' can yield a single '>': shift the @@ -1726,6 +2047,12 @@ function matchPuLitGT(pu) { ${e.soa ? '' : 'const restText = tkText[pos].slice(1);'} if (tokN === tkCap) growTok(); parenCachePos = -1; + // token indices shift past this point: the OLD-TREE adoption mapping + // (adoptDmg*/adoptDelta, frozen at edit start) is no longer valid — turn + // adoption off for the remainder of this parse (the '>' split is rare; the + // memo generation bump below already isolates the memo) + adoptRoot = -1; + adoptRunPos = -1; tkK.copyWithin(pos + 1, pos, tokN); tkT.copyWithin(pos + 1, pos, tokN); tkOff.copyWithin(pos + 1, pos, tokN); @@ -1750,14 +2077,17 @@ function matchPuLitGT(pu) { if (parseLimit < 0) cap = tokN; // Token indices shifted: drop the per-rule memo arrays (recreated lazily at the new size). memoGenCur++; // positions shifted mid-parse: every stamped entry is stale + memoRecFloor = 0x7fffffff; // including across attempts: pre-split positions + // can never be revalidated against the new stream + for (let _ep = docEmptyPops.length - 1; _ep >= 0 && docEmptyPops[_ep] >= pos; _ep--) docEmptyPops[_ep]++; // GREEN tree: no kids/scratch fixup — every completed row and scratch entry lies // wholly BEFORE the splice point (token pos is being consumed right now), and the // carried memo was just cleared, so nothing reachable references shifted indices. 
scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } - return false; + return recovering ? missTok(pu, vs) : false; } // Generic matchLiteral kept for any unspecialized site: classify value via the baked // tables (no per-call isKeywordLiteral / string compares) and delegate. @@ -1772,9 +2102,9 @@ function matchLiteral(value) { // (No named-token kind equals K_NAMED_FALLBACK, so an unforeseen type never matches.) // The materialized tokenType is type-derived (kind 0) — name needs no baking here. function matchTokK(nameKind) { - if (pos >= cap || tkK[pos] !== nameKind) return false; + if (pos >= cap || tkK[pos] !== nameKind) return recovering ? missTok(-nameKind) : false; scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } @@ -1786,29 +2116,32 @@ function parseTemplateExpr() { const k = tkK[pos]; if (k === K_TPL_TOKEN) { scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } return true; } if (k === K_TEMPLATE_HEAD) { const mark = scn; + const save = pos; scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } const interpRule = currentPrattContext ?? 
EXPR_RULE; + // a head COMMITS to the full chain: every substitution must hold an + // expression and every span must continue (middle) or close (tail) — an + // unterminated template is a parse failure, not a shorter match while (true) { - RULES[interpRule](); - if (pos >= cap) break; + if (!RULES[interpRule]() || pos >= cap) { pos = save; scn = mark; return false; } const nk = tkK[pos]; if (nk === K_TEMPLATE_MIDDLE) { scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } continue; } if (nk === K_TEMPLATE_TAIL) { scPush(~(pos << 2)); - if (++pos > maxPos) maxPos = pos; + if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } break; } - break; + pos = save; scn = mark; return false; } scPush(finishNode(RID_TEMPLATE, mark)); return true; @@ -1867,6 +2200,53 @@ function emitRuleFns(e: Emitter, a: ReturnType) { }); e.emit(`const SURG_ELEM = new Int32Array([${surg.join(',')}]);`); e.emit(`const RULE_FN_BY_ID = [${a.grammar.rules.map(r => ruleFn(r.name)).join(', ')}];`); + { + // Paired-opener table for diagnostics: for each literal C, intersect — across + // every seq occurrence of C that has preceding literals in its sequencing scope + // (transparent groups inlined; quantifier/alt/not bodies are separate scopes) — + // the SETS of those preceding literals. A unique survivor is C's structural + // opener: ')' keeps '(' through if/while/call alike (interior separators like + // the index signature's ':' vary per shape and intersect away), while ','/':' + // themselves intersect to nothing. No bracket list is hardcoded. Used to attach + // "to match this 'x'" related info to "expected 'C'" $missing diagnostics; the + // sibling scan at collect time self-guards (no opener leaf in the row, no info). + const tOfLit = (txt: string) => (isKeywordLiteral(txt) ? a.symtab.kwLitKind.get(txt) : a.symtab.puLitKind.get(txt)) ?? 
0; + const inter = new Map(); // closer t → intersection, nearest-last order + const walk = (x: RuleExpr, acc: number[] | null): void => { + switch (x.type) { + case 'seq': { const sc = acc ?? []; for (const it of x.items) walk(it, sc); return; } + case 'group': walk(x.body, acc); return; + case 'literal': { + const c = tOfLit(x.value); + if (c <= 0) return; + if (acc !== null && acc.length > 0) { + const prev = inter.get(c); + if (prev === undefined) inter.set(c, acc.filter(o => o !== c)); + else inter.set(c, prev.filter(o => acc.includes(o))); + } + if (acc !== null) acc.push(c); + return; + } + // quantifier/alt contents physically FOLLOW the scope's earlier literals + // (an arm of `seq('[', alt(...), ']')` sits after the '['), so they inherit + // a COPY of the accumulator; nothing leaks back out (which arm matched, or + // whether the quantifier matched at all, is unknowable statically). + case 'quantifier': walk(x.body, acc === null ? null : [...acc]); return; + case 'alt': for (const it of x.items) walk(it, acc === null ? null : [...acc]); return; + case 'not': return; + default: return; // refs / zero-width markers neither pair nor reset + } + }; + for (const rule of a.grammar.rules) walk(rule.body, null); + const n = a.symtab.kwLitKind.size + a.symtab.puLitKind.size + 1; + const arr = new Array(n).fill(0); + for (const [c, set] of inter) if (set.length === 1) arr[c] = set[0]; + e.emit(`const PAIR_OPEN = new Int32Array([${arr.join(',')}]);`); + } + // Viable-set messages, registered per CALL SITE during the rule emission above + // (see vsetFor): id → " or "-joined alternatives, decoded from the $missing + // row's packed rowStart at settle. + e.emit(`const VSETS = ${J(e.vsetMsgs)};`); } // Non-recursive rule: longest-match over alts (mirrors parseNonRec). A better arm is @@ -1915,7 +2295,8 @@ function emitNonRecRule(e: Emitter, a: ReturnType, rule: RuleDec // Left-recursive (non-Pratt) rule: atom then continuations (mirrors parseLeftRec). 
function emitLeftRecRule(e: Emitter, a: ReturnType, rule: RuleDecl) { const ruleFn = `R_${sanitize(rule.name)}`; - const { atoms, continuations } = a.leftRecClassified.get(rule.name)!; + const sn = sanitize(rule.name); + const { atoms, continuations, contNotLeftLeaf } = a.leftRecClassified.get(rule.name)!; const contMix = a.contMeta.get(rule.name)!; // A left-rec rule, like a Pratt rule, goes through parseRule's memo + context + // suppress wrapper in the interpreter — so currentPrattContext is set to this rule @@ -1923,6 +2304,10 @@ function emitLeftRecRule(e: Emitter, a: ReturnType, rule: RuleDe // template-literal TYPE must parse as Type, not the default expression rule). const rid = a.grammar.rules.indexOf(rule); e.emit(`function ${ruleFn}() { return parseRuleEntry(${e.memoIndex(rule.name)}, ${rid}, ${J(rule.name)}, ${ruleFn}_lr); }`); + // notLeftLeaf head-leaf word sets (module-level, built once) for this rule's gated continuations. + contNotLeftLeaf.forEach((words, i) => { + if (words) e.emit(`const _NLLC_${sn}_${i} = new Set(${J(words)});`); + }); e.emit(`function ${ruleFn}_lr(_minBp) {`); e.emit(` const saved = pos; const mark = scn;`); e.emit(` let node = -1; let bestAtomPos = saved;`); @@ -1944,10 +2329,16 @@ function emitLeftRecRule(e: Emitter, a: ReturnType, rule: RuleDe e.emit(` const contSaved = pos; const contMark = scn;`); continuations.forEach((cont, i) => { e.emit(` pos = contSaved; scn = contMark;`); - e.emit(` { let ok = cont_${sanitize(rule.name)}_${i}();`); + // notLeftLeaf head-leaf gate: skip this continuation when the LEFT node's outermost (head) leaf + // text is in its word set (e.g. `void`/`null`/`this` can't be `.`-qualified as a type). + const gate = contNotLeftLeaf[i] ? 
`!_NLLC_${sn}_${i}.has(_headLeafText(node)) && ` : ''; + e.emit(` { let ok = ${gate}cont_${sanitize(rule.name)}_${i}();`); if (contMix[i]) { e.emit(` if (!ok) { pos = contSaved; scn = contMark; ok = matchMixfixLed_${sanitize(rule.name)}_cont_${i}(); }`); } + // A zero-width continuation is possible only via token synthesis (a strict one + // would never terminate this loop) — discard it or the loop spins. + e.emit(` if (ok && pos === contSaved) { scn = contMark; ok = false; }`); e.emit(` if (ok) {`); e.emit(` node = finishWrap(${rid}, node, contMark);`); e.emit(` continue outer;`); @@ -1972,20 +2363,33 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl const sn = sanitize(rule.name); const { nuds, leds } = a.prattClassified.get(rule.name)!; const meta = a.ledMeta.get(rule.name)!; + const nudCap = a.nudCap.get(rule.name)!; + const anyCapped = nudCap.some(c => c !== null); // R_() wraps parseRule's memo/context handling, then calls the bp-taking core. const rid = a.grammar.rules.indexOf(rule); e.emit(`function ${ruleFn}() { return parseRuleEntry(${e.memoIndex(rule.name)}, ${rid}, ${J(rule.name)}, ${ruleFn}_pratt); }`); + // notLeftLeaf head-leaf word sets (module-level, built once) for this rule's gated LED arms. + meta.notLeftLeaf.forEach((words, i) => { + if (words) e.emit(`const _NLL_${sn}_${i} = new Set(${J(words)});`); + }); e.emit(`function ${ruleFn}_pratt(minBp) {`); e.emit(` const saved = pos; const mark = scn;`); e.emit(` let lhs = -1; let bestNudPos = saved;`); + // `capped` becomes true iff the winning NUD is a capped (assignment-level) expression — + // an ArrowFunction. Such a NUD admits no led, so the led loop is skipped entirely. + if (anyCapped) e.emit(` let capped = false; _prattCapped = false;`); // NUD loop. const nudDispatch = e.altMaskDispatch(nuds, '_am'); if (nudDispatch) e.emit(` ${nudDispatch.maskInit}`); nuds.forEach((nud, i) => { const items = nud.type === 'seq' ? 
nud.items : [nud]; + const capBp = nudCap[i]; e.emit(` // nud ${i}`); - e.emit(` if (${nudDispatch ? nudDispatch.bit(i) : e.altGuard(nud)}) {`); + // A capped NUD parses only at a minBp LOOSER than its cap: it is refused as a tighter + // operator's operand (so `a || () => {}` rejects — `||`'s rhs minBp >= the cap). + const guard = nudDispatch ? nudDispatch.bit(i) : e.altGuard(nud); + e.emit(` if (${capBp !== null ? `minBp < ${capBp} && ` : ''}${guard}) {`); e.emit(` pos = saved; scn = mark;`); if (items[0]?.type === 'prefix') { // prefix $ pattern: identical to parsePratt's prefix branch. @@ -1993,8 +2397,14 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(` const info = PREFIX_BY_T[tkT[pos]];`); e.emit(` if (info) {`); e.emit(` scPush(~((pos << 2) | 2));`); - e.emit(` if (++pos > maxPos) maxPos = pos;`); - e.emit(` const rhs = ${ruleFn}_pratt(info.rbp);`); + e.emit(` if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; }`); + e.emit(` let rhs = ${ruleFn}_pratt(info.rbp);`); + e.emit(` if (rhs < 0 && recovering) rhs = missRule(${rid});`); + // A target-requiring prefix (`++`/`--`) operand must be a LeftHandSideExpression + // (`++-x`, `++ ++x`, `++x--`, `++await x` are syntax errors). Fail hard like + // noUnaryLhs. A recovery-synthesized $missing operand has no children, so + // _notTarget returns false → recovery is not falsely rejected. + e.emit(` if (rhs >= 0 && info.requireTarget && _notTarget(rhs)) return -1;`); e.emit(` if (rhs >= 0 && pos > bestNudPos) { scPush(rhs); lhs = finishNode(${rid}, mark); bestNudPos = pos; }`); e.emit(` }`); e.emit(` }`); @@ -2002,6 +2412,8 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(` if (nud_${sn}_${i}() && pos > bestNudPos) {`); e.emit(` lhs = finishNode(${rid}, mark);`); e.emit(` bestNudPos = pos;`); + // The LONGEST match wins; record whether THAT winner is capped. + if (anyCapped) e.emit(` capped = ${capBp !== null ? 
'true' : 'false'};`); e.emit(` }`); } e.emit(` }`); @@ -2009,6 +2421,9 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(` scn = mark;`); e.emit(` if (lhs < 0) { pos = saved; return -1; }`); e.emit(` pos = bestNudPos;`); + // A capped NUD (assignment-level arrow) admits no led: return it as-is so a trailing + // tighter operator stays unconsumed and the enclosing parse rejects (`() => {} || a`). + if (anyCapped) e.emit(` if (capped) { _prattCapped = true; return lhs; }`); e.emit(` let tailClosed = false;`); e.emit(` while (true) {`); e.emit(` if (pos >= cap) break;`); @@ -2028,6 +2443,9 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl // Precedence gate for alternative-form LEDs (see LedPrec): without it they bind // maximally tight (`a == b ? c : d` mis-grouped as `a == (b ? c : d)`). if (meta.prec[i]) conds.push(`${meta.prec[i]!.lbp} > minBp`); + // notLeftLeaf head-leaf gate: skip the arm when the LEFT node's outermost (head) leaf text + // is in the arm's word set (e.g. `void`/`null`/`this` can't be `.`-qualified as a type). + if (meta.notLeftLeaf[i]) conds.push(`!_NLL_${sn}_${i}.has(_headLeafText(lhs))`); // suppress: skip a LED whose first literal connector is in suppressCur. const firstLit = (led.items[0]?.type === 'literal') ? led.items[0].value : null; if (firstLit !== null) conds.push(`!(suppressCur && suppressCur.has(${J(firstLit)}))`); @@ -2043,6 +2461,8 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl if (meta.mixfix[i]) { e.emit(` if (!ok) { pos = ledSaved; scn = ledMark; ok = matchMixfixLed_${sn}_led_${i}(); }`); } + // Zero-width LED = synthetic-only (see the continuation loop note) — discard. 
+ e.emit(` if (ok && pos === ledSaved) { scn = ledMark; ok = false; }`); e.emit(` if (ok) {`); e.emit(` lhs = finishWrap(${rid}, lhs, ledMark);`); if (meta.tailClosing[i]) e.emit(` tailClosed = true;`); @@ -2060,12 +2480,19 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(` if (info && info.lbp > minBp) {`); e.emit(` if (info.position === 'postfix') {`); e.emit(` if (!tailClosed) {`); + // A target-requiring postfix (`++`/`--`) may not apply to a unary/update operand + // (`++x++`, `x++ ++`): its operand must be a LeftHandSideExpression. Fail hard (like + // noUnaryLhs), so the expression can't reparse some other way. + e.emit(` if (REQTGT_T[tkT[pos]] !== 0 && _notTarget(lhs)) return -1;`); e.emit(` scPush(~((pos << 2) | 2));`); - e.emit(` if (++pos > maxPos) maxPos = pos;`); + e.emit(` if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; }`); e.emit(` lhs = finishWrap(${rid}, lhs, ledMark);`); e.emit(` tailClosed = true; matched = true;`); e.emit(` }`); e.emit(` } else {`); + // A target-requiring infix (`=`/`+=`/…) needs a LeftHandSideExpression LEFT operand + // (`-x = 1`, `++x = 1`, `x++ = 1` are syntax errors). Like noUnaryLhs, fail hard. 
+ e.emit(` if (REQTGT_T[tkT[pos]] !== 0 && _notTarget(lhs)) return -1;`); e.emit(` if (NOUNARY_T[tkT[pos]] !== 0 && rowCount[lhs] > 0) {`); e.emit(` const _h = kids[rowStart[lhs]];`); e.emit(` if (_h < 0 && ((~_h) & 3) === 2) {`); @@ -2075,8 +2502,14 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(` }`); e.emit(` }`); e.emit(` scPush(~((pos << 2) | 2));`); - e.emit(` if (++pos > maxPos) maxPos = pos;`); - e.emit(` const rhs = ${ruleFn}_pratt(info.rbp);`); + e.emit(` if (++pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; }`); + e.emit(` let rhs = ${ruleFn}_pratt(info.rbp);`); + e.emit(` if (rhs < 0 && recovering) rhs = missRule(${rid});`); + // CAP PROPAGATION: an operator whose RHS is a capped assignment-level expression (an + // ArrowFunction) is ITSELF capped — `a = () => {}` admits no further led, so a trailing + // `|| x` / `? :` stays unconsumed and the parse rejects (`a = () => {} || x`). `return lhs` + // keeps `_prattCapped` true so an enclosing operator refuses it too (`b = a = arrow`). 
+ if (anyCapped) e.emit(` if (rhs >= 0 && _prattCapped) { scPush(rhs); lhs = finishWrap(${rid}, lhs, ledMark); return lhs; }`); e.emit(` if (rhs >= 0) { scPush(rhs); lhs = finishWrap(${rid}, lhs, ledMark); matched = true; }`); e.emit(` else { pos = ledSaved; scn = ledMark; }`); e.emit(` }`); @@ -2103,7 +2536,8 @@ function emitPrattRule(e: Emitter, a: ReturnType, rule: RuleDecl e.emit(`function led_${sn}_${i}() {`); e.emit(` const _save = pos; const _sn = scn;`); e.emit(e.matchInto({ type: 'seq', items: led.items.slice(0, -1) } as RuleExpr, 'pos = _save; scn = _sn; return false;')); - e.emit(` const _rhs = ${ruleFn}_pratt(${lp.rhsBp});`); + e.emit(` let _rhs = ${ruleFn}_pratt(${lp.rhsBp});`); + e.emit(` if (_rhs < 0 && recovering) _rhs = missRule(${rid});`); e.emit(` if (_rhs < 0) { pos = _save; scn = _sn; return false; }`); e.emit(` scPush(_rhs);`); e.emit(` return true;`); @@ -2208,9 +2642,19 @@ function parseRuleEntry(idx, rid, name, core) { let mn = memoNode[idx]; let mx = memoExt[idx]; let mg = memoGen[idx]; - if (!mySup && !capped && me !== undefined && mg[start] === memoGenCur) { + const mgs = me !== undefined ? mg[start] : 0; + // Entry validity: its own generation (negative = cycle-tainted, own-generation + // only, and whoever reuses it inherits the taint), or — across recovery attempts + // of one sequence — any earlier attempt's entry whose probe window is bar-free + // (strict, context-free behavior; see memoRecFloor) and untainted. 
+ if (!mySup && !capped && me !== undefined && (mgs === memoGenCur + || (recovering && (mgs === -memoGenCur + || (mgs >= memoRecFloor && mgs < memoGenCur && !recoverFree && barFreeWin(start, mx[start])))))) { const e = me[start]; if (e !== undefined) { + if (mgs !== memoGenCur) { + if (mgs < 0) cycleMinSerial = 0; else mg[start] = memoGenCur; + } pos = e; // The jump SEMANTICALLY reads everything the stored parse read: keep the advance // watermark ≥ the entry's watermark, or an ENCLOSING rule that completes right @@ -2219,14 +2663,17 @@ function parseRuleEntry(idx, rid, name, core) { // the gap keeps the stale entry alive. A guaranteed batch no-op: the watermark is // monotone and was already ≥ this value when the entry was stored. const ex = mx[start]; - if (ex > maxPos) maxPos = ex; + if (ex > frameMax) { frameMax = ex; if (ex > maxPos) maxPos = ex; } const id = mn[start]; if (id >= 0) { // refresh the reused root's transient BUILD coordinates to the current stream // (its green internals are position-independent; only the attachment point — - // what the enclosing finishNode reads — must be current). + // what the enclosing finishNode reads — must be current). start can be tokN + // for a zero-width synthesized row minted AT EOF — toff(tokN) reads past the + // token columns (stale slots from a longer previous document), so use the + // same EOF guard offset() uses. absTok[id] = start; - absChar[id] = toff(start); + absChar[id] = start < tokN ? toff(start) : (tokN > 0 ? tend(tokN - 1) : 0); scPush(id); return true; } @@ -2239,10 +2686,20 @@ function parseRuleEntry(idx, rid, name, core) { : start >= adoptDmgOldEnd + adoptDelta ? 
start - adoptDelta : -1; if (q >= 0) { const aid = adoptSeek(q, rid); - if (aid >= 0) { + if (aid >= 0 && recovering && rowRM[aid] !== 0 && missAt(start + rowTokLen[aid])) { + // RE-DERIVE (don't adopt): this recovery-made row ENDS on a recovery bar — exactly + // where a following sibling's list-element / optional synthesis reads the per-position + // memo that this row's interior derivation SEEDS under commitment (missRule/missTok + // fire only when pos > probeBase, a NON-local context barsWindowEq can't see). Adopting + // skips the interior, leaving the memo un-seeded, so the sibling synthesizes one fewer + // $missing than a fresh parse — the incremental≢fresh divergence (#47). Synthesis only + // fires AT a bar (recoverArmed), so a bar at this row's end is precisely the condition. + } else if (aid >= 0 && recovering && !barsWindowEq(start, q, rowExt[aid])) { + // bar context differs from the build run — parse this window for real + } else if (aid >= 0) { pos = start + rowTokLen[aid]; const ext = start + rowExt[aid]; - if (ext > maxPos) maxPos = ext; + if (ext > frameMax) { frameMax = ext; if (ext > maxPos) maxPos = ext; } absTok[aid] = start; absChar[aid] = toff(start); if (adoptHitP >= 0) { @@ -2262,24 +2719,52 @@ function parseRuleEntry(idx, rid, name, core) { } me[start] = pos; mn[start] = aid; - mx[start] = maxPos; + mx[start] = ext; mg[start] = memoGenCur; scPush(aid); return true; } } } + let recKey = -1; + let mySerial = 0; + if (recovering) { + recKey = idx * (tokN + 1) + start; + const rs = recRunning.get(recKey); + if (rs !== undefined) { + // PEG cycle refusal — record which frame it leans on: every open frame + // entered after that one now holds a context-dependent partial result. 
+ if (rs < cycleMinSerial) cycleMinSerial = rs; + return false; + } + mySerial = ++recSerial; + recRunning.set(recKey, mySerial); + } const prevContext = currentPrattContext; currentPrattContext = name; const prevSup = suppressCur; suppressCur = mySup; + const fm0 = frameMax; + frameMax = start; + const cm0 = cycleMinSerial; + if (recKey >= 0) cycleMinSerial = 0x7fffffff; let result; try { result = core(0); } finally { currentPrattContext = prevContext; suppressCur = prevSup; + if (recKey >= 0) recRunning.delete(recKey); + } + let tainted = false; + if (recKey >= 0) { + // Tainted iff some cycle refusal inside this frame leaned on an ancestor of + // the frame itself (entered strictly before it). Fold the minimum outward: + // a refusal that taints this frame taints every enclosing one too. + tainted = cycleMinSerial < mySerial; + if (cm0 < cycleMinSerial) cycleMinSerial = cm0; } + if (result < 0 && recovering) result = missRule(rid); if (!mySup && !capped) { if (me === undefined || me.length < tokN + 1) { me = new Array(tokN + 1); @@ -2293,12 +2778,30 @@ function parseRuleEntry(idx, rid, name, core) { } me[start] = pos; mn[start] = result; - mx[start] = maxPos; - mg[start] = memoGenCur; // the TRUE probe watermark — the +2 read slack (stop token, - // SECOND-token dispatch) is applied at INVALIDATION time - if (result >= 0) rowOK[result] = 1; + mx[start] = frameMax; // the TRUE probe watermark — the +2 read slack (stop token, + // SECOND-token dispatch) is applied at INVALIDATION time + mg[start] = tainted ? 
-memoGenCur : memoGenCur; + if (result >= 0) { + rowOK[result] = 1; + // a context-tainted result (cycle refusal leaning on an ancestor) is also + // untrustworthy as a ROW: stamp rowRM bit 2 so adoption refuses it — the + // memo stamp alone only protects the entry, not the row adoptSeek can find + if (tainted) rowRM[result] |= 2; + // The row's OWN watermark freezes at finishNode — for a Pratt rule that is + // BEFORE the failed LED extension arms run (the NUD/shorter row survives the + // longest-match), so rowExt under-records the rule's true probe extent and a + // later edit inside a failed arm's reads would not invalidate an adoption. + // The memo watermark (maxPos at exit) is the truth — write it back to the + // row, where adoption can see it after the memo generation dies. (This also + // covers recovering-built rows: a fire that cut a losing arm short is still + // bounded by the recorded probes, so no mode stamp is needed for adoption — + // rowRM stays purely structural for the diagnostics walk.) + const re = frameMax - start; + if (re > rowExt[result]) rowExt[result] = re; + } } + if (fm0 > frameMax) frameMax = fm0; if (result >= 0) { scPush(result); return true; } return false; } @@ -2344,7 +2847,7 @@ function leafTokenType(entry, tokBase) { // — the node's own absolute start coordinates. Leaf spans come from the token // columns at tokBase + the entry's node-relative token index. export const tree = { - ruleNameOf: (id) => RULE_NAMES[rowRule[id]], + ruleNameOf: (id) => RULE_DISPLAY[rowRule[id]], ruleIdOf: (id) => rowRule[id], lenOf: (id) => rowLen[id], tokLenOf: (id) => rowTokLen[id], @@ -2400,7 +2903,8 @@ function visitCore(entry, fns, charBase, tokBase) { // Parse to the ARENA: returns the root node id. function lexInto(source) { -${e.soa ? ` tokenize(source);` : String.raw` docPieces = [source]; docPieceOff = [0]; docLen = source.length; docFlat = source; docCur = 0; +${e.soa ? 
` tokenize(source); + docEmptyPops = lexEmptyPops.slice();` : String.raw` docPieces = [source]; docPieceOff = [0]; docLen = source.length; docFlat = source; docCur = 0; const _toks = tokenize(source); const _n = _toks.length; while (tkCap < _n + 1) growTok(); @@ -2425,6 +2929,10 @@ function farthest(errPos) { function runParse(entryRule) { pos = 0; maxPos = 0; + frameMax = 0; + recRunning.clear(); + recSerial = 0; + cycleMinSerial = 0x7fffffff; parseLimit = -1; cap = tokN; currentPrattContext = null; @@ -2439,11 +2947,30 @@ function runParse(entryRule) { return er; } if (!RULES[entry]()) { - const hasTok = pos < cap; - throw new Error('Parse error at offset ' + (hasTok ? toff(pos) : 0) + ': unexpected ' + (hasTok ? "'" + tokTextAt(pos) + "'" : 'end of input') + farthest(pos)); + if (!recovering || !recoverArmed(pos, maxPos)) { + const hasTok = pos < cap; + throw new Error('Parse error at offset ' + (hasTok ? toff(pos) : 0) + ': unexpected ' + (hasTok ? "'" + tokTextAt(pos) + "'" : 'end of input') + farthest(pos)); + } + const mark = scn; + const from = pos; + while (pos < tokN) { scPush(~(pos << 2)); pos++; } + if (pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } + docDiags.push({ offset: from < tokN ? toff(from) : 0, end: tokN > 0 ? 
tend(tokN - 1) : 0, message: 'no parse' }); + scPush(finishNode(RID_ERROR, mark)); } if (pos < tokN) { - throw new Error('Parse error at offset ' + toff(pos) + ": unexpected '" + tokTextAt(pos) + "' after successful parse" + farthest(pos)); + if (!recovering || !recoverArmed(pos, maxPos)) { + throw new Error('Parse error at offset ' + toff(pos) + ": unexpected '" + tokTextAt(pos) + "' after successful parse" + farthest(pos)); + } + // absorb the unconsumed tail and WRAP [root, tail] — only non-repetition entry + // rules can get here (a rep entry absorbs at its own level) + const mark = scn; + const from = pos; + while (pos < tokN) { scPush(~(pos << 2)); pos++; } + if (pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } + docDiags.push({ offset: toff(from), end: tend(tokN - 1), message: "unexpected '" + tokTextAt(from) + "' after successful parse" }); + scPush(finishNode(RID_ERROR, mark)); + scPush(finishNode(RID_ERROR, 0)); } const rootId = sc[--scn]; rootCharBase = absChar[rootId]; rootTokBase = absTok[rootId]; @@ -2453,14 +2980,7 @@ function runParse(entryRule) { // Source of the last COMPLETED parse — the token columns, arena and memo describe it. // null whenever the module state is not a coherent snapshot (no parse yet, or the last // attempt threw), so parseEdited falls back to a full parse. -// Coherent-edit-base flag: false after a rejected attempt (the next edit falls -// back to a full re-parse of the document text). -let lastOk = false; -// Pieces snapshot of the LIVE tree's text (survives a rejected edit): the reject -// path re-lexes it so the handle keeps reading the previous tree. The document -// pieces above advance on EVERY edit, accepted or rejected — the editor's buffer -// applied the change regardless, and later coordinates are against it. 
-let treePieces = null; + // the LAST parse root's absolute coordinates (the descent origin — see visit/toObject) let rootCharBase = 0; let rootTokBase = 0; @@ -2532,6 +3052,7 @@ function adoptSeek(q, rid) { let xid = e, xb = cb; for (;;) { if (rowOK[xid] !== 0 && rowRule[xid] === rid + && ((recovering ? rowRM[xid] & 2 : rowRM[xid]) === 0) && (q + rowExt[xid] + 2 <= adoptDmgStart || q >= adoptDmgOldEnd)) { return xid; } @@ -2548,6 +3069,292 @@ function adoptSeek(q, rid) { adoptPath.push(id); adoptBase.push(base); } } +// ── Error recovery (the TOTAL second pass) ── +// parse/edit never crash on input: the strict pass runs first (valid inputs take it +// exclusively — byte-identical trees, full PEG alternative exploration), and only a +// strict REJECT re-parses with the recovering flag set. Failing elements absorb +// tokens into $error rows (their leaves keep the CST text-tiling invariant); what +// went wrong lands in docDiags — the cst.errors field. +let recovering = false; +// cst.errors — a VIEW rebuilt per parse/edit from two sources (array identity is +// stable; contents are spliced in place): +// docLex: STRUCTURED lexer diagnostics (kind + position), persistent across edits +// (shifted like any suffix span; the damage window's re-lex replaces its range). +// Messages are FORMATTED at settle time with the CURRENT offset — a stored +// message string would embed a stale offset after shifts. +// parser diagnostics: derived from the TREE — fresh $error rows via the surviving +// recovery candidates, ADOPTED ones by walking the rowRM-marked subtrees that +// adoption reused this pass (a recovering pass adopts error regions wholesale, +// so per-pass collection alone would silently drop their diagnostics). docPar +// keeps the formatted result for the paths that do not re-parse (surgery). 
+let docDiags = []; +let docLex = []; +let docPar = []; + +function lexMsg(g) { + if (g.kind === 0) return "Unexpected character at offset " + g.offset + ": '" + g.ch + "'"; + if (g.kind === 1) return 'Invalid escape sequence in template at offset ' + g.offset; + if (g.kind === 2) return 'Unterminated template literal at offset ' + g.offset; + if (g.kind === 3) return "Invalid identifier escape at offset " + g.offset + ": '" + g.ch + "'"; + return g.ch; // kind 4: a verbatim engine message (the totality net) +} +// ── Recovery BARS: the discipline that keeps recovery equivalence-safe ── +// A repetition element fails constantly during ORDINARY parsing (a statement list +// legitimately ends at 'case'; a losing longest-match arm fails mid-probe). Letting +// recovery fire at any failure absorbs valid text and RESCUES losing arms — and the +// incremental side, which adopts strictly-parsed rows instead of re-probing them, +// would diverge from a fresh recovering parse. Recovery therefore only fires at +// positions a STRICT pass has proven to fail: each attempt runs strictly except at +// the ordered bar list (fire when probing reaches the bar, then disarm); a failure +// past the last bar aborts the attempt, appends the new farthest-fail bar, and the +// pass re-runs (adoption keeps re-runs cheap). Bars are text-determined, so fresh +// and incremental recovering parses are byte-identical by construction. +let recoverBars = []; +// (rule, pos) frames currently ON THE STACK during a recovering run, keyed to +// their entry SERIAL. Token synthesis makes zero-width matches possible, so a rule +// can re-enter itself at the SAME position through a synthesized leading token — +// an unbounded recursion no grammar check can rule out. A re-entered (rule, pos) +// frame fails (PEG cycle semantics). 
Recovering runs also open the first-token +// dispatch guards, so a guard-free ref chain can cycle at one position WITHOUT any +// synthesis — the refusal then depends on which frames are on the stack, i.e. the +// failing result is a function of the frame's ANCESTORS, not of the text alone. +// Strict runs never consult this (zero hot-path cost). +const recRunning = new Map(); +let recSerial = 0; +// Minimum entry-serial referenced by any cycle refusal during the current frame's +// core (0x7fffffff = none). A refusal leaning on a frame entered BEFORE the current +// one (serial < the frame's own) taints the frame: its memo entry is valid only +// where the same ancestors are guaranteed — within its own generation — never +// across attempts. Internal cycles (both ends inside the frame) replay from the +// window text alone and do not taint. +let cycleMinSerial = 0x7fffffff; +// First memo generation of the CURRENT recovery attempt sequence (0x7fffffff = +// none active). Attempts in one sequence parse the SAME token stream under a +// monotonically growing bar list, so an entry from an earlier attempt is valid in +// a later one iff its probe window saw NO bars — no bars means no synthesis and no +// skip arming (both require a window bar), and the open dispatch guards only add +// non-consuming probes, so the frame behaved strictly: a pure function of the +// window text, stable under any bar list that stays out of the window. 
+let memoRecFloor = 0x7fffffff; +function barFreeWin(s, m) { + const hi = m + 2; + for (let i = 0; i < recoverBars.length; i++) { + const b = recoverBars[i]; + if (b > hi) break; + if (b >= s) return false; + } + return true; +} +let recoverFree = false; // iteration-cap fallback: fire at any failure (still deterministic) +// Missing-token synthesis (the tsc parseExpected analog): at a bar-adjacent failure +// of a REQUIRED literal/token match, materialize a zero-width $missing row instead +// of failing the construct — the structure completes (a call keeps its Call shape +// with the ')' marked missing) and the diagnostic reads "expected 'x'". The firing +// condition is a PURE FUNCTION of (position, bar list): pos within a fixed window +// below a bar — no counters, no maxPos (a global budget threads non-local state +// through the parse and desynchronizes adopted regions; the first attempt at this +// proved it with the cross-grammar gate). probing>0 marks failure-tolerated probes +// (not(), sep delimiters, optionals) where synthesis would flip semantics. The +// zero-width spin is killed structurally: recovering repetition loops DISCARD +// zero-width elements (hooked elements are non-nullable — only synthesis can make +// them zero-width). +let probing = 0; +// Innermost ACTIVE optional-probe start (-1 = none). Synthesis inside an optional +// group is allowed only once the group consumed past this (committed) — failures +// of an uncommitted probe are ordinary "the optional thing isn't there". +let probeBase = -1; +function missAt(p2) { + for (let i = 0; i < recoverBars.length; i++) { + const b = recoverBars[i]; + if (b > p2 + 2) break; + if (p2 <= b && b <= p2 + 2) return true; + } + return false; +} +function missTok(t, vs) { + if (probing !== 0 || pos <= probeBase || recoverFree || !missAt(pos)) return false; + const id = finishNode(RID_MISSING, scn); + rowStart[id] = vs ? 
t | (vs << 21) : t; + // expected identity: >0 literal int, <0 named token kind, + // >= RULE_MISS_BASE a missing NONTERMINAL (rid offset); + // bits 21+ carry the call site's viable-set id when the + // grammar proves companion literals still accepted here. + // A zero-kid row never dereferences its kids base, so the + // slot is free storage. + scPush(id); + return true; +} +// Missing-NONTERMINAL synthesis (the tsc "Expression expected" analog): a REQUIRED +// rule reference failing inside the bar window stands in as a zero-width $missing +// row carrying the rule identity. Same purity rules as missTok. Returns the node +// id (not pushed — call sites differ) or -1. +const RULE_MISS_BASE = 1 << 20; +function missRule(rid) { + if (probing !== 0 || pos <= probeBase || recoverFree || !missAt(pos)) return -1; + const id = finishNode(RID_MISSING, scn); + rowStart[id] = RULE_MISS_BASE + rid; + return id; +} + +// Collect $error rows under an adopted recovery-made subtree: offset/end from the +// row spans, the message re-derived from the first absorbed token — byte-identical +// to what recoverSkip emitted when the row was built. +// Collect every $error row in the FINAL tree by descending only the recovery-made +// spine (rowRM propagates structurally at finishNode): O(error paths), no global +// walk, no per-candidate bookkeeping — losing-arm rows are simply unreachable. +// Decode a $missing row's packed expected identity (see missTok): bits 21+ carry +// the call site's viable-set id; bit 20 marks a missing nonterminal; else a plain +// literal int (>0) or a named token kind (<0). +function missLit(v) { + if (v >= 1 << 21) return v & 0xFFFFF; + return v > 0 && v < RULE_MISS_BASE ? 
v : 0; +} +function missEntry(v, kb) { + let message; + if (v >= 1 << 21) message = 'expected ' + VSETS[v >>> 21]; + else if (v >= RULE_MISS_BASE) message = 'expected ' + RULE_DISPLAY[v - RULE_MISS_BASE]; + else if (v > 0) message = "expected '" + LIT_NAMES[v] + "'"; + else message = "expected '" + (K_NAMES[-v] ?? '?') + "'"; + return { offset: kb, end: kb, message }; +} +function collectErrRows(id, charBase, tokBase) { + if (rowRule[id] === RID_MISSING) { + docPar.push(missEntry(rowStart[id], charBase)); + return; + } + if (rowRule[id] === RID_ERROR) { + const fe = rowCount[id] > 0 ? kids[rowStart[id]] : 0; + if (fe < 0) { + // plain absorb: kids are raw tokens — the message quotes the first one + const ft = tokBase + ((~fe) >>> 2); + docPar.push({ offset: charBase, end: charBase + rowLen[id], message: "unexpected '" + docText(toff(ft), tend(ft)) + "'" }); + return; + } + // WRAPPER shape (the runParse leftover net wraps [partial-root, tail-$error]): + // the first kid is a NODE — decoding it as a token leaf reads a garbage column + // (the message then quotes text from an unrelated offset, and differently per + // text layer). Fall through to the generic descent: each kid derives its own + // diagnostics, the tail $error quoting its real first token. 
+ if (rowCount[id] === 0) return; + } + const cs = rowStart[id], n = rowCount[id]; + for (let i = 0; i < n; i++) { + const e = kids[cs + i]; + if (e >= 0 && ((rowRM[e] & 1) !== 0 || rowRule[e] >= RID_ERROR)) { + if (rowRule[e] === RID_MISSING) { + // a missing CLOSER names its matched opener (tsc's "to match this '('"): + // PAIR_OPEN holds the grammar-derived structural pair, and the opener leaf + // — if the construct really matched one — sits among the earlier siblings + const entry = missEntry(rowStart[e], charBase + kcr(id, cs + i)); + // a missing CLOSER names its matched opener (tsc's "to match this '('"): + // PAIR_OPEN holds the grammar-derived structural pair, and the opener leaf + // — if the construct really matched one — sits among the earlier siblings + const lt = missLit(rowStart[e]); + if (lt > 0 && PAIR_OPEN[lt] !== 0) { + for (let j = i - 1; j >= 0; j--) { + const ee = kids[cs + j]; + if (ee < 0) { + const tk = tokBase + ((~ee) >>> 2); + if (tkT[tk] === PAIR_OPEN[lt]) { + entry.related = { offset: toff(tk), end: tend(tk), message: "to match this '" + LIT_NAMES[PAIR_OPEN[lt]] + "'" }; + break; + } + } + } + } + docPar.push(entry); + continue; + } + collectErrRows(e, charBase + kcr(id, cs + i), tokBase + ktr(id, cs + i)); + } + } +} +// Rebuild the cst.errors view: formatted lexer diagnostics + tree-derived parser +// diagnostics (fresh survivors + adopted rowRM subtrees), ordered by offset. 
+function settleDiags() { + docPar.length = 0; + if (lastRoot >= 0 && ((rowRM[lastRoot] & 1) !== 0 || rowRule[lastRoot] >= RID_ERROR)) { + collectErrRows(lastRoot, rootCharBase, rootTokBase); + } + rebuildDiagView(); +} +function rebuildDiagView() { + docDiags.length = 0; + for (let i = 0; i < docLex.length; i++) { + const g = docLex[i]; + docDiags.push({ offset: g.offset, end: g.end, message: lexMsg(g) }); + } + for (let i = 0; i < docPar.length; i++) docDiags.push(docPar[i]); + docDiags.sort((x, y) => x.offset - y.offset); +} +// Armed iff some bar lies in [pos, maxPos]: the failing element started at/before a +// proven fail point and probing reached it. STATELESS — a losing longest-match arm +// may fire and be discarded without consuming anything (backtrack-safe), legitimate +// repetition ends PAST a bar stay silent (pos > bar), and the runParse safety net +// obeys the same discipline (an ungated net would absorb on the FIRST bar-less +// attempt and pre-empt the whole iteration). +// Token indices of ')' pops that found an EMPTY paren stack, ascending (the lexer +// appends as it lexes; the window splice recomposes). Almost always empty — a +// stray closer beyond balance. The shifted lexer resync's dominant q=0 case needs +// exactly one fact about the whole old suffix ("no pop-on-empty beyond the +// candidate"), which this list answers O(1) instead of an O(suffix) min-build. +let docEmptyPops = []; +// Bar list that built lastRoot (that run's token coords); null = free-fire built +// (free-fire decisions are not bar-pure — such a tree is never adoptable while +// recovering). Strict trees carry []. +let lastBars = []; +// A row replays identically in a recovering run iff its window sees the SAME bars +// (shifted) the build run saw there — every recovery decision (hook arming, +// missTok/missRule, the cycle sentinel) is position-pure, so window text + window +// bars determine the frame's behavior completely. 
+function barsWindowEq(s, q, ext) { + if (lastBars === null) return false; + const hiN = s + ext + 2, hiO = q + ext + 2; + let i = 0, j = 0; + while (i < recoverBars.length && recoverBars[i] < s) i++; + while (j < lastBars.length && lastBars[j] < q) j++; + for (;;) { + const a = i < recoverBars.length && recoverBars[i] <= hiN ? recoverBars[i] - s : -1; + const b = j < lastBars.length && lastBars[j] <= hiO ? lastBars[j] - q : -1; + if (a !== b) return false; + if (a === -1) return true; + i++; j++; + } +} +function recoverArmed(from, reach) { + // armed iff THE FAILING ELEMENT is stuck at a bar: it starts at/before the bar + // and its OWN farthest probe sits ON it (+2 read slack). The reach is the + // element's frame-local watermark, NOT the global maxPos — a global frontier + // parked on a far bar must not arm unrelated loops (position-PURITY: every + // recovery decision inside a row is a function of the row's window text and + // the bars inside that window, which is what makes recovering adoption sound). 
+ if (recoverFree) return true; + for (let i = 0; i < recoverBars.length; i++) { + const b = recoverBars[i]; + if (from <= b && b <= reach && reach <= b + 2) return true; + if (b > reach) break; + } + return false; +} +function recoverSkip(canStart, closerT, from0, reach) { + if (!recoverArmed(from0, reach)) return false; + if (pos >= cap) return false; + if (closerT >= 0 && tkK[pos] === K_PUNCT && tkT[pos] === closerT) return false; + const mark = scn; + const from = pos; + // the offending token is consumed unconditionally (it may well be IN the + // element's FIRST set — the element failed past it), then run to a sync point + scPush(~(pos << 2)); pos++; + while (pos < cap + && !(closerT >= 0 && tkK[pos] === K_PUNCT && tkT[pos] === closerT) + && !(canStart !== null && canStart(pos))) { + scPush(~(pos << 2)); pos++; + } + if (pos > frameMax) { frameMax = pos; if (pos > maxPos) maxPos = pos; } + scPush(finishNode(RID_ERROR, mark)); + return true; +} + // Run-extension: a repetition whose element was just ADOPTED bulk-adopts the // following OLD SIBLINGS in one tight loop — whole-statement reuse without // re-entering parseRuleEntry/adoptSeek once per element. Soundness: each member @@ -2566,12 +3373,14 @@ function runExtend(rid) { let oq = adoptRunOq; let nq = pos; const sfx = oq >= adoptDmgOldEnd; // past the damage: monotone, no per-member ext check - let mp = maxPos; + let mp = frameMax; while (i < csEnd) { const e = kids[i]; if (e < 0) break; if (pb + ktr(P, i) !== oq) break; if (rowRule[e] !== rid || rowOK[e] === 0) break; + if ((recovering ? 
rowRM[e] & 2 : rowRM[e]) !== 0) break; + if (recovering && !barsWindowEq(nq, oq, rowExt[e])) break; const tl = rowTokLen[e]; if (tl === 0) break; const ex = rowExt[e]; @@ -2583,7 +3392,7 @@ function runExtend(rid) { nq += tl; oq += tl; i++; } - if (mp > maxPos) maxPos = mp; + if (mp > frameMax) { frameMax = mp; if (mp > maxPos) maxPos = mp; } pos = nq; } @@ -2618,6 +3427,25 @@ function rowKCof(id) { } function trySurgery(dmgA, dmgB, tokD, chrD) { if (adoptRoot < 0) return -1; + if (rowRule[adoptRoot] >= RID_ERROR) return -1; + // A recovery-made tree (rowRM root) CAN take a strict splice when the edit + // provably commutes with every recovery decision: decisions are position-pure + // functions of (window text, window bars), so if no bar window touches the + // damage or the re-parsed span (second check after the re-parse, when the span's + // probe reach is known), no decision changes - kept rows replay identically at + // shifted positions, and a fresh recovering parse behaves strictly across the + // span, exactly like the strict re-parse below (its first possible fire inside + // the span would need a bar at/below the probe reach + 2). Bars adjacent to the + // damage are unmappable across the token delta; free-fire trees (lastBars null) + // are not window-pure - both refuse. + const recTree = rowRM[adoptRoot] !== 0; + if (recTree) { + if (lastBars === null) return -1; + for (let i = 0; i < lastBars.length; i++) { + const b = lastBars[i]; + if (b + 2 >= dmgA && b <= dmgB + 2) return -1; + } + } // the whole-file token math must close, or the shape changed beyond a splice if (adoptRootTok + rowTokLen[adoptRoot] + tokD !== tokN) return -1; // 1. 
descend along single-affected-row kids, recording the path @@ -2679,6 +3507,10 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { if (L < 0) return -1; const D = surgX[L], Dbase = surgBase[L], Da = surgA[L]; const Db = surgB[L]; + // recovered trees use the length += chrD update below, which needs the node's + // char base unchanged; at Dbase >= dmgA the base token was re-lexed and its + // start may have moved + if (recTree && Dbase >= dmgA) return -1; const elem = SURG_ELEM[rowRule[D]]; const csD = rowStart[D], nD = rowCount[D]; const DendNew = Dbase + rowTokLen[D] + tokD; @@ -2687,7 +3519,8 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { pos = Da < Db ? Dbase + (kids[csD + Da] < 0 ? (~kids[csD + Da]) >>> 2 : ktr(D, csD + Da)) : dmgA; - maxPos = pos; scn = 0; parseLimit = -1; cap = tokN; + const s0 = pos; + maxPos = pos; frameMax = pos; scn = 0; parseLimit = -1; cap = tokN; currentPrattContext = null; suppressNext = null; suppressCur = null; const genAt = memoGenCur; const fn = RULE_FN_BY_ID[elem]; @@ -2712,6 +3545,15 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { if (!fn()) return -1; if (memoGenCur !== genAt || pos === pp) return -1; } + if (recTree) { + // the strict re-parse stands for the fresh recovering parse of this span only + // if no bar window touches anything it read (probes included) + for (let i = 0; i < lastBars.length; i++) { + const b = lastBars[i]; + const bn = b < dmgA ? b : b + tokD; + if (bn + 2 >= s0 && bn <= maxPos + 2) return -1; + } + } // 4. 
POINT OF NO RETURN — splice D's kid range, shift suffix rels, patch the path const f = scn; const removed = j - Da; @@ -2730,8 +3572,19 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { const ks = kidN; for (let k = 0; k < Da; k++) { kids[ks + k] = kids[csD + k]; - kidRel[ks + k] = kidRel[csD + k]; - kidTokRel[ks + k] = kidTokRel[csD + k]; + // NORMALIZE prefix rels to absolute while copying: the boundary remap below + // puts rowNF at the suffix start, so an end-relative value surviving in the + // copied prefix would never flip down again — its decode would drift by every + // later length update (lengths are still the OLD ones here, so the decode + // bias matches the encoding) + const vtr = kidTokRel[csD + k]; + if (vtr < 0) { + kidTokRel[ks + k] = vtr + rowTokLen[D] + 1; + kidRel[ks + k] = kidRel[csD + k] + rowLen[D] + 1; + } else { + kidRel[ks + k] = kidRel[csD + k]; + kidTokRel[ks + k] = vtr; + } } for (let k = 0; k < f; k++) { const id = sc[k]; @@ -2794,14 +3647,24 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { } } rowNF[D] = bnd; + // A node whose token end lies strictly beyond the damage keeps its char end + // shape: every end-determining coordinate (last real token, or a trailing + // zero-width $missing kid's anchor - finishNode takes the LAST KID's end, which + // a zero-width row can push past the last real token) sits in the suffix and + // shifts by exactly chrD. Only a node ENDING at/inside the damage derives its + // length from the token columns: a pure-trivia edit can sit at a node's token + // BOUNDARY (between its last token and the next sibling's first), token-inside + // but char-outside - the gap belongs to no node, and tend/toff give the exact + // new span. No zero-width kid can end such a node: zero-width rows live at + // bars, and bars adjacent to the damage were refused above. + // ... 
and only while the node's char BASE is unchanged (a base token at/inside + // the damage was re-lexed and may have moved - leading trivia inserted at a + // node's very start shifts base and end together, leaving the LENGTH alone, + // which is exactly what the token derivation computes) + const keepEndD = Dbase + rowTokLen[D] > dmgB && Dbase < dmgA; rowTokLen[D] += tokD; - // Derive the char length from the token columns rather than adding chrD: a pure- - // trivia edit can sit at a node's token BOUNDARY (between its last token and the - // next sibling's first), token-inside but char-outside — the gap belongs to no - // node. tend/toff give the exact new span; when suffix tokens exist inside the - // node the delta equals chrD (so the suffix-kid rel adds and the end-relative - // bias-cancel stay consistent), and when they don't there are no suffix kids. - if (rowTokLen[D] > 0) rowLen[D] = tend(Dbase + rowTokLen[D] - 1) - toff(Dbase); + if (keepEndD) rowLen[D] += chrD; + else if (rowTokLen[D] > 0) rowLen[D] = tend(Dbase + rowTokLen[D] - 1) - toff(Dbase); { let x = rowExt[D] + (tokD > 0 ? tokD : 0); const fw = maxPos - Dbase; @@ -2875,8 +3738,10 @@ function trySurgery(dmgA, dmgB, tokD, chrD) { // (end-relative kids past the boundary auto-shift via the length update below) } } + const keepEndA = surgBase[i] + rowTokLen[Ai] > dmgB && surgBase[i] < dmgA; // see rowLen[D] above rowTokLen[Ai] += tokD; - if (rowTokLen[Ai] > 0) rowLen[Ai] = tend(surgBase[i] + rowTokLen[Ai] - 1) - toff(surgBase[i]); + if (keepEndA) rowLen[Ai] += chrD; + else if (rowTokLen[Ai] > 0) rowLen[Ai] = tend(surgBase[i] + rowTokLen[Ai] - 1) - toff(surgBase[i]); { let x = rowExt[Ai] + (tokD > 0 ? 
tokD : 0); const cw = ktr(Ai, csA + ki) + rowExt[surgX[i + 1]]; @@ -2915,14 +3780,15 @@ function makeDoc() { rowStart: new Int32Array(8192), rowCount: new Int32Array(8192), rowExt: new Int32Array(8192), rowOK: new Uint8Array(8192), rowKC: new Uint8Array(8192), rowNF: new Int32Array(8192).fill(0x7fffffff), + rowRM: new Uint8Array(8192), absChar: new Int32Array(8192), absTok: new Int32Array(8192), rowCap: 8192, nodeN: 0, kids: new Int32Array(16384), kidRel: new Int32Array(16384), kidTokRel: new Int32Array(16384), kidCap: 16384, kidN: 0, memoNode: [], memoEnd: [], memoExt: [], memoGen: [], memoGenCur: 0, - lastOk: false, treePieces: null, + docDiags: [], docLex: [], docPar: [], docPieces: null, docPieceOff: null, docLen: 0, docFlat: null, docCur: 0, - rootCharBase: 0, rootTokBase: 0, lastRoot: -1, lastRootTok: 0, + rootCharBase: 0, rootTokBase: 0, lastRoot: -1, lastRootTok: 0, docEmptyPops: [], ${e.soa ? ' parenCachePos: -1, parenCacheStack: [],' : ''} altK: null, altT: null, altOff: null, altEnd: null, altFl: null, altDp: null, altPd: null, altCap: 0, altN: 0, @@ -2933,15 +3799,15 @@ function saveDoc(d) { d.tkDp = tkDp; d.tkPd = tkPd; d.tkCap = tkCap; d.tokN = tokN; d.srcLenP1 = srcLenP1; d.negFrom = negFrom; d.rowRule = rowRule; d.rowLen = rowLen; d.rowTokLen = rowTokLen; d.rowStart = rowStart; - d.rowCount = rowCount; d.rowExt = rowExt; d.rowOK = rowOK; d.rowKC = rowKC; d.rowNF = rowNF; + d.rowCount = rowCount; d.rowExt = rowExt; d.rowOK = rowOK; d.rowKC = rowKC; d.rowNF = rowNF; d.rowRM = rowRM; d.absChar = absChar; d.absTok = absTok; d.rowCap = rowCap; d.nodeN = nodeN; d.kids = kids; d.kidRel = kidRel; d.kidTokRel = kidTokRel; d.kidCap = kidCap; d.kidN = kidN; d.memoNode = memoNode; d.memoEnd = memoEnd; d.memoExt = memoExt; d.memoGen = memoGen; d.memoGenCur = memoGenCur; - d.lastOk = lastOk; d.treePieces = treePieces; + d.docDiags = docDiags; d.docLex = docLex; d.docPar = docPar; d.docPieces = docPieces; d.docPieceOff = docPieceOff; d.docLen = docLen; d.docFlat = 
docFlat; d.docCur = docCur; d.rootCharBase = rootCharBase; d.rootTokBase = rootTokBase; - d.lastRoot = lastRoot; d.lastRootTok = lastRootTok; + d.lastRoot = lastRoot; d.lastRootTok = lastRootTok; d.lastBars = lastBars; d.docEmptyPops = docEmptyPops; ${e.soa ? ' d.parenCachePos = parenCachePos; d.parenCacheStack = parenCacheStack;' : ''} d.altK = altK; d.altT = altT; d.altOff = altOff; d.altEnd = altEnd; d.altFl = altFl; d.altDp = altDp; d.altPd = altPd; d.altCap = altCap; d.altN = altN; @@ -2951,15 +3817,15 @@ function loadDoc(d) { tkDp = d.tkDp; tkPd = d.tkPd; tkCap = d.tkCap; tokN = d.tokN; srcLenP1 = d.srcLenP1; negFrom = d.negFrom; rowRule = d.rowRule; rowLen = d.rowLen; rowTokLen = d.rowTokLen; rowStart = d.rowStart; - rowCount = d.rowCount; rowExt = d.rowExt; rowOK = d.rowOK; rowKC = d.rowKC; rowNF = d.rowNF; + rowCount = d.rowCount; rowExt = d.rowExt; rowOK = d.rowOK; rowKC = d.rowKC; rowNF = d.rowNF; rowRM = d.rowRM; absChar = d.absChar; absTok = d.absTok; rowCap = d.rowCap; nodeN = d.nodeN; kids = d.kids; kidRel = d.kidRel; kidTokRel = d.kidTokRel; kidCap = d.kidCap; kidN = d.kidN; memoNode = d.memoNode; memoEnd = d.memoEnd; memoExt = d.memoExt; memoGen = d.memoGen; memoGenCur = d.memoGenCur; - lastOk = d.lastOk; treePieces = d.treePieces; + docDiags = d.docDiags; docLex = d.docLex; docPar = d.docPar; docPieces = d.docPieces; docPieceOff = d.docPieceOff; docLen = d.docLen; docFlat = d.docFlat; docCur = d.docCur; rootCharBase = d.rootCharBase; rootTokBase = d.rootTokBase; - lastRoot = d.lastRoot; lastRootTok = d.lastRootTok; + lastRoot = d.lastRoot; lastRootTok = d.lastRootTok; lastBars = d.lastBars; docEmptyPops = d.docEmptyPops; ${e.soa ? ' parenCachePos = d.parenCachePos; parenCacheStack = d.parenCacheStack;' : ''} altK = d.altK; altT = d.altT; altOff = d.altOff; altEnd = d.altEnd; altFl = d.altFl; altDp = d.altDp; altPd = d.altPd; altCap = d.altCap; altN = d.altN; @@ -2987,7 +3853,6 @@ function swapBuffers() { ${e.soa ? 
'' : 'let altText = [];'} function parseCore(source, entryRule) { - lastOk = false; adoptRoot = -1; adoptRunPos = -1; lexInto(source); @@ -3003,11 +3868,37 @@ function parseCore(source, entryRule) { const root = runParse(entryRule); lastRoot = root; lastRootTok = rootTokBase; - lastOk = true; - treePieces = docPieces.slice(); return root; } +// In-place diagnostic shift for a LOCALLY-strict edit (surgery): diags before the +// damage stay, diags at/after the old damage end ride the char delta, overlapping +// ones drop (their region re-parsed strictly). Splices in place — cst.errors IS +// this array. +// Parser-diag shift for the LOCALLY-strict paths (surgery / strict success): the +// LEXER list is maintained by the window block (which already dropped the re-lexed +// range and shifted the suffix — shifting here would double-apply the delta). +function shiftDiags(a, b, delta) { + let w = 0; + for (let i = 0; i < docPar.length; i++) { + const g = docPar[i]; + if (g.end <= a) { /* kept as is */ } + else if (g.offset >= b) { g.offset += delta; g.end += delta; } + else continue; + // the related anchor (the matched opener) shifts on its own coordinates — it + // can sit on the other side of the damage from its diagnostic + const r = g.related; + if (r !== undefined) { + if (r.end <= a) { /* kept */ } + else if (r.offset >= b) { r.offset += delta; r.end += delta; } + else g.related = undefined; // its token was edited: stale + } + docPar[w++] = g; + } + docPar.length = w; + rebuildDiagView(); +} + // ── Incremental re-parse ── // No edit protocol: the caller hands the NEW source; the damage window is DERIVED by // diffing the old and new token columns (longest identical prefix; longest suffix @@ -3023,30 +3914,33 @@ function parseCore(source, entryRule) { // until then. 
Lexing is FULL-FILE by design: the lexer carries cross-token state // (template nesting, regex context, markup modes), full lexing is a small share of a // parse, and the diff is what localizes the damage — not the lexer. +// Last-resort totality net: a layer without recovery support threw — the handle +// API still never crashes. Zero-width $error root + the thrown message as the +// diagnostic; the next successful parse/edit resumes normal service. +function totalNet(e) { + // the message lives in the SOURCE layer (docLex kind 4) — a later settle rebuilds + // the view from the sources, and a view-only push would be wiped by it + docLex.length = 0; + docPar.length = 0; + docLex.push({ offset: 0, end: 0, kind: 4, ch: String(e && e.message ? e.message : e) }); + rebuildDiagView(); + scn = 0; + const root = finishNode(RID_ERROR, 0); + lastRoot = root; + lastRootTok = 0; + lastBars = null; + rootCharBase = 0; + rootTokBase = 0; + return root; +} +function apiMisuse(msg) { + const e = new Error(msg); + e.apiMisuse = true; + return e; +} function editCore(entryRule, edits) { - try { - return editCoreRun(entryRule, edits); - } catch (e) { - // REJECTED edit: the splice (and any '>' splits of the failed attempt) already - // rewrote the token columns to the rejected text, and the append-mode fallback - // may have grown the arena — but the live tree's ROWS are untouched. Re-lexing - // the live tree's source restores every read path (leaf spans, visit, next - // edit's restart anchors); O(n) on the reject path only. 
- if (treePieces !== null) { - // restore the token columns to the LIVE TREE's text — but the DOCUMENT text - // must stay on the rejected content (lexInto/tokenize resets the doc layer - // as a side effect, so save it around the re-lex) - const kP = docPieces, kO = docPieceOff, kL = docLen, kF = docFlat; - lexInto(treePieces.join('')); - docPieces = kP; docPieceOff = kO; docLen = kL; docFlat = kF; docCur = 0; - lastOk = false; - } - throw e; - } -} -function editCoreRun(entryRule, edits) { if (edits === undefined || edits.length === 0) { - throw new Error('edit() requires the changes: [{ start, end, text }] (LSP-style - each edit in the coordinates of the document AFTER the preceding edits in the array)'); + throw apiMisuse('edit() requires the changes: [{ start, end, text }] (LSP-style - each edit in the coordinates of the document AFTER the preceding edits in the array)'); } // The engine owns the document text: the new source is BUILT from the changes, // so "the ranges do not match the text" is unrepresentable. Each edit is applied @@ -3055,7 +3949,7 @@ function editCoreRun(entryRule, edits) { // coordinates, the old end recovered through the total delta. V8 cons strings // make the slice+concat construction cheap; the flat-string cost, where a read // path needs one, is the same the caller would have paid building the text. 
- if (docPieces === null) throw new Error('edit() before parse(): no document'); + if (docPieces === null) throw apiMisuse('edit() before parse(): no document'); const oldLen = docLen; { let dS = 0x7fffffff; @@ -3064,7 +3958,7 @@ function editCoreRun(entryRule, edits) { const ed = edits[i]; const start = ed.start, end = ed.end, text = ed.text; if (!(start >= 0 && start <= end && end <= docLen) || typeof text !== 'string') { - throw new Error('edit() change #' + i + ' out of range: [' + start + ', ' + end + ') of ' + docLen); + throw apiMisuse('edit() change #' + i + ' out of range: [' + start + ', ' + end + ') of ' + docLen); } applyChange(start, end, text); const newEnd = start + text.length; @@ -3076,29 +3970,7 @@ function editCoreRun(entryRule, edits) { editDmgS = dS; editDmgE = dE; } - if (!lastOk) { - // No coherent edit base (a previous attempt rejected): full re-parse in APPEND - // mode — parseCore would reset the arena and destroy the live tree the handle - // still exposes if THIS parse rejects too. parse() is the only compaction point. - const whole = flattenDoc(); - lexInto(whole); - if (memoEnd.length !== MEMO_RULES) { - memoNode = new Array(MEMO_RULES); - memoEnd = new Array(MEMO_RULES); - memoExt = new Array(MEMO_RULES); - memoGen = new Array(MEMO_RULES); - } - memoGenCur++; - adoptRoot = -1; - adoptRunPos = -1; - const root = runParse(entryRule); - lastRoot = root; - lastRootTok = rootTokBase; - lastOk = true; - treePieces = docPieces.slice(); - return root; - } - lastOk = false; + ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── // Damage envelope from the composed changes: prefix coordinates are shared, the // old end comes back through the total delta. @@ -3110,7 +3982,16 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── // Restart anchor: the last token B ending at/before the damage whose recorded // depths are zero and whose shape carries no cross-token lexer flag (')' control- // head, postfix-ambiguous op). 
B = -1 restarts at the file head — always sound. - const B = findRestart(cs); + // + // RECOVERED streams add a constraint a strict stream never has: a lexer + // diagnostic marks a point whose tokenization can COUPLE BACKWARD to a later + // edit (a dangling quote pairs with a newly typed one, re-lexing everything + // between), so the window must start below the EARLIEST such point before the + // damage. Forward coupling needs no guard — the resync equality only accepts + // exact re-agreement with the old stream. + let anchorCs = cs; + for (let i = 0; i < docLex.length; i++) if (docLex[i].offset < anchorCs) anchorCs = docLex[i].offset; + const B = findRestart(anchorCs); const initParens = reconstructParensCached(B); const oN = tokN; // first old token at/after the damage end — the resync search floor @@ -3118,6 +3999,16 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── { let lo = 0, hi = oN; while (lo < hi) { const mid = (lo + hi) >> 1; if (toff(mid) < ceOld) lo = mid + 1; else hi = mid; } r0 = lo; } + // Old-side trajectory floor across the damage itself: min recorded paren depth of + // the OLD tokens inside [damage start, damage end) - the lexes diverge at the + // damage start, and the resync's fast tier needs the old min from that point on. + { + let lo = 0, hi = r0; + while (lo < hi) { const mid = (lo + hi) >> 1; if (toff(mid) < cs) lo = mid + 1; else hi = mid; } + let m = 0x7fffffff; + for (let i = lo; i < r0; i++) if (tkPd[i] < m) m = tkPd[i]; + wndOldMin0 = m; + } // Lex the window into the spare buffers (the old stream stays live for resync). if (altK === null || altCap < tkCap) { altK = new tkK.constructor(tkCap); altT = new tkT.constructor(tkCap); @@ -3126,6 +4017,7 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── altCap = tkCap; } altN = oN; + altSuffMin = null; // the old-suffix min-depth cache follows the alt stream swapBuffers(); // live = scratch, alt = OLD stream tokN = 0; const startOff = B >= 0 ? (altEnd[B] < 0 ? 
altEnd[B] + srcLenP1 : altEnd[B]) : 0; @@ -3133,16 +4025,24 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── // an absolute bias; -2 = ran off the window end before resyncing — re-materialize // a larger window and retry (the common case fits the first one). let R0; + const preLexN = docLex.length; // persisted lexer diags; the window's own + // emissions land after this index + lexDiagBase = preLexN; { let wHi = ceNew + 4096; for (;;) { if (wHi > docLen) wHi = docLen; const windowStr = docText(startOff, wHi); + docLex.length = preLexN; // an aborted attempt re-lexes: drop its pushes tokN = 0; try { R0 = lexCore(windowStr, 0, B >= 0 ? altK[B] : -1, B >= 0 ? altT[B] : 0, r0, ceNew, charDelta, cs, initParens.slice(), startOff, wHi < docLen); } catch (e2) { - if (e2 !== LEX_RETRY) throw e2; + if (e2 !== LEX_RETRY) { + if (recovering) throw e2; // a recovering lexer never throws — a bug + recovering = true; // lex error: the rest of this edit runs in + continue; // the recovering pass (parse included) + } R0 = -2; } if (R0 !== -2) break; @@ -3153,6 +4053,26 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── const R = R0 >= 0 ? R0 : oN; swapBuffers(); // live = OLD stream again; window sits in the alt buffers tokN = oN; + // Persisted lexer diagnostics (AFTER the swap-back — toff must decode the OLD + // columns, not the spare window set): entries inside the re-lexed range are + // superseded by the window's own emissions (queued at [preLexN..)); suffix + // entries ride the char delta; prefix entries are untouched. + { + const wndLo = startOff; + const wndHiOld = R < oN ? toff(R) : oldLen; + let w2 = 0; + for (let i = 0; i < preLexN; i++) { + const g = docLex[i]; + if (g.end <= wndLo) docLex[w2++] = g; + else if (g.offset >= wndHiOld) { g.offset += charDelta; g.end += charDelta; docLex[w2++] = g; } + } + // window emissions sit at [preLexN..) 
in CURRENT coordinates — never shifted; + // compact them down after the kept prefix + if (w2 < preLexN) { + for (let i = preLexN; i < docLex.length; i++) docLex[w2++] = docLex[i]; + docLex.length = w2; + } + } // EOF-relative maintenance: move the negative-zone boundary to THIS edit's suffix // start R. Tokens dropping out of the suffix ([negFrom, R)) flip back to absolute // (they sit at/before the damage now — EOF-unstable); tokens entering it @@ -3197,6 +4117,22 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── negFrom = B + 1 + W; srcLenP1 = newLen + 1; tokN = nN; + // a SHIFTED resync adopted the suffix at a different absolute paren depth: re-base + // the adopted depth records to the new truth ('(' head bits are unchanged - an + // entry's head-ness is a local fact of its own neighbors) + if (R0 >= 0 && lexResyncPd !== 0) { + for (let i = B + 1 + W; i < nN; i++) tkPd[i] += lexResyncPd; + lexResyncPd = 0; + } + // recompose the pop-on-empty index list: kept prefix + the window's own + // (window-relative + B+1) + kept suffix riding the token delta + { + const nep = []; + for (let i = 0; i < docEmptyPops.length && docEmptyPops[i] <= B; i++) nep.push(docEmptyPops[i]); + for (let i = 0; i < lexEmptyPops.length; i++) nep.push(lexEmptyPops[i] + B + 1); + for (let i = 0; i < docEmptyPops.length; i++) { const v = docEmptyPops[i]; if (v >= R) nep.push(v + tokenDelta); } + docEmptyPops = nep; + } const nN2 = nN;` : String.raw` // (fallback-lexer grammars keep the full-relex + token-diff path) const oK = tkK, oT = tkT, oOff = tkOff, oEnd = tkEnd, oFl = tkFl, oN = tokN; const oText = tkText; @@ -3210,6 +4146,9 @@ ${e.soa ? 
String.raw` // ── M1: WINDOWED re-lex ── tkText = altText; tkText.length = 0; altK = oK; altT = oT; altOff = oOff; altEnd = oEnd; altFl = oFl; altText = oText; + docLex.length = 0; // a FULL relex re-derives all lexer diagnostics (none, for + // the recovery-blind fallback lexer) — persisted entries + // from an earlier totality-net edit would go stale lexInto(flattenDoc()); const nN = tokN; const charDelta = docLen - oldLen; @@ -3245,26 +4184,113 @@ ${e.soa ? String.raw` // ── M1: WINDOWED re-lex ── adoptPath.length = 0; adoptBase.length = 0; adoptRunPos = -1; - const sroot = trySurgery(p, dOldEnd, tokenDelta, charDelta); + const sroot = recovering ? -1 : trySurgery(p, dOldEnd, tokenDelta, charDelta); if (sroot >= 0) { adoptRoot = -1; rootCharBase = toff(adoptRootTok); rootTokBase = adoptRootTok; lastRoot = sroot; lastRootTok = adoptRootTok; - lastOk = true; - treePieces = docPieces.slice(); + // the spliced tree keeps its bar list (surgery proved the edit clear of every + // bar window) - suffix bars ride the token delta like everything else + if (lastBars !== null) { + for (let i = 0; i < lastBars.length; i++) if (lastBars[i] >= dOldEnd) lastBars[i] += tokenDelta; + } + shiftDiags(cs, ceOld, charDelta); return sroot; } - const root = runParse(entryRule); + let root; + { + // recovering may already be true here (the window relex recovered a lex error + // and pushed its diagnostics): the first attempt then runs with EMPTY bars — + // strict at the repetition level — and a parse failure flows into the same bar + // iteration. Lex diagnostics are re-seeded into every attempt (the window was + // lexed once; only the parse re-runs). + const lexRecovered = recovering; + const lexSnap = docLex.slice(); + try { + root = runParse(entryRule); + if (!lexRecovered) { + // a strict full pass proves the document free of PARSE errors; persisted + // lexer diagnostics (e.g. 
an invalid escape outside the damage — its token + // is valid) survive with their shifted positions + docPar.length = 0; + rebuildDiagView(); + lastBars = []; + } else { + lastRoot = root; + lastRootTok = rootTokBase; + lastBars = []; + settleDiags(); + } + recovering = false; + } catch (e) { + // total edit: re-run the SAME spliced stream under the bar discipline. + // Adoption stays LIVE under the bars-window predicate: a row whose window + // saw the same (shifted) bars in the build run replays identically — all + // recovery decisions are position-pure — so each attempt is byte-equal to + // the fresh side's while reusing every row whose bar context matches. + // Attempt 0 (no bars) adopts only where the build run was also bar-free. + recovering = true; + const bars = []; + let done = false; + memoRecFloor = memoGenCur + 1; // attempts share the stream: bar-free-window + // entries survive across them (see decl) + try { + for (let attempt = 0; attempt < 32 && !done; attempt++) { + try { + docLex.length = 0; + for (let i = 0; i < lexSnap.length; i++) docLex.push(lexSnap[i]); + recoverBars = bars; + memoGenCur++; + adoptPath.length = 0; + adoptBase.length = 0; + adoptRunPos = -1; + scn = 0; + root = runParse(entryRule); + done = true; + lastBars = bars.slice(); + } catch (e2) { + let b = maxPos; + if (bars.length > 0 && b <= bars[bars.length - 1]) b = bars[bars.length - 1] + 1; + bars.push(b); + } + } + if (!done) { + recoverFree = true; + lastBars = null; + try { + docLex.length = 0; + for (let i = 0; i < lexSnap.length; i++) docLex.push(lexSnap[i]); + memoGenCur++; + adoptPath.length = 0; + adoptBase.length = 0; + adoptRunPos = -1; + scn = 0; + root = runParse(entryRule); + } catch (e3) { + root = totalNet(e3); + } finally { + recoverFree = false; + } + } + } finally { + recovering = false; + recoverBars = []; + memoRecFloor = 0x7fffffff; + } + lastRoot = root; + lastRootTok = rootTokBase; + settleDiags(); + } + } adoptRoot = -1; lastRoot = root; lastRootTok = 
rootTokBase; - lastOk = true; - treePieces = docPieces.slice(); return root; } + export { tokenize }; // ── Module-level API: the DEFAULT document (one shared session; tokenize and the // raw tree/tokenAt views read the ACTIVE doc — they are gate/debug surfaces) ── @@ -3295,14 +4321,70 @@ export function createParser() { parse(source, entryRule) { activate(d); entryUsed = entryRule; - gen++; // re-opening resets the arena: old handles die even if THIS parse rejects - const root = parseCore(source, entryRule); - return { d, gen, root }; + gen++; // re-opening resets the arena: old handles die regardless of outcome + docDiags.length = 0; + docLex.length = 0; + docPar.length = 0; + let root; + try { + root = parseCore(source, entryRule); + lastBars = []; + } catch (e) { + // total parse: the strict pass rejected — iterate recovery under the bar + // discipline (see recoverBars); the iteration cap degrades to free-fire, + // and a recovery-blind layer (fallback lexers) degrades to the zero-width + // $error root. Never a crash. + recovering = true; + const bars = []; + let done = false; + // NO cross-attempt survival here: parseCore resets the arena cursor per + // attempt (only parseEdited carries it), so an earlier attempt's rows are + // clobbered — a surviving entry would point at overwritten rows. 
+ try { + for (let attempt = 0; attempt < 32 && !done; attempt++) { + try { + docLex.length = 0; + recoverBars = bars; + root = parseCore(source, entryRule); + done = true; + lastBars = bars.slice(); + } catch (e2) { + let b = maxPos; + if (bars.length > 0 && b <= bars[bars.length - 1]) b = bars[bars.length - 1] + 1; + bars.push(b); + } + } + if (!done) { + recoverFree = true; + lastBars = null; + adoptRoot = -1; // free-fire decisions are non-local: adoption would desync + try { + docLex.length = 0; + root = parseCore(source, entryRule); + } catch (e3) { + root = totalNet(e3); + } finally { + recoverFree = false; + } + } + } finally { + recovering = false; + recoverBars = []; + memoRecFloor = 0x7fffffff; + } + settleDiags(); + } + return { d, gen, root, errors: docDiags }; }, edit(cst, edits) { chk(cst); activate(d); - cst.root = editCore(entryUsed, edits); + try { + cst.root = editCore(entryUsed, edits); + } catch (e) { + if (e instanceof RangeError || (e && e.apiMisuse)) throw e; + cst.root = totalNet(e); + } }, visit(cst, fns) { chk(cst); activate(d); return visitCore(cst.root, fns); }, tree: view, diff --git a/src/gen-cst-match.ts b/src/gen-cst-match.ts index daa50ff..a2dca89 100644 --- a/src/gen-cst-match.ts +++ b/src/gen-cst-match.ts @@ -23,6 +23,7 @@ // must be matched by exactly its rule's matcher, consuming all children. import type { CstGrammar, PrecOperator, RuleDecl, RuleExpr } from './types.ts'; import { isKeywordLiteral } from './grammar-utils.ts'; +import { withAwaitYield } from './await-yield-fork.ts'; // ── Arm step plan ── @@ -74,6 +75,10 @@ function sanitizeIdent(s: string): string { const J = (v: unknown) => JSON.stringify(v); export function generateCstMatch(grammar: CstGrammar, importFrom: string): string { + // Same [Await]/[Yield] fork the parsers apply, so the rule-id space (ruleIdOf) + // agrees with the tree. Matchers/types are emitted for BASE rules only (a fork + // collapses to its base via RULE_CANON); no-op without ctx markers. 
+ grammar = withAwaitYield(grammar); const tokenNames = new Set(grammar.tokens.map(t => t.name)); const templateTokenNames = new Set(grammar.tokens.filter(t => t.template).map(t => t.name)); const ruleNames = new Set(grammar.rules.map(r => r.name)); @@ -85,6 +90,15 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin { let next = 5; for (const t of grammar.tokens) if (!typeKind.has(t.name)) typeKind.set(t.name, next++); } const ruleId = new Map(grammar.rules.map((r, i) => [r.name, i])); ruleId.set('$template', grammar.rules.length); + // canon rid per rid: a fork collapses to its base; everything else is itself. The + // emitted __nodeOf / dispatch switches canonicalize the CHILD's ruleIdOf through + // this before comparing to the (base) rid a base matcher expects. + const ruleCanon = grammar.rules.map(r => ruleId.get(r.canon ?? r.name)!); + ruleCanon.push(grammar.rules.length, grammar.rules.length + 1, grammar.rules.length + 2); // $template/$error/$missing = self + // canon rid for a rule NAME: an arm that (after the fork) references a fork rule + // (Param$A) is matched against the BASE rid, since the child's ruleIdOf is also + // canonicalized to base in __nodeOf / the dispatch switches. + const cid = (name: string) => ruleCanon[ruleId.get(name)!]; // Pratt / leftRec classification (mirrors the engines' classifyAlts/classifyLeftRec: @@ -112,7 +126,11 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin }; for (const alt of alts) { - const items = alt.type === 'seq' ? alt.items : [alt]; + const rawItems = alt.type === 'seq' ? alt.items : [alt]; + // A leading `notLeftLeaf(...)` head-leaf guard sits BEFORE the self `$` of a LED arm and is + // zero-width — drop it so the self-ref classification and the step plan match the parser's + // LED node shape (`[leftNode, …]`), exactly as the parsers' classifyAlts strips it. + const items = rawItems[0]?.type === 'notLeftLeaf' ? 
rawItems.slice(1) : rawItems; // Pratt op-form marker alts are covered by the synthesized op arms below. if (items.some(it => it.type === 'op' || it.type === 'prefix' || it.type === 'postfix')) continue; @@ -177,7 +195,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin return isKeywordLiteral(v) ? v : (PUNCT_NAMES[v] ?? 'p' + [...v].map(c => c.charCodeAt(0)).join('_')); } if (first.type === 'ref') return lowerFirst(first.name); - if (first.type === 'not' || first.type === 'sameLine' || first.type === 'noCommentBefore' || first.type === 'noMultilineFlowBefore') { + if (first.type === 'not' || first.type === 'sameLine' || first.type === 'noCommentBefore' || first.type === 'noMultilineFlowBefore' || first.type === 'notLeftLeaf') { return nameFrom(items.slice(1), fuel - 1); // zero-width: name by what follows } if (first.type === 'alt') { @@ -194,7 +212,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin // (inside opt → 'opt', inside many/sep → 'many') applied to captures. function pushSteps(steps: Step[], it: RuleExpr, captures: Capture[], used: Set, card: Card): void { switch (it.type) { - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return; // zero-width: no children case 'literal': steps.push({ kind: 'lit', text: it.value, tt: ttOf(it.value) }); @@ -327,16 +345,47 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin return fn; } + // Minimum children the steps WILL consume (a lower bound): a required single-child + // step counts 1, optionals / loops 0, a branches the minimum over its branches. A + // greedy loop or optional must leave at least this many children for the steps that + // follow it — otherwise it can swallow a child a required suffix step needs (the + // parser avoided that with a zero-width guard, e.g. 
a Modifier's not() lookahead, + // which the CST does not record). The destructurer reconstructs the bound + // structurally: capping a greedy run at cc-suffixMin never cuts below the parser's + // actual count (count + suffix-consumed = cc, suffix-consumed >= suffixMin, so + // count <= cc-suffixMin), so it is a no-op except where greedy would over-consume. + function minKids(steps: Step[]): number { + let m = 0; + for (const s of steps) { + switch (s.kind) { + case 'lit': case 'litAlt': case 'tok': case 'node': m += 1; break; + case 'opt': if (s.min1) m += minKids(s.body); break; + case 'many': case 'sep': break; + case 'branches': { + let bm = Infinity; + for (const b of s.branches) bm = Math.min(bm, b.steps.length === 0 ? 0 : minKids(b.steps)); + if (bm !== Infinity) m += bm; + break; + } + } + } + return m; + } + // Render steps; `onFail(line)` returns the failure statement for this context. - function renderSteps(steps: Step[], w: (s: string) => void, ind: string, fail: () => string): void { - for (const st of steps) renderStep(st, w, ind, fail); + // `outerMin` = minimum children the steps AFTER this list (in the enclosing context) + // will consume; threaded so a loop's room check spans nesting boundaries. + function renderSteps(steps: Step[], w: (s: string) => void, ind: string, fail: () => string, outerMin = 0): void { + for (let k = 0; k < steps.length; k++) { + renderStep(steps[k], w, ind, fail, minKids(steps.slice(k + 1)) + outerMin); + } } function litCond(text: string, tt: string): string { return `__lit(t, cc, tb, i, src, ${J(text)}, ${tt === '$keyword' ? 
1 : 0})`; } - function renderStep(st: Step, w: (s: string) => void, ind: string, fail: () => string): void { + function renderStep(st: Step, w: (s: string) => void, ind: string, fail: () => string, suffixMin: number): void { switch (st.kind) { case 'lit': w(`${ind}if (!${litCond(st.text, st.tt)}) ${fail()}`); @@ -354,7 +403,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin const cond = st.name === '$operator' ? `__opTok(t, cc, i)` : st.template - ? `__tok(t, cc, tb, i, ${typeKind.get(st.name)}) || __nodeOf(t, cc, i, ${ruleId.get('$template')})` + ? `__tok(t, cc, tb, i, ${typeKind.get(st.name)}) || __nodeOf(t, cc, i, ${cid('$template')})` : `__tok(t, cc, tb, i, ${typeKind.get(st.name)})`; w(`${ind}if (!(${cond})) ${fail()}`); if (st.cap) assign(st.cap, `__SC[i] as ${st.cap.tsType}`, w, ind); @@ -362,7 +411,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin return; } case 'node': - w(`${ind}if (!__nodeOf(t, cc, i, ${ruleId.get(st.rule)})) ${fail()}`); + w(`${ind}if (!__nodeOf(t, cc, i, ${cid(st.rule)})) ${fail()}`); if (st.cap) assign(st.cap, `__SC[i] as ${st.cap.tsType}`, w, ind); w(`${ind}i++;`); return; @@ -370,10 +419,13 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin const save = tmp(); const ok = tmp(); const lbl = tmp().replace('_t', '_b'); - w(`${ind}{`); + // a NON-required optional must not consume a child the required suffix needs + // (the min1 first iteration is required and always attempts — the grammar + // guarantees a real element exists or the parser would have rejected) + w(st.min1 ? 
`${ind}{` : `${ind}if (cc - i > ${suffixMin}) {`); w(`${ind} const ${save} = i; let ${ok} = true;`); w(`${ind} ${lbl}: {`); - renderSteps(st.body, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`); + renderSteps(st.body, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`, suffixMin); w(`${ind} }`); if (st.min1) w(`${ind} if (!${ok}) ${fail()}`); else w(`${ind} if (!${ok}) i = ${save};`); @@ -385,9 +437,10 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin const ok = tmp(); const lbl = tmp().replace('_t', '_b'); w(`${ind}for (;;) {`); + if (suffixMin > 0) w(`${ind} if (cc - i <= ${suffixMin}) break;`); // leave children for the required suffix w(`${ind} const ${save} = i; let ${ok} = true;`); w(`${ind} ${lbl}: {`); - renderSteps(st.body, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`); + renderSteps(st.body, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`, suffixMin); w(`${ind} }`); w(`${ind} if (!${ok}) { i = ${save}; break; }`); w(`${ind} if (i === ${save}) break;`); // zero-width body guard @@ -405,7 +458,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin w(`${ind}{`); w(`${ind} const ${save} = i; let ${ok0} = true;`); w(`${ind} ${lbl0}: {`); - renderSteps(st.element, w, ind + ' ', () => `{ ${ok0} = false; break ${lbl0}; }`); + renderSteps(st.element, w, ind + ' ', () => `{ ${ok0} = false; break ${lbl0}; }`, suffixMin); w(`${ind} }`); w(`${ind} if (!${ok0}) { i = ${save}; }`); w(`${ind} else for (;;) {`); @@ -413,7 +466,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin w(`${ind} i++;`); w(`${ind} const ${save}2 = i; let ${ok} = true;`); w(`${ind} ${lbl}: {`); - renderSteps(st.element, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`); + renderSteps(st.element, w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`, suffixMin); w(`${ind} }`); w(`${ind} if (!${ok}) { i = ${save}2; break; }`); w(`${ind} }`); @@ -442,7 +495,7 @@ 
export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin } w(`${ind} const ${save} = i; let ${ok} = true;`); w(`${ind} ${lbl}: {`); - renderSteps(renameCaps(b.steps, pfx), w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`); + renderSteps(renameCaps(b.steps, pfx), w, ind + ' ', () => `{ ${ok} = false; break ${lbl}; }`, suffixMin); w(`${ind} }`); const fields = renamed.map(cp => `${cp.field}: ${cp.name}${cp.card === 'one' ? '!' : ''}`); w(`${ind} if (${ok}) { ${done} = true; ${assignExpr(st.cap, `{ branch: ${J(b.tag)}${fields.length ? ', ' + fields.join(', ') : ''} }`)} }`); @@ -540,6 +593,7 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin const matcherMapEntries: string[] = []; for (const rule of grammar.rules) { + if (rule.canon) continue; // a fork collapses to its base matcher/type (RULE_CANON) const plans = buildArms(rule); const tName = matchTypeName(rule.name); const nName = nodeType(rule.name); @@ -606,9 +660,9 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin lines.push(`${pad}if (cc < 2) {`); lines.push(...memberIdx.map((k, i) => (restAdmit[i] === null || restAdmit[i]!.canEmpty ? 
pad + ' ' + tryLine(k).trim() : '')).filter(Boolean)); lines.push(`${pad}} else if ((e1 = __SC[1]) >= 0) {`); - lines.push(`${pad} switch (t.ruleIdOf(e1)) {`); + lines.push(`${pad} switch (RULE_CANON[t.ruleIdOf(e1)]) {`); for (const r of [...nset].sort()) { - lines.push(`${pad} case ${ruleId.get(r)}: { // ${r}`); + lines.push(`${pad} case ${cid(r)}: { // ${r}`); lines.push(...subTry(i => restAdmit[i]!.keys.has('n:' + r)).map(l => ' ' + l)); lines.push(`${pad} break;`); lines.push(`${pad} }`); @@ -659,9 +713,9 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin for (let k = 0; k < plans.length; k++) if (admits[k].canEmpty || admits[k].keys.size === 0) disp.push(tryLine(k)); disp.push(` } else { const e0 = __SC[0];`); disp.push(` if (e0 >= 0) {`); - disp.push(` switch (t.ruleIdOf(e0)) {`); + disp.push(` switch (RULE_CANON[t.ruleIdOf(e0)]) {`); for (const r of [...nodeRules].sort()) { - disp.push(` case ${ruleId.get(r)}: { // ${r}`); + disp.push(` case ${cid(r)}: { // ${r}`); const members = plans.map((_, k) => k).filter(k => admits[k].keys.size === 0 || admits[k].keys.has('n:' + r)); const concrete = members.filter(k => admits[k].keys.size !== 0); const oneStep = concrete.every(k => plans[k].steps[0]?.kind === 'node'); @@ -763,10 +817,13 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin header.push(` const e = __SC[i];`); header.push(` return e < 0 && t.leafKindOf(e) === 2;`); header.push(`};`); + // canon rid table: a fork node's ruleIdOf maps to its base rid before any compare, + // so a base matcher accepts a forked child. Identity without ctx forks. 
+ header.push(`const RULE_CANON = ${JSON.stringify(ruleCanon)};`); header.push(`const __nodeOf = (t: TreeAccess, cc: number, i: number, rid: number): boolean => {`); header.push(` if (i >= cc) return false;`); header.push(` const e = __SC[i];`); - header.push(` return e >= 0 && t.ruleIdOf(e) === rid;`); + header.push(` return e >= 0 && RULE_CANON[t.ruleIdOf(e)] === rid;`); header.push(`};`); header.push(``); @@ -778,7 +835,8 @@ export function generateCstMatch(grammar: CstGrammar, importFrom: string): strin `};`, `/** rule ID → matcher (the emitted parser's rowRule ids — declaration order). */`, `export const MATCHERS_BY_ID: ((t: TreeAccess, n: never, tb: number, src: string) => { arm: string })[] = [`, - ...grammar.rules.map(r => ` match${sanitizeIdent(r.name)},`), + // a fork's rid maps to its BASE matcher (forks emit no matcher of their own). + ...grammar.rules.map(r => ` match${sanitizeIdent(r.canon ?? r.name)},`), `];`, ]; diff --git a/src/gen-parser.ts b/src/gen-parser.ts index 66b09c2..54d669c 100644 --- a/src/gen-parser.ts +++ b/src/gen-parser.ts @@ -1,6 +1,7 @@ import type { CstGrammar, RuleExpr, RuleDecl } from './types.ts'; import { isKeywordLiteral } from './grammar-utils.ts'; import { createLexer, type Token } from './gen-lexer.ts'; +import { withAwaitYield } from './await-yield-fork.ts'; // ── CST output ── @@ -26,6 +27,7 @@ interface OpInfo { rbp: number; assoc: 'left' | 'right' | 'none'; position: 'infix' | 'prefix' | 'postfix'; + requireTarget?: boolean; } // ── Parser ── @@ -36,6 +38,9 @@ export function getText(node: { offset: number; end: number }, source: string): } export function createParser(grammar: CstGrammar) { + // [Await]/[Yield] fork — same rule-identity space as the emitted parser (no-op + // without ctx markers). Keeps the interp ≡ emit equivalence the gates compare. 
+ grammar = withAwaitYield(grammar); const tokenNames = new Set(grammar.tokens.map(t => t.name)); // The lexer is a separate stage, built from the same grammar (token defs + lexer hints). @@ -119,6 +124,7 @@ export function createParser(grammar: CstGrammar) { rbp: level.assoc === 'right' ? bp - 1 : bp, assoc: level.assoc, position: 'prefix', + requireTarget: op.requireTarget, }); } else if (op.position === 'postfix') { postfixOpValues.add(op.value); @@ -127,11 +133,12 @@ export function createParser(grammar: CstGrammar) { rbp: 0, assoc: level.assoc, position: 'postfix', + requireTarget: op.requireTarget, }); } else { const lbp = bp; const rbp = level.assoc === 'right' ? bp - 1 : bp; - opTable.set(op.value, { lbp, rbp, assoc: level.assoc, position: 'infix' }); + opTable.set(op.value, { lbp, rbp, assoc: level.assoc, position: 'infix', requireTarget: op.requireTarget }); if (op.noUnaryLhs) noUnaryLhsOps.add(op.value); } } @@ -149,6 +156,25 @@ export function createParser(grammar: CstGrammar) { const lbp = lp.sameAs !== undefined ? op.lbp : op.lbp - 1; ledPrecByConnector.set(lp.connector, { lbp, rhsBp: lp.chainRhs ? lbp : null }); } + // Binary / relational / conditional connectors (the MIDDLE child of a `$ op $` LED) — + // a node with one at child[1] is not a LeftHandSideExpression, so not an assignment target + // (`a + b = c`, `a in b = c`). Ladder INFIX ops + alternative-form binary LEDs. + const binaryConnectors = new Set(); + for (const [v, info] of opTable) if (info.position === 'infix') binaryConnectors.add(v); + for (const k of ledPrecByConnector.keys()) binaryConnectors.add(k); + + // A `cap`-group NUD (an ArrowFunction — the lowest-precedence AssignmentExpression) + // parses only when minBp is LOOSER than the named connector's binding power; the value + // resolves from the ladder or the ledPrec table. See parsePratt for enforcement. 
+ const connectorLbp = (connector: string): number => { + const op = opTable.get(connector); + if (op) return op.lbp; + const lp = ledPrecByConnector.get(connector); + if (lp) return lp.lbp; + throw new Error(`capExpr: connector ${JSON.stringify(connector)} is not a ladder operator or ledPrec connector`); + }; + const nudCapOf = (nud: RuleExpr): number | null => + nud.type === 'group' && nud.capBelow !== undefined ? connectorLbp(nud.capBelow) : null; // Classify rules: which use Pratt parsing const prattRules = new Set(); @@ -160,13 +186,18 @@ export function createParser(grammar: CstGrammar) { function classifyAlts(rule: RuleDecl) { const alts = rule.body.type === 'alt' ? rule.body.items : [rule.body]; const nuds: RuleExpr[] = []; - const leds: { expr: RuleExpr; items: RuleExpr[] }[] = []; + const leds: { expr: RuleExpr; items: RuleExpr[]; notLeftLeaf?: string[] }[] = []; for (const alt of alts) { const items = alt.type === 'seq' ? alt.items : [alt]; - if (items[0]?.type === 'ref' && items[0].name === rule.name) { + // A LED arm may carry a leading `notLeftLeaf(...)` head-leaf guard before the self `$` + // (`[notLeftLeaf('void',…), $, '.', Ident]`). Strip it into LED metadata; the self-ref is + // the next item and `led.items` is everything after it — identical to a plain LED. + const guard = items[0]?.type === 'notLeftLeaf' ? items[0].words : undefined; + const head = guard ? 1 : 0; + if (items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name) { // Left-recursive: LED - leds.push({ expr: alt, items: items.slice(1) }); + leds.push({ expr: alt, items: items.slice(head + 1), notLeftLeaf: guard }); } else if (items.length >= 2 && items[0]?.type === 'prefix') { // prefix $ → NUD with prefix handling nuds.push(alt); @@ -182,16 +213,22 @@ export function createParser(grammar: CstGrammar) { const alts = rule.body.type === 'alt' ? 
rule.body.items : [rule.body]; const atoms: RuleExpr[] = []; const continuations: RuleExpr[][] = []; + const contNotLeftLeaf: (string[] | null)[] = []; for (const alt of alts) { const items = alt.type === 'seq' ? alt.items : [alt]; - if (items[0]?.type === 'ref' && items[0].name === rule.name) { - continuations.push(items.slice(1)); + // A continuation may carry a leading `notLeftLeaf(...)` head-leaf guard before the self `$`. + // Strip it into per-continuation metadata; the self-ref is the next item. + const guard = items[0]?.type === 'notLeftLeaf' ? items[0].words : undefined; + const head = guard ? 1 : 0; + if (items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name) { + continuations.push(items.slice(head + 1)); + contNotLeftLeaf.push(guard ?? null); } else { atoms.push(alt); } } - return { atoms, continuations }; + return { atoms, continuations, contNotLeftLeaf }; } // ── Left recursion = a left-corner cycle ── @@ -262,7 +299,10 @@ export function createParser(grammar: CstGrammar) { // a standalone definition of "is this rule left-recursive". function peelsDirect(rule: RuleDecl, alt: RuleExpr): boolean { const items = itemsOf(alt); - return items[0]?.type === 'ref' && items[0].name === rule.name; + // A leading zero-width `notLeftLeaf(...)` head-leaf guard precedes the self `$` in a LED arm; + // the arm is still DIRECT left-recursion (the local Pratt transform peels it), so look past it. + const head = items[0]?.type === 'notLeftLeaf' ? 1 : 0; + return items[head]?.type === 'ref' && (items[head] as { name: string }).name === rule.name; } // The PURE left-corner edge map, over ALL alternatives (nothing pre-excluded). This is // the relation that DEFINES left recursion. 
@@ -365,6 +405,12 @@ export function createParser(grammar: CstGrammar) { ledPrecOf.set(led, lp); } } + // Per-LED notLeftLeaf head-leaf word set (object-keyed like ledFirst/ledPrecOf): the arm matches + // only when the LEFT node's outermost (head) leaf text is NOT in this set. + const ledNotLeftLeaf = new Map>(); + for (const { leds } of prattClassified.values()) { + for (const led of leds) if (led.notLeftLeaf) ledNotLeftLeaf.set(led, new Set(led.notLeftLeaf)); + } // The template token(s): the parser routes their tokens to the interpolation-aware // parseTemplateExpr path (the lexer owns producing them — see gen-lexer.ts). @@ -453,6 +499,12 @@ export function createParser(grammar: CstGrammar) { if (info) contMixfix.set(cont, info); } } + // Per-continuation notLeftLeaf head-leaf word set (object-keyed like contMixfix): the continuation + // matches only when the LEFT node's outermost (head) leaf text is NOT in this set. + const contNotLeftLeaf = new Map>(); + for (const { continuations, contNotLeftLeaf: words } of leftRecClassified.values()) { + continuations.forEach((cont, i) => { if (words[i]) contNotLeftLeaf.set(cont, new Set(words[i]!)); }); + } // ── Access-tail LEDs (closed under a postfix operator) ── // A postfix operator (`a++`) turns its operand into an "update expression" that @@ -506,7 +558,7 @@ export function createParser(grammar: CstGrammar) { const acc = new Set(); for (const item of e.items) { if (item.type === 'prefix') return null; // prefix op → any operator token: give up - if (item.type === 'op' || item.type === 'postfix' || item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; // non-consuming here + if (item.type === 'op' || item.type === 'postfix' || item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; // non-consuming here const f = 
exprFirst(item); if (f === null) return null; for (const k of f) acc.add(k); @@ -524,7 +576,7 @@ export function createParser(grammar: CstGrammar) { return acc; } case 'quantifier': case 'group': return exprFirst(e.body); - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': return new Set(); // zero-width: contributes no FIRST tokens + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return new Set(); // zero-width: contributes no FIRST tokens case 'sep': return exprFirst(e.element); default: return null; } @@ -606,7 +658,7 @@ export function createParser(grammar: CstGrammar) { const acc = new Set(); for (let i = j; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; if (item.type === 'op' || item.type === 'postfix') { for (const k of secOpKeys) acc.add(k); return acc; } if (item.type === 'prefix') { for (const k of prefixOps.keys()) acc.add(k); return acc; } const f = exprFirst(item); @@ -619,7 +671,7 @@ export function createParser(grammar: CstGrammar) { function suffixNullable(items: RuleExpr[], j: number): boolean { for (let i = j; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; if (item.type === 'op' || item.type === 'prefix' || item.type === 'postfix') return false; if (!exprNullable(item)) return false; } @@ -637,7 +689,7 @@ export function createParser(grammar: 
CstGrammar) { const items = e.items; for (let i = 0; i < items.length; i++) { const item = items[i]; - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') continue; + if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') continue; let isec: Sec; let itemNullable: boolean; if (item.type === 'op' || item.type === 'postfix' || item.type === 'prefix') { @@ -689,7 +741,7 @@ export function createParser(grammar: CstGrammar) { if (sec.len1) acc.add(e.delimiter); return { s: acc, len1: sec.len1 }; } - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return { s: new Set(), len1: false }; case 'op': case 'prefix': case 'postfix': return { s: new Set(), len1: true }; @@ -761,6 +813,11 @@ export function createParser(grammar: CstGrammar) { const tokens = tokenize(source); let pos = 0; let maxPos = 0; // farthest token index ever ADVANCED past (diagnostic; updated at the pos++ sites, mirroring the emitted engine so reject messages stay engine-identical) + // Cap-propagation flag (capExpr), mirrors the emitted engine: set true when a parsePratt + // call returns a CAPPED assignment-level expression (an ArrowFunction). An enclosing + // operator LED reads it right after parsing its RHS and refuses to continue (so the RHS of + // `a = () => {}` admits no trailing `||`/`?:` — it stays unconsumed and the parse rejects). + let _prattCapped = false; // Packrat memo for pratt/left-recursive rules (Expr, Type, …): cache the // parse result + end position by start position, so backtracking doesn't // re-parse the same rule at the same spot. 
Sound because those rules reset @@ -846,14 +903,19 @@ export function createParser(grammar: CstGrammar) { } if (tok.type === '$templateHead') { const children: CstChild[] = []; + const save = pos; if (++pos > maxPos) maxPos = pos; children.push({ tokenType: '$templateHead', offset: tok.offset, end: tok.offset + tok.text.length }); const interpRule = currentPrattContext ?? findExprRule(); + // a head COMMITS to the full chain: every substitution must hold an + // expression and every span must continue (middle) or close (tail) — an + // unterminated template is a parse failure, not a shorter match while (true) { const exprNode = parseRule(interpRule); - if (exprNode) children.push(exprNode); + if (!exprNode) { pos = save; return null; } + children.push(exprNode); const next = peek(); - if (!next) break; + if (!next) { pos = save; return null; } if (next.type === '$templateMiddle') { if (++pos > maxPos) maxPos = pos; children.push({ tokenType: '$templateMiddle', offset: next.offset, end: next.offset + next.text.length }); @@ -864,10 +926,11 @@ export function createParser(grammar: CstGrammar) { children.push({ tokenType: '$templateTail', offset: next.offset, end: next.offset + next.text.length }); break; } - break; + pos = save; + return null; } - const startOff = children.length > 0 ? childOffset(children[0]) : offset(); - const endOff = children.length > 0 ? childEnd(children[children.length - 1]) : offset(); + const startOff = childOffset(children[0]); + const endOff = childEnd(children[children.length - 1]); return { rule: '$template', children, offset: startOff, end: endOff }; } return null; @@ -950,7 +1013,7 @@ export function createParser(grammar: CstGrammar) { if (children !== null && pos > bestPos) { const startOff = children.length > 0 ? childOffset(children[0]) : offset(); const endOff = children.length > 0 ? 
childEnd(children[children.length - 1]) : offset(); - bestNode = { rule: rule.name, children, offset: startOff, end: endOff }; + bestNode = { rule: (rule.canon ?? rule.name), children, offset: startOff, end: endOff }; bestPos = pos; } } @@ -978,7 +1041,7 @@ export function createParser(grammar: CstGrammar) { if (children !== null && pos > bestAtomPos) { const startOff = children.length > 0 ? childOffset(children[0]) : offset(); const endOff = children.length > 0 ? childEnd(children[children.length - 1]) : offset(); - node = { rule: rule.name, children, offset: startOff, end: endOff }; + node = { rule: (rule.canon ?? rule.name), children, offset: startOff, end: endOff }; bestAtomPos = pos; } } @@ -989,6 +1052,10 @@ export function createParser(grammar: CstGrammar) { outer: while (true) { const contSaved = pos; for (const cont of continuations) { + // notLeftLeaf head-leaf gate: skip this continuation when the LEFT node's outermost (head) + // leaf text is in its word set (e.g. `void`/`null`/`this` can't be `.`-qualified as a type). + const nll = contNotLeftLeaf.get(cont); + if (nll !== undefined && nll.has(headLeafText(node))) continue; pos = contSaved; let children = matchSeq(cont); // Mixfix operand re-bind (same fix parsePratt uses): a continuation of the @@ -1002,7 +1069,7 @@ export function createParser(grammar: CstGrammar) { } if (children !== null) { node = { - rule: rule.name, + rule: (rule.canon ?? rule.name), children: [node, ...children], offset: node.offset, end: children.length > 0 ? childEnd(children[children.length - 1]) : node.end, @@ -1017,17 +1084,58 @@ export function createParser(grammar: CstGrammar) { return node; } + // Assignment-target shape test (ECMAScript AssignmentTargetType): a node is NOT a valid + // LHS target iff its outermost form is a prefix-op (prefix-unary OR prefix-update `++x`) + // — head child is an `$operator` leaf in prefixOps — or a postfix-update (`x++`) — tail + // child is an `$operator` leaf in postfixOpValues. 
A parenthesized cover / member / + // element / call / non-null (`!`) tail has no `$operator` leaf at head or tail → passes. + const notAssignTarget = (node: CstNode): boolean => { + const cs = node.children; + if (cs.length === 0) return false; + const head = cs[0]; + if (head && 'tokenType' in head && head.tokenType === '$operator' + && prefixOps.has(source.slice(head.offset, head.end))) return true; + const tail = cs[cs.length - 1]; + if (tail && 'tokenType' in tail && tail.tokenType === '$operator' + && postfixOpValues.has(source.slice(tail.offset, tail.end))) return true; + // a binary / relational / conditional expression (`a + b`, `a in b`, `a as T`) is not a + // LeftHandSideExpression: its MIDDLE child is a binary-connector leaf. Member `a.b` / + // element `a[b]` have a `$punct` leaf there, a paren cover has a NODE child → those pass. + if (cs.length >= 3) { const m = cs[1]; if (m && 'tokenType' in m && binaryConnectors.has(source.slice(m.offset, m.end))) return true; } + return false; + }; + + // Head-leaf TEXT of a node: descend the LEFTMOST-child spine to the OUTERMOST leaf and return + // its source text (the same head leaf `notAssignTarget` reads, generalized to recurse through + // child nodes). Drives the notLeftLeaf LED gate. A childless node returns '' (matches no word). 
+ const headLeafText = (node: CstNode): string => { + let cur: CstChild = node; + while (!('tokenType' in cur)) { + if (cur.children.length === 0) return ''; + cur = cur.children[0]; + } + return source.slice(cur.offset, cur.end); + }; + // Pratt parser for rules with op/prefix/postfix function parsePratt(rule: RuleDecl, minBp: number): CstNode | null { const { nuds, leds } = prattClassified.get(rule.name)!; const saved = pos; + _prattCapped = false; // reset; set true only on a capped (arrow) return // NUD: parse atom or prefix (longest match) let lhs: CstNode | null = null; let bestNudPos = saved; + // True iff the winning NUD is a capped (assignment-level) expression — an + // ArrowFunction. Such a NUD admits no led; the led loop is skipped entirely. + let capped = false; const startTok = tokens[saved] ?? null; const startTok2 = (parseLimit >= 0 && saved + 1 >= parseLimit) ? null : (tokens[saved + 1] ?? null); for (const nud of nuds) { + // A capped NUD parses only at a minBp LOOSER than its cap: refused as the operand + // of any tighter operator (so `a || () => {}` rejects — `||`'s rhs minBp >= cap). + const capBp = nudCapOf(nud); + if (capBp !== null && minBp >= capBp) continue; if (!altMightStart(nud, startTok)) continue; if (!altMightSecond(nud, startTok2)) continue; pos = saved; @@ -1043,9 +1151,13 @@ export function createParser(grammar: CstGrammar) { if (++pos > maxPos) maxPos = pos; const opLeaf: CstLeaf = { tokenType: '$operator', offset: tok.offset, end: tok.offset + tok.text.length }; const rhs = parsePratt(rule, info.rbp); + // A target-requiring prefix (`++`/`--`) operand must be a LeftHandSideExpression + // (`++-x`, `++ ++x`, `++x--` are syntax errors). Fail hard like noUnaryLhs. + if (rhs && info.requireTarget && notAssignTarget(rhs)) return null; if (rhs && pos > bestNudPos) { - lhs = { rule: rule.name, children: [opLeaf, rhs], offset: opLeaf.offset, end: rhs.end }; + lhs = { rule: (rule.canon ?? 
rule.name), children: [opLeaf, rhs], offset: opLeaf.offset, end: rhs.end }; bestNudPos = pos; + capped = false; // a prefix NUD is never capped } } } @@ -1056,14 +1168,19 @@ export function createParser(grammar: CstGrammar) { if (children !== null && pos > bestNudPos) { const startOff = children.length > 0 ? childOffset(children[0]) : offset(); const endOff = children.length > 0 ? childEnd(children[children.length - 1]) : offset(); - lhs = { rule: rule.name, children, offset: startOff, end: endOff }; + lhs = { rule: (rule.canon ?? rule.name), children, offset: startOff, end: endOff }; bestNudPos = pos; + capped = capBp !== null; // the LONGEST match wins; record whether it is capped } } if (lhs) pos = bestNudPos; if (!lhs) { pos = saved; return null; } + // A capped NUD (assignment-level arrow) admits no led: return it as-is so a trailing + // tighter operator stays unconsumed and the enclosing parse rejects (`() => {} || a`). + if (capped) { _prattCapped = true; return lhs; } + // Once a postfix operator binds (`a++`), the operand is an update expression // that access tails (`[…]`, `.x`, `(…)`, ``, tagged template) can't extend. let tailClosed = false; @@ -1088,6 +1205,10 @@ export function createParser(grammar: CstGrammar) { // tight (`a == b ? c : d` mis-grouped as `a == (b ? c : d)`). const lp = ledPrecOf.get(led); if (lp !== undefined && lp.lbp <= minBp) continue; + // notLeftLeaf head-leaf gate: skip the arm when the LEFT node's outermost (head) leaf text + // is in the arm's word set (e.g. `void`/`null`/`this` can't be `.`-qualified as a type). + const nll = ledNotLeftLeaf.get(led); + if (nll !== undefined && 'children' in lhs && nll.has(headLeafText(lhs))) continue; if (!canStart(ledFirst.get(led), tok)) continue; // first-token dispatch for LED continuations pos = ledSaved; @@ -1114,7 +1235,7 @@ export function createParser(grammar: CstGrammar) { } if (children !== null) { lhs = { - rule: rule.name, + rule: (rule.canon ?? 
rule.name), children: [lhs, ...children], offset: lhs.offset, end: children.length > 0 ? childEnd(children[children.length - 1]) : lhs.end, @@ -1135,13 +1256,19 @@ export function createParser(grammar: CstGrammar) { if (info && info.lbp > minBp) { if (info.position === 'postfix') { if (!tailClosed) { // can't postfix an update expr (`a++ --`) + // A target-requiring postfix (`++`/`--`) operand must be a LeftHandSideExpression + // (`++x++`, `x++ ++` are syntax errors). Fail hard like noUnaryLhs. + if (info.requireTarget && 'children' in lhs && notAssignTarget(lhs)) return null; if (++pos > maxPos) maxPos = pos; const opLeaf: CstLeaf = { tokenType: '$operator', offset: tok.offset, end: tok.offset + tok.text.length }; - lhs = { rule: rule.name, children: [lhs, opLeaf], offset: lhs.offset, end: opLeaf.end }; + lhs = { rule: (rule.canon ?? rule.name), children: [lhs, opLeaf], offset: lhs.offset, end: opLeaf.end }; tailClosed = true; matched = true; } } else { + // A target-requiring infix (`=`/`+=`/…) needs a LeftHandSideExpression LEFT operand + // (`-x = 1`, `++x = 1`, `x++ = 1` are syntax errors). Fail hard like noUnaryLhs. + if (info.requireTarget && 'children' in lhs && notAssignTarget(lhs)) return null; // A `noUnaryLhs` op (e.g. `**`) may not take a bare unary-prefix expression // (`-x`, `typeof x` — a prefix-op node whose op is NOT also a postfix, i.e. // not an update `++`/`--`) as its LEFT operand. Fail the whole expression @@ -1159,8 +1286,15 @@ export function createParser(grammar: CstGrammar) { if (++pos > maxPos) maxPos = pos; const opLeaf: CstLeaf = { tokenType: '$operator', offset: tok.offset, end: tok.offset + tok.text.length }; const rhs = parsePratt(rule, info.rbp); + // CAP PROPAGATION: an operator whose RHS is a capped assignment-level expression + // (an ArrowFunction) is itself capped — it admits no further led, so a trailing + // `|| x` / `? :` stays unconsumed (`a = () => {} || x` rejects). 
`_prattCapped` is + // still true from the RHS, so an enclosing operator refuses it too (`b = a = arrow`). + if (rhs && _prattCapped) { + return { rule: (rule.canon ?? rule.name), children: [lhs, opLeaf, rhs], offset: lhs.offset, end: rhs.end }; + } if (rhs) { - lhs = { rule: rule.name, children: [lhs, opLeaf, rhs], offset: lhs.offset, end: rhs.end }; + lhs = { rule: (rule.canon ?? rule.name), children: [lhs, opLeaf, rhs], offset: lhs.offset, end: rhs.end }; matched = true; } else { pos = ledSaved; @@ -1252,6 +1386,11 @@ export function createParser(grammar: CstGrammar) { const tok = peek(); return tok && !tok.multilineFlowBefore ? [] : null; } + case 'notLeftLeaf': + // The head-leaf LED gate is applied in the Pratt LED loop (not here); the marker is + // stripped from the LED arm's items, so it never reaches here. As a leaf-position no-op + // it consumes nothing and succeeds (returns no children). + return []; case 'sep': return matchSep(expr.element, expr.delimiter); default: @@ -1484,13 +1623,27 @@ export function createParser(grammar: CstGrammar) { // API parity with the emitted engine's handle surface: edit() re-parses and // updates the SAME tree object in place (the handle is the document's tree — - // edit returns nothing, exactly like the emitted engine; no reuse here). - const edit = (cst: { rule: string; children: unknown[]; offset: number; end: number }, source: string): void => { - const next = parse(source) as typeof cst; + // edit returns nothing, exactly like the emitted engine; no reuse here), and + // both are TOTAL: input errors land in the errors field, never a throw. The + // interpreter has no recovery machinery, so an invalid text degrades to a + // zero-width $error root plus the strict diagnostic. 
+ type Cst = { rule: string; children: unknown[]; offset: number; end: number; errors?: { offset: number; end: number; message: string }[] }; + const parseTotal = (source: string): Cst => { + try { + const t = parse(source) as Cst; + t.errors = []; + return t; + } catch (e) { + return { rule: '$error', children: [], offset: 0, end: 0, errors: [{ offset: 0, end: 0, message: (e as Error).message }] }; + } + }; + const edit = (cst: Cst, source: string): void => { + const next = parseTotal(source); cst.rule = next.rule; cst.children = next.children; cst.offset = next.offset; cst.end = next.end; + cst.errors = next.errors; }; - return { parse, edit, tokenize, profCounts }; + return { parse, parseTotal, edit, tokenize, profCounts }; } // ── Helpers ── diff --git a/src/gen-tm.ts b/src/gen-tm.ts index cbbf48b..3dad3e5 100644 --- a/src/gen-tm.ts +++ b/src/gen-tm.ts @@ -3151,10 +3151,10 @@ function detectDeclarations(grammar: CstGrammar, tokenNames: Set): DeclI nameIdx++; continue; } - // Zero-width guards (`not(...)` / `sameLine` / `noCommentBefore` / `noMultilineFlowBefore`) - // consume no token, so they can sit between the keyword and the name (e.g. `'type' not(reserved) - // Ident`) without changing the `keyword name` highlight pattern — skip past them. - if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore') { + // Zero-width guards (`not(...)` / `sameLine` / `noCommentBefore` / `noMultilineFlowBefore` / + // `notLeftLeaf(...)`) consume no token, so they can sit between the keyword and the name (e.g. + // `'type' not(reserved) Ident`) without changing the `keyword name` highlight pattern — skip past them. 
+ if (item.type === 'not' || item.type === 'sameLine' || item.type === 'noCommentBefore' || item.type === 'noMultilineFlowBefore' || item.type === 'notLeftLeaf') { nameIdx++; continue; } @@ -4326,7 +4326,7 @@ function ruleIsNullable(e: RuleExpr, byName: Map, seen = new S case 'alt': return e.items.some(i => ruleIsNullable(i, byName, seen)); case 'quantifier': return e.kind === '*' || e.kind === '?'; case 'group': return ruleIsNullable(e.body, byName, seen); - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': return true; // zero-width assertions + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': return true; // zero-width assertions case 'ref': { if (seen.has(e.name)) return false; seen.add(e.name); const b = byName.get(e.name); return b ? ruleIsNullable(b, byName, seen) : false; } default: return false; // literal / token / op / prefix / postfix / sep } diff --git a/src/gen-treesitter.ts b/src/gen-treesitter.ts index 1c15f2f..f533016 100644 --- a/src/gen-treesitter.ts +++ b/src/gen-treesitter.ts @@ -223,7 +223,10 @@ function renderExpr(expr: RuleExpr, ctx: GrammarJsContext): string { return `repeat1(${body})`; } case 'group': - return renderExpr(expr.body, ctx); + // A tsRelax group carries a tree-sitter-only alternate rendering (a parser-strict + // constraint the highlighter relaxes — see RuleExpr.group.tsRelaxed). Render that + // instead of the strict body; every other consumer uses `body`. + return renderExpr(expr.tsRelaxed ?? expr.body, ctx); case 'not': // Zero-width negative lookahead: not expressible in a tree-sitter CFG, and // it consumes nothing, so it drops to a no-op (the surrounding choice keeps @@ -242,6 +245,11 @@ function renderExpr(expr: RuleExpr, ctx: GrammarJsContext): string { // Zero-width "preceding flow was single-line" assertion (YAML flow-as-block-key) — like // `noCommentBefore`, a scanner-level restriction; a no-op in the CFG. 
return 'blank()'; + case 'notLeftLeaf': + // Zero-width LEFT head-leaf guard — a left-operand predicate is not expressible in tree-sitter + // GLR; it consumes nothing, so it renders a no-op (the constrained LED is wrapped in tsRelax, + // so tree-sitter renders the UNCONSTRAINED `.` form and never reaches this case in practice). + return 'blank()'; case 'sep': { // sep(elem, ',') = optional(seq(elem, repeat(seq(',', elem)), optional(','))) // Trailing delimiter is allowed (matches the parser's matchSep behavior). @@ -520,6 +528,7 @@ function buildTokenBody(name: string, ctx: GrammarJsContext): string | null { */ const LR_CONFLICT_CLOSURE: string[][] = [ ['expr'], ['stmt'], ['stmt', 'decl'], ['expr', 'decl'], ['program', 'stmt'], + ['new_target'], // nested `new new Foo()` — NewTarget's recursive leading-`new` arm self-conflicts ['type', 'type_param'], ['type_param'], ['expr', 'param'], ['expr', 'new_target'], ['expr', 'block'], ['expr', 'member_name'], ['expr', 'prop'], ['member_name', 'stmt'], ['decl'], ['binding'], ['type'], ['type', 'typeof_ref'], ['type', 'param'], diff --git a/src/types.ts b/src/types.ts index fa335ae..c4b15e6 100644 --- a/src/types.ts +++ b/src/types.ts @@ -375,6 +375,11 @@ export interface PrecOperator { value: string; position: 'infix' | 'prefix' | 'postfix'; noUnaryLhs?: boolean; // infix op whose left operand may not be a bare unary-prefix expression (e.g. JS `**`) + // Operator whose left operand (infix) / operand (postfix) must be a valid assignment + // target (LeftHandSideExpression) — NOT a prefix-unary, prefix-update, or postfix-update + // expression. ECMAScript AssignmentTargetType, enforced at parse time (JS `=`/`+=`/…, + // postfix `++`/`--`). A parenthesized cover or member/element/call/non-null tail passes. 
+ requireTarget?: boolean; } export interface PrecLevel { @@ -402,14 +407,34 @@ export type RuleExpr = | { type: 'literal'; value: string } | { type: 'ref'; name: string } | { type: 'quantifier'; body: RuleExpr; kind: '*' | '+' | '?' } - | { type: 'group'; body: RuleExpr; suppress?: string[] } // suppress: LED connectors disabled while parsing body (e.g. no-`in`) + // `ctxMode` marks a subtree as [Await]/[Yield] context (the spec's grammar parameter): + // the await-yield-fork build transform reads it to name-fork the body-reachable rule + // closure into $A/$Y/$AY families. Every OTHER consumer treats this exactly like a + // plain transparent group (recurse into `body`), so the marker is invisible outside + // the fork transform. + // `tsRelaxed`: a TREE-SITTER-ONLY alternate rendering. The parser (and every other + // generator) uses `body` — the strict form; gen-treesitter renders `tsRelaxed` instead. + // Lets a PARSER-only constraint that is correct but tree-sitter-GLR-hostile (e.g. + // at-most-one-`static`, or restricting a type predicate to return position) keep the + // derived highlighter at its cheap status-quo shape — a highlighter may over-accept a + // rare malformed form harmlessly. Like every group field, it is transparent (no node). + // capBelow: this NUD alternative is a complete assignment-level expression (an + // ArrowFunction — the LOWEST-precedence ECMAScript AssignmentExpression). It may be + // parsed only when the enclosing Pratt minBp is LOOSER than the named connector's + // binding power, and once parsed admits NO led (a tighter operator can neither take it + // as an operand nor continue it). Read only by the expression-engine Pratt core. + | { type: 'group'; body: RuleExpr; suppress?: string[]; ctxMode?: 'await' | 'yield' | 'asyncgen' | 'reset'; tsRelaxed?: RuleExpr; capBelow?: string } // suppress: LED connectors disabled while parsing body (e.g. 
no-`in`) // Zero-width negative lookahead: matches (consuming nothing) iff `body` does // NOT match at the current position. Used to express disambiguations the // longest-match parser can't reach by structure alone (e.g. a `<…>` type-arg // list in expression position is only a bare instantiation when it isn't // followed by something that starts an expression). Non-consuming → invisible // to highlighting / AST shape / other generators. - | { type: 'not'; body: RuleExpr } + // `reservable`: this is the bare-identifier reserved-word guard (notReservedExpr). + // The await-yield-fork transform, when cloning a rule into the $A/$Y/$AY family, + // adds that family's context keyword(s) to the inner alt — so `await`/`yield` lose + // their identifier reading inside an async/generator body. Invisible elsewhere. + | { type: 'not'; body: RuleExpr; reservable?: boolean } // Zero-width "no LineTerminator here" assertion: matches (consuming nothing) // iff the NEXT token is on the same line (no preceding newline). Encodes // ECMAScript/TS restricted productions like an array/indexed-access type's `[`, @@ -427,6 +452,18 @@ export type RuleExpr = // (`[flow]: v` is a key, `[23\n]: v` is not). Like `noCommentBefore`, non-consuming → invisible // to other generators (a no-op marker). | { type: 'noMultilineFlowBefore' } + // Zero-width LEFT-operand head-leaf guard for a Pratt LED arm (it sits at the HEAD of a LED + // alternative, before the self `$`). It gates the arm on the LEFT node's OUTERMOST (head) leaf + // token TEXT: when that text is in `words`, the LED arm is treated as NOT-matched (skipped), so + // the connector rebinds to nothing and the parse rejects. Encodes TS's rule that a qualified type + // name `A.B` has an IdentifierReference root — the keyword/literal types `void`/`null`/`true`/ + // `false`/`this` are NOT qualifiable (`void.x` has no parse tree). 
It mirrors the AssignmentTargetType + // gate (`_notTarget`) which reads the same head leaf, but predicated on TEXT membership rather than + // operator-tag shape. Like the other zero-width markers it consumes nothing → invisible to every + // generator (a no-op in the CFG): gen-treesitter renders it `blank()` and drops it from the seq, + // so the derived GLR grammar keeps the UNCONSTRAINED `.` LED (a left-leaf predicate is not + // expressible in GLR, and a stray `void.x` is harmless for a highlighter) — no tsRelax needed. + | { type: 'notLeftLeaf'; words: string[] } | { type: 'sep'; element: RuleExpr; delimiter: string } | { type: 'op' } | { type: 'prefix' } @@ -436,6 +473,12 @@ export interface RuleDecl { name: string; body: RuleExpr; flags: string[]; + // Set by the await-yield-fork transform on a generated [Await]/[Yield] family clone: + // the BASE rule name this fork collapses to for every DERIVED artifact (green-node + // type, AST type union, TM scope, tree-sitter rule, cst-match dispatch). The emitted + // parser keeps the distinct `name` for its memo/adoption rule identity, but reports + // `canon` as the node's rule name so trees stay byte-identical to the base grammar. 
+ canon?: string; } export interface CstGrammar { diff --git a/test/check.ts b/test/check.ts index 8754566..defd54c 100644 --- a/test/check.ts +++ b/test/check.ts @@ -11,7 +11,8 @@ // Run: node test/check.ts # all gates // node test/check.ts yaml # only gates whose group/name contains "yaml" // ───────────────────────────────────────────────────────────────────────────── -import { execFileSync } from 'node:child_process'; +import { execFile } from 'node:child_process'; +import { cpus } from 'node:os'; interface Gate { group: string; name: string; args: string[] } const GATES: Gate[] = [ @@ -23,6 +24,9 @@ const GATES: Gate[] = [ { group: 'core', name: 'cst-match-totality', args: ['test/cst-match-totality.ts'] }, { group: 'core', name: 'incremental-verify', args: ['test/incremental-verify.ts'] }, { group: 'core', name: 'multi-doc', args: ['test/multi-doc.ts'] }, + { group: 'core', name: 'recovery', args: ['test/recovery.ts'] }, + { group: 'core', name: 'incremental-grammars', args: ['test/incremental-grammars.ts'] }, + { group: 'core', name: 'exhaustive-edits', args: ['test/exhaustive-edits.ts'] }, { group: 'core', name: 'issue-cases', args: ['test/test-issues.ts'] }, { group: 'conformance', name: 'js', args: ['test/js-conformance.ts'] }, { group: 'conformance', name: 'tsx', args: ['test/tsx-conformance.ts'] }, @@ -55,23 +59,39 @@ if (!gates.length) { console.error(`no gate matches "${filter}"`); process.exit( const lastLine = (s: string): string => { const ls = s.trimEnd().split('\n').filter((l) => l.trim()); return ls.length ? 
ls[ls.length - 1].trim().slice(0, 70) : ''; }; interface Result { gate: Gate; ok: boolean; ms: number; summary: string; output: string } -const results: Result[] = []; -let curGroup = ''; -for (const gate of gates) { - if (gate.group !== curGroup) { curGroup = gate.group; process.stdout.write(`\n ${curGroup}\n`); } + +// Each gate is an independent subprocess (it re-emits its own parser and reads its own +// corpus), so they run CONCURRENTLY across a worker pool; the gates share no mutable +// state and write DISTINCT /tmp/emitted-*.mjs files, so parallelism is safe and turns the +// wall-clock from sum-of-gates into ~max(sum/pool, slowest-gate). Results stream as each +// finishes (completion order); the final summary is printed in gate order. +function run(gate: Gate): Promise<Result> { const t0 = Date.now(); - let ok = true, output = ''; - try { output = execFileSync('node', gate.args, { encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'], maxBuffer: 64 * 1024 * 1024 }); } - catch (e: any) { ok = false; output = (e.stdout ?? '') + (e.stderr ?? ''); } - const ms = Date.now() - t0; - const summary = lastLine(output); - results.push({ gate, ok, ms, summary, output }); - process.stdout.write(` ${ok ? '✓' : '✗'} ${gate.name.padEnd(22)} ${String(ms).padStart(6)}ms ${ok ? summary : ''}\n`); + return new Promise((resolve) => { + execFile('node', gate.args, { encoding: 'utf8', maxBuffer: 64 * 1024 * 1024 }, (err, stdout, stderr) => { + const output = (stdout ?? '') + (stderr ?? ''); + resolve({ gate, ok: !err, ms: Date.now() - t0, summary: lastLine(output), output }); + }); + }); +} + +const POOL = Math.max(2, cpus().length - 2); +const results: Result[] = []; +let next = 0; +async function worker(): Promise<void> { + while (next < gates.length) { + const gate = gates[next++]; + const r = await run(gate); + results.push(r); + process.stdout.write(` ${r.ok ? '✓' : '✗'} ${(r.gate.group + '/' + r.gate.name).padEnd(34)} ${String(r.ms).padStart(6)}ms ${r.ok ? 
r.summary : ''}\n`); + } } +await Promise.all(Array.from({ length: Math.min(POOL, gates.length) }, worker)); -const failed = results.filter((r) => !r.ok); +const ordered = gates.map((g) => results.find((r) => r.gate === g)!); +const failed = ordered.filter((r) => !r.ok); console.log(`\n${'─'.repeat(70)}`); -console.log(` ${results.length - failed.length}/${results.length} gates pass` + (failed.length ? ` — FAILED: ${failed.map((f) => f.gate.name).join(', ')}` : ' ✓')); +console.log(` ${ordered.length - failed.length}/${ordered.length} gates pass` + (failed.length ? ` — FAILED: ${failed.map((f) => f.gate.name).join(', ')}` : ' ✓')); for (const f of failed) { console.log(`\n── ✗ ${f.gate.name} (node ${f.gate.args.join(' ')}) ──`); console.log(f.output.trimEnd().split('\n').slice(-25).join('\n')); diff --git a/test/exhaustive-edits.ts b/test/exhaustive-edits.ts new file mode 100644 index 0000000..5131132 --- /dev/null +++ b/test/exhaustive-edits.ts @@ -0,0 +1,74 @@ +// Gate: BOUNDED-EXHAUSTIVE edit/fresh equivalence. Over a small expression +// grammar, enumerate EVERY document up to N characters over the grammar's +// alphabet, and for each apply EVERY single-character edit (every deletion, +// every replacement, every insertion at every position). Each edited handle +// must be byte-identical — tree AND errors — to a fresh parse of the edited +// text. Unlike the generative gates this is complete within its bound: any +// equivalence bug reachable through small documents has a witness here. 
+// +// node --max-old-space-size=4096 test/exhaustive-edits.ts +import { writeFileSync } from 'node:fs'; +import { token, rule, defineGrammar, many, opt, sep, plus, oneOf, range, seq, star, noneOf } from '../src/api.ts'; +import { emitParser } from '../src/emit-parser.ts'; +import { objectify } from './emitted-obj.ts'; + +// A deliberately bracket-and-list-shaped grammar: parens force synthesis and +// paired-opener paths, ';' forces statement splits, '+' forces Pratt-free +// infix shapes through the seq machinery, idents and numbers collide at edits. +const Ident = token(plus(oneOf(range('a', 'b'))), { identifier: true }); +const Num = token(plus(oneOf(range('0', '1'))), {}); +const Expr = rule(($: unknown) => [ + Ident, + Num, + ['(', sep($, ','), ')'], + [$, '+', $], +]); +const Stmt = rule(() => [[Expr, ';']]); +const Program = rule(() => [[many(Stmt)]]); +const g = defineGrammar({ + name: 'mini', scopeName: 'source.mini', + tokens: { Ident, Num }, + rules: { Expr, Stmt, Program }, entry: Program, +}); + +const emPath = '/tmp/emitted-exhaustive.mjs'; +writeFileSync(emPath, emitParser(g)); +type Cst = { root: number; errors: object[] }; +type Parser = { parse(s: string): Cst; edit(c: Cst, e: object[]): void; visit(c: Cst, fns: object): void; tree: import('./emitted-obj.ts').TreeView }; +const em = (await import(emPath + '?v=' + process.pid)) as { createParser(): Parser }; + +const ALPHABET = ['a', '0', '(', ')', ',', '+', ';', ' ']; +const MAXLEN = Number(process.env.EXH_MAXLEN ?? 
4); // ~330k steps; EXH_MAXLEN=5 for the 3.2M-step deep run + +const fresh = em.createParser(); +const edited = em.createParser(); +const H = (p: Parser, c: Cst) => JSON.stringify(objectify(p.tree, (fns) => p.visit(c, fns))) + JSON.stringify(c.errors); + +let docs = 0, edits = 0, mismatches = 0; +const docsAt: string[][] = [['']]; +for (let L = 1; L <= MAXLEN; L++) { + docsAt.push(docsAt[L - 1].flatMap(d => ALPHABET.map(ch => d + ch))); +} +for (let L = 0; L <= MAXLEN; L++) { + for (const base of docsAt[L]) { + docs++; + const variants: { start: number; end: number; text: string }[] = []; + for (let i = 0; i < base.length; i++) variants.push({ start: i, end: i + 1, text: '' }); // delete + for (let i = 0; i < base.length; i++) for (const ch of ALPHABET) if (ch !== base[i]) variants.push({ start: i, end: i + 1, text: ch }); // replace + for (let i = 0; i <= base.length; i++) for (const ch of ALPHABET) variants.push({ start: i, end: i, text: ch }); // insert + for (const v of variants) { + edits++; + const c = edited.parse(base); // re-open the handle on the base text + edited.edit(c, [v]); + const next = base.slice(0, v.start) + v.text + base.slice(v.end); + const fc = fresh.parse(next); + if (H(edited, c) !== H(fresh, fc)) { + mismatches++; + if (mismatches <= 10) console.log(` ✗ «${base}» + ${JSON.stringify(v)} → «${next}»`); + } + } + } +} +console.log(`exhaustive-edits: ${docs} documents ≤${MAXLEN} chars × every 1-char edit = ${edits} steps · ${mismatches} mismatches`); +if (mismatches > 0) { console.error('✗ edit ≢ fresh inside the exhaustive bound'); process.exit(1); } +console.log('✓ edit ≡ fresh holds COMPLETELY within the bound (tree + errors, byte-identical)'); diff --git a/test/grammar-gen.ts b/test/grammar-gen.ts index 80ac8f0..c3a8149 100644 --- a/test/grammar-gen.ts +++ b/test/grammar-gen.ts @@ -472,7 +472,7 @@ class Walker { case 'quantifier': return e.kind === '+' ? 
this.minExpand(e.body) : []; case 'group': return this.minExpand(e.body); case 'sep': return this.minExpand(e.element); - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': case 'op': case 'prefix': case 'postfix': return []; } } @@ -571,7 +571,7 @@ class Walker { for (const b of el) { if (b.length * 2 + 1 <= MAX_EMS) { out.push([...b, { t: 'lit', value: e.delimiter }, ...b]); if (out.length >= cap) return out; } } return out; } - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': case 'op': case 'prefix': case 'postfix': return [[]]; } } @@ -621,7 +621,7 @@ class Walker { for (let i = 0; i < reps; i++) { if (i) out.push({ t: 'lit', value: e.delimiter }); cappend(out, this.cover(e.element, budget - 1, ch)); } return out; } - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': case 'op': case 'prefix': case 'postfix': return []; } } @@ -688,7 +688,7 @@ class Walker { case 'quantifier': { const out: Emission[] = []; for (const x of this.nestRec(e.body, target, nest, fuel, atTarget)) out.push(x); return out; } case 'group': return this.nestRec(e.body, target, nest, fuel, atTarget); case 'sep': { const out: Emission[] = []; for (const x of this.nestRec(e.element, target, nest, fuel, atTarget)) out.push(x); return out; } - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': case 'op': case 'prefix': case 'postfix': return []; } } @@ -954,7 +954,7 @@ class Walker { case 'quantifier': return this.coverRec(e.body, tokenName, sampleText); 
// fire exactly one rep (it carries the token) case 'group': return this.coverRec(e.body, tokenName, sampleText); case 'sep': return this.coverRec(e.element, tokenName, sampleText); // one element (it carries the token) - case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': + case 'not': case 'sameLine': case 'noCommentBefore': case 'noMultilineFlowBefore': case 'notLeftLeaf': case 'op': case 'prefix': case 'postfix': return []; } } diff --git a/test/head-to-head.ts b/test/head-to-head.ts new file mode 100644 index 0000000..4613e67 --- /dev/null +++ b/test/head-to-head.ts @@ -0,0 +1,125 @@ +// Head-to-head bench: Monogram vs tsc (ts.updateSourceFile) vs official +// tree-sitter-typescript, on one large TypeScript document under the same +// single-character edit script: warm valid keystrokes, a paren-deleting +// BREAKING edit, while-broken typing, and the FIXING edit. +// +// Reproduce: +// git -C /tmp clone --depth 1 https://github.com/microsoft/TypeScript ts-repo # corpus file +// mkdir -p /tmp/tsbench && npm install --prefix /tmp/tsbench tree-sitter tree-sitter-typescript +// node test/head-to-head.ts +// +// Notes on fairness: every engine receives byte-identical edit sequences with +// positions recomputed from the current text; timers wrap ONLY the engine call +// (tree-sitter's line/col points are precomputed outside). tsc runs with +// setParentNodes=false; node-tree-sitter caps any input string at 32767 chars, +// so it reads through a 16KB chunk callback (its documented large-input path). +import { readFileSync } from 'node:fs'; +import { createRequire } from 'node:module'; +import { emitParser } from '../src/emit-parser.ts'; +import { writeFileSync } from 'node:fs'; +import ts from 'typescript'; + +const require = createRequire(import.meta.url); +const TS_BENCH = process.env.TSBENCH_DIR ?? '/tmp/tsbench'; +const CORPUS = process.env.H2H_FILE ?? 
'/tmp/ts-repo/tests/cases/unittests/matchFiles.ts'; +const TreeSitter = require(TS_BENCH + '/node_modules/tree-sitter'); +const TSLang = require(TS_BENCH + '/node_modules/tree-sitter-typescript').typescript; + +const grammar = (await import('../typescript.ts')).default; +const emPath = '/tmp/emitted-h2h.mjs'; +writeFileSync(emPath, emitParser(grammar)); +const { createParser } = await import(emPath + '?v=' + process.pid); + +const unit = readFileSync(CORPUS, 'utf-8'); +const BASE = unit.repeat(Math.ceil(9 * 1024 * 1024 / unit.length)); +console.log(`doc: ${(BASE.length / 1024 / 1024).toFixed(2)} MB TypeScript (${CORPUS})`); + +function posOf(text: string, off: number) { + let row = 0, last = -1; + for (let i = 0; i < off; i++) if (text.charCodeAt(i) === 10) { row++; last = i; } + return { row, column: off - last - 1 }; +} +const med = (xs: number[]) => xs.slice().sort((a, b) => a - b)[xs.length >> 1]; + +type Engine = { fresh(text: string): void; edit(text: string, start: number, end: number, ins: string): number; errors(): number }; + +function runScript(eng: Engine) { + let txt = BASE; + let t0 = performance.now(); + eng.fresh(txt); + const fresh = performance.now() - t0; + if (eng.errors() > 0) throw new Error('base doc reports errors'); + const apply = (start: number, end: number, ins: string) => { + const dt = eng.edit(txt, start, end, ins); + txt = txt.slice(0, start) + ins + txt.slice(end); + return dt; + }; + const identAt = txt.indexOf(' expected', Math.floor(txt.length / 4)) + 1; + const valid: number[] = []; + for (let i = 0; i < 5; i++) valid.push(apply(identAt + i, identAt + i, 'x')); + if (eng.errors() > 0) throw new Error('valid keystrokes broke the doc'); + const parenAt = txt.indexOf(');', Math.floor(txt.length * 0.75)); + const breaking = apply(parenAt, parenAt + 1, ''); + const breakErrs = eng.errors(); + const broken: number[] = []; + for (let i = 0; i < 10; i++) broken.push(apply(parenAt + i, parenAt + i, 'z')); + apply(parenAt, parenAt + 10, 
''); + const fixing = apply(parenAt, parenAt, ')'); + return { fresh, valid: med(valid), breaking, broken: med(broken), fixing, breakErrs, fixErrs: eng.errors() }; +} + +const engines: Record<string, Engine> = { + monogram: (() => { + const p = createParser(); + let c: { errors: unknown[] }; + return { + fresh(text: string) { c = p.parse(text); }, + edit(_text: string, start: number, end: number, ins: string) { + const t0 = performance.now(); + p.edit(c, [{ start, end, text: ins }]); + return performance.now() - t0; + }, + errors() { return c.errors.length; }, + }; + })(), + tsc: (() => { + let sf: ts.SourceFile; + return { + fresh(text: string) { sf = ts.createSourceFile('t.ts', text, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS); }, + edit(text: string, start: number, end: number, ins: string) { + const newText = text.slice(0, start) + ins + text.slice(end); + const t0 = performance.now(); + sf = ts.updateSourceFile(sf, newText, { span: { start, length: end - start }, newLength: ins.length }); + return performance.now() - t0; + }, + errors() { return (sf as unknown as { parseDiagnostics: unknown[] }).parseDiagnostics.length; }, + }; + })(), + treesitter: (() => { + const p = new TreeSitter(); + p.setLanguage(TSLang); + let tree: ReturnType<typeof p.parse>; + const CHUNK = 16 * 1024; + const input = (text: string) => (index: number) => (index < text.length ? text.slice(index, index + CHUNK) : null); + return { + fresh(text: string) { tree = p.parse(input(text)); }, + edit(text: string, start: number, end: number, ins: string) { + const newText = text.slice(0, start) + ins + text.slice(end); + const sp = posOf(text, start), oep = posOf(text, end), nep = posOf(newText, start + ins.length); + const t0 = performance.now(); + tree.edit({ startIndex: start, oldEndIndex: end, newEndIndex: start + ins.length, startPosition: sp, oldEndPosition: oep, newEndPosition: nep }); + tree = p.parse(input(newText), tree); + return performance.now() - t0; + }, + errors() { return tree.rootNode.hasError ? 
1 : 0; }, + }; + })(), +}; + +const fmt = (x: number) => x.toFixed(2).padStart(8); +console.log('engine | fresh | valid✎ | breaking✎ | broken✎ | fixing✎ | errs(break/fix)'); +for (const [name, eng] of Object.entries(engines)) { + const r = runScript(eng); + console.log(`${name.padEnd(11)} | ${fmt(r.fresh)} | ${fmt(r.valid)} | ${fmt(r.breaking)} | ${fmt(r.broken)} | ${fmt(r.fixing)} | ${r.breakErrs}/${r.fixErrs}`); +} +console.log('(ms; ✎ = per single-character edit, median; node ' + process.version + ')'); diff --git a/test/incremental-grammars.ts b/test/incremental-grammars.ts new file mode 100644 index 0000000..34f8f0b --- /dev/null +++ b/test/incremental-grammars.ts @@ -0,0 +1,223 @@ +// Gate: INCREMENTAL ≡ FRESH for EVERY GRAMMAR — the incremental/recovery gates +// were TypeScript-only while all grammars share the same emitted runtime, so the +// non-TS incremental behavior (markup lexer modes, the fallback-lexer path, other +// token algebras) was ungated. Grammar-agnostic by construction: +// +// inputs come from the generative walker (grammar-gen), edit scripts are seeded +// char-level mutations, and every step checks THREE things on the handle API: +// 1. edited tree + errors ≡ a fresh handle parse of the same text (byte-equal) +// 2. tree SELF-CONSISTENCY: every leaf span lies inside all its ancestors' +// spans (the engine-internal invariant an external compare can miss when +// both sides share a corruption) +// 3. 
totality: no step may throw +// +// node test/incremental-grammars.ts +import { writeFileSync } from 'node:fs'; +import { emitParser } from '../src/emit-parser.ts'; +import { generateInputs } from './grammar-gen.ts'; +import { objectify } from './emitted-obj.ts'; + +type Edit = { start: number; end: number; text: string }; +type Diag = { offset: number; end: number; message: string }; +type Cst = { root: number; errors: Diag[] }; +type Parser = { parse(s: string): Cst; edit(cst: Cst, edits: Edit[]): void; visit(cst: Cst, fns: object): void; tree: import('./emitted-obj.ts').TreeView & { lenOf(id: number): number; leafOffsetOf(e: number, tb: number): number; leafEndOf(e: number, tb: number): number } }; +type Em = { createParser(): Parser }; + +const GRAMMARS = ['typescript', 'javascript', 'typescriptreact', 'javascriptreact', 'yaml', 'html', 'vue']; + +let seedState = 0x5EED1E55; +const rand = () => ((seedState = (seedState * 48271) % 0x7fffffff) / 0x7fffffff); +const randInt = (n: number) => Math.floor(rand() * n); +const INS = ['x', '1', ';', ' ', '"', '<', '>', '(', ')', '\n', '-', ':']; +function mutate(text: string): { next: string; edit: Edit } { + if (text.length === 0) { + const ins = INS[randInt(INS.length)]; + return { next: ins, edit: { start: 0, end: 0, text: ins } }; + } + switch (randInt(3)) { + case 0: { + const at = randInt(text.length); + const ins = INS[randInt(INS.length)]; + return { next: text.slice(0, at) + ins + text.slice(at), edit: { start: at, end: at, text: ins } }; + } + case 1: { + const at = randInt(Math.max(1, text.length - 4)); + const n = 1 + randInt(3); + const end = Math.min(text.length, at + n); + return { next: text.slice(0, at) + text.slice(end), edit: { start: at, end, text: '' } }; + } + default: { + const at = randInt(text.length); + return { next: text.slice(0, at) + 'z' + text.slice(at + 1), edit: { start: at, end: at + 1, text: 'z' } }; + } + } +} + +function selfConsistent(p: Parser, c: Cst): string | null { + const 
stack: [number, number][] = []; + let bad: string | null = null; + p.visit(c, { + enter(id: number, cb: number) { + const span: [number, number] = [cb, cb + p.tree.lenOf(id)]; + const top = stack[stack.length - 1]; + if (top !== undefined && (span[0] < top[0] || span[1] > top[1]) && bad === null) { + bad = `node span [${span[0]},${span[1]}) outside parent [${top[0]},${top[1]})`; + } + stack.push(span); + }, + leave() { stack.pop(); }, + leaf(e: number, tok: number) { + if (bad !== null) return; + const tb = tok - ((~e) >>> 2); + const lo = p.tree.leafOffsetOf(e, tb), hi = p.tree.leafEndOf(e, tb); + const top = stack[stack.length - 1]; + if (top !== undefined && (lo < top[0] || hi > top[1])) { + bad = `leaf span [${lo},${hi}) outside parent [${top[0]},${top[1]})`; + } + }, + }); + return bad; +} + +let totalSteps = 0, totalEqual = 0, totalErr = 0; +let fails = 0; +const failures: string[] = []; +for (const name of GRAMMARS) { + const grammar = (await import(`../${name}.ts`)).default; + const emPath = `/tmp/emitted-incr-${name}.mjs`; + writeFileSync(emPath, emitParser(grammar)); + const em = (await import(emPath + '?v=' + process.pid)) as Em; + const session = em.createParser(); + const fresh = em.createParser(); + + // a handful of generated documents per grammar, a short edit session on each + const inputs = generateInputs(grammar, { depth: 4, nestDepth: 4, cap: 5, fuzzRounds: 40, maxInputs: 24, seed: 11 }); + let docs = 0; + for (const input of inputs) { + if (input.text.length < 8) continue; + if (docs >= 8) break; + docs++; + let text = input.text; + let cst: Cst; + try { cst = session.parse(text); } catch (e) { + fails++; failures.push(`${name}: parse THREW on generated input: ${(e as Error).message.slice(0, 60)}`); + continue; + } + for (let k = 0; k < 12; k++) { + const { next, edit } = mutate(text); + totalSteps++; + if (process.env.TRACE && name === process.env.TRACE) console.log(` [${name} doc${docs} step${k}]`, JSON.stringify(edit).slice(0, 70), '→', 
JSON.stringify(next.slice(0, 40))); + let fc: Cst; + try { + session.edit(cst, [edit]); + fc = fresh.parse(next); + } catch (e) { + fails++; + if (failures.length < 10) failures.push(`${name} doc${docs} step${k}: THREW: ${(e as Error).message.slice(0, 80)}`); + break; + } + if (fc.errors.length > 0) totalErr++; + const a = JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(fc, fns))) + JSON.stringify(fc.errors); + const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns))) + JSON.stringify(cst.errors); + if (a !== b) { + fails++; + if (process.env.DUMP) { + console.log('DOC:', JSON.stringify(text)); + console.log('NEXT:', JSON.stringify(next)); + console.log('FRESH errors:', JSON.stringify(fc.errors)); + console.log('INC errors: ', JSON.stringify(cst.errors)); + } + if (process.env.DUMP_TREES) { + writeFileSync(`/tmp/incr-fresh-${name}-doc${docs}-step${k}.json`, JSON.stringify(JSON.parse(JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(fc, fns)))), null, 1)); + writeFileSync(`/tmp/incr-inc-${name}-doc${docs}-step${k}.json`, JSON.stringify(JSON.parse(JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns)))), null, 1)); + console.log(`DUMP_TREES wrote /tmp/incr-{fresh,inc}-${name}-doc${docs}-step${k}.json (edit ${JSON.stringify(edit)})`); + } + if (failures.length < 10) { + let i = 0; while (i < a.length && a[i] === b[i]) i++; + failures.push(`${name} doc${docs} step${k}: edit ≠ fresh @${i} edit=${JSON.stringify(edit).slice(0, 60)}\n fresh: …${a.slice(Math.max(0, i - 40), i + 60)}…\n inc: …${b.slice(Math.max(0, i - 40), i + 60)}…`); + } + break; + } + const sc = selfConsistent(session, cst); + if (sc !== null) { + fails++; + if (failures.length < 10) failures.push(`${name} doc${docs} step${k}: SELF-INCONSISTENT: ${sc}`); + break; + } + totalEqual++; + text = next; + } + } +} + +// ── Targeted [Await]/[Yield] fork edit class ──────────────────────────────────────── +// Flipping `async`/`*` on an enclosing 
function changes the RULE IDENTITY of its body +// (Block -> Block$A / Block$Y / Block$AY) — exactly what the build-time name-fork must +// survive incrementally. A body row keys on its forked rid, so an async-toggle FAR from a +// body statement must re-parse the body under the new family rather than reuse a +// cross-family row, and a surgery-eligible in-body keystroke must re-run the body +// statement's rule (Stmt$A, …) with the right ambient context. The random mutator above +// only hits these by luck; this scripts them. Each step stays edit≡fresh + self-consistent. +const FORK_DOCS = [ + 'async function f(g) {\n let x = await g();\n return x;\n}\n', + 'function* gen() {\n yield 1;\n let y = 2;\n return y;\n}\n', + 'const h = async (a) => {\n await a;\n return a;\n};\n', + 'class C {\n async m() { await this.x; }\n *g() { yield 1; }\n plain() { let await = 1; return await; }\n}\n', + 'async function* ag() {\n yield await next();\n for (let i = 0; i < 3; i++) { await tick(); }\n}\n', +]; +// each op replaces the FIRST occurrence of `find` (skipped if absent in the current text) +const FORK_SCRIPT: [string, string][] = [ + ['async function', 'function'], // drop async: enclosing body Block$A -> Block + ['{\n let', '{\n let q = 0;\n let'], // surgery-path keystroke inside the now-sync body + ['function', 'async function'], // re-add async: body Block -> Block$A + ['function*', 'function'], // drop generator star: body Block$Y -> Block + ['async (a)', 'async (a, b)'], // edit an async arrow's parameter list + ['await ', 'await '], // touch an await operand site + ['yield 1', 'yield 1 + 1'], // edit a yield operand inside a generator body + ['async m()', 'm()'], // class: drop a method's async + ['*g()', 'g()'], // class: drop a method's generator star +]; +function replaceOnce(text: string, find: string, repl: string): { next: string; edit: Edit } | null { + const at = text.indexOf(find); + if (at < 0) return null; + return { next: text.slice(0, at) + repl + 
text.slice(at + find.length), edit: { start: at, end: at + find.length, text: repl } }; +} +for (const name of ['javascript', 'typescript']) { + const em = (await import(`/tmp/emitted-incr-${name}.mjs?v=` + process.pid)) as Em; + const session = em.createParser(); + const fresh = em.createParser(); + for (const doc of FORK_DOCS) { + let text = doc; + let cst: Cst; + try { cst = session.parse(text); } catch (e) { fails++; failures.push(`${name} fork-doc: parse THREW: ${(e as Error).message.slice(0, 60)}`); continue; } + for (const [find, repl] of FORK_SCRIPT) { + const m = replaceOnce(text, find, repl); + if (!m) continue; + totalSteps++; + let fc: Cst; + try { session.edit(cst, [m.edit]); fc = fresh.parse(m.next); } + catch (e) { fails++; if (failures.length < 10) failures.push(`${name} fork "${find}": THREW ${(e as Error).message.slice(0, 70)}`); break; } + if (fc.errors.length > 0) totalErr++; + const a = JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(fc, fns))) + JSON.stringify(fc.errors); + const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns))) + JSON.stringify(cst.errors); + if (a !== b) { + fails++; + let i = 0; while (i < a.length && a[i] === b[i]) i++; + if (failures.length < 10) failures.push(`${name} fork "${find}"->"${repl}": edit ≠ fresh @${i}\n fresh: …${a.slice(Math.max(0, i - 40), i + 60)}…\n inc: …${b.slice(Math.max(0, i - 40), i + 60)}…`); + break; + } + const sc = selfConsistent(session, cst); + if (sc !== null) { fails++; if (failures.length < 10) failures.push(`${name} fork "${find}": SELF-INCONSISTENT ${sc}`); break; } + totalEqual++; + text = m.next; + } + } +} + +console.log(`incremental-grammars: ${totalEqual}/${totalSteps} steps equal+consistent across ${GRAMMARS.length} grammars (${totalErr} recovered with errors)`); +for (const s of failures) console.log(' ✗ ' + s); +if (fails > 0) { + console.error('✗ cross-grammar incremental equivalence violated'); + process.exit(1); +} +console.log('✓ every 
grammar: edited re-parses byte-identical to fresh, trees self-consistent, no throws'); diff --git a/test/incremental-verify.ts b/test/incremental-verify.ts index 0178d84..361fdaa 100644 --- a/test/incremental-verify.ts +++ b/test/incremental-verify.ts @@ -14,7 +14,7 @@ const grammar = (await import('../typescript.ts')).default; const emPath = '/tmp/emitted-incremental.mjs'; writeFileSync(emPath, emitParser(grammar)); type Edit = { start: number; end: number; text: string }; -type Cst = { root: number }; +type Cst = { root: number; errors: { offset: number; end: number; message: string }[] }; type Parser = { parse(s: string): Cst; edit(cst: Cst, edits: Edit[]): void; @@ -28,14 +28,14 @@ type Em = { createParser(): Parser; }; const session = ((await import(emPath + '?session=' + process.pid)) as Em).createParser(); -const fresh = (await import(emPath + '?fresh=' + process.pid)) as Em; +const freshP = ((await import(emPath + '?fresh=' + process.pid)) as Em).createParser(); // Deterministic LCG so failures replay. 
let seedState = 0x2F6E2B1; const rand = () => ((seedState = (seedState * 48271) % 0x7fffffff) / 0x7fffffff); const randInt = (n: number) => Math.floor(rand() * n); -const INSERTS = ['x', '_v', '42', ' + y', '.m', '()', ' /*c*/ ', '"s"', 'await ', '!', '?']; +const INSERTS = ['x', '_v', '42', ' + y', '.m', '()', ' /*c*/ ', '"s"', 'await ', '!', '?', ';', '; ']; const STMTS = ['const q9 = 1;\n', 'function g9(a) { return a; }\n', 'if (x9) { y9(); }\n', '// note\n', 'type T9 = string | number;\n']; // Mutations return the edit RANGE too, so half the steps can exercise the edits @@ -97,6 +97,15 @@ function diffChange(a: string, b: string): Edit { } const GLUE: Array<[string, string]> = [ + // recovery-protocol pins (cross-grammar-gate finds): bar minting must be + // adoption-invariant — a pre-edit RECOVERY tree must not leak its probe reaches + // (frameMax exactness), its rows (surgery/adoption refusal), or its shape (the + // lex-recovered first run) into the edited re-parse + ['class za {" z', 'zlass za {" z'], + ['funtionzaaz( a z { }', 'funtiznzaaz( a z { }'], + ['function \\u{0} ( (aa ) { }', 'functionx \\u{0} ( (aa ) { }'], + ['const x = f(1, 2);', 'const x = f(1, 2;'], + ['function g() { return 1; }', 'function g() { return 1;'], ['const a = 1;\nconst b = 2;\n', 'const a = 1;\nconst bx = 2;\n'], ['let a = b; let c = 1;\n', 'let a = b1; let c = 1;\n'], ['if (a = b) { f(); }\n', 'if (a == b) { f(); }\n'], @@ -105,28 +114,26 @@ const GLUE: Array<[string, string]> = [ ['const t = a + b;\n', 'const t = a ++ b;\n'], ['const u = x(z);\n', 'const u = x>(z);\n'], ['f(a, b);\ng(c);\n', 'f(a, bc);\ng(c);\n'], + // expression-splitting ';' injections (structure breaks, not appended garbage) + ['const x = a + b;\n', 'const x = a; + b;\n'], + ['const y = (a + b) * c;\n', 'const y = (a +; b) * c;\n'], + ['const z = obj.m(1).n;\n', 'const z = obj.m(;1).n;\n'], ]; -let steps = 0, equal = 0, bothReject = 0, mismatch = 0; +let steps = 0, equal = 0, withErrors = 0, mismatch = 0; 
let tInc = 0, tFresh = 0; const failures: string[] = []; for (const [base, edited] of GLUE) { steps++; const c0 = session.parse(base); - let fe: string | null = null, ie: string | null = null; - let fr = -1; - try { fr = fresh.parse(edited); } catch (e) { fe = (e as Error).message; } - try { session.edit(c0, [diffChange(base, edited)]); } catch (e) { ie = (e as Error).message; } - if (fe !== null || ie !== null) { - if ((fe === null) !== (ie === null)) { mismatch++; if (failures.length < 5) failures.push(`glue «${edited.slice(0, 30)}»: fresh ${fe ? 'reject' : 'accept'} / incremental ${ie ? 'reject' : 'accept'}`); } - else bothReject++; - continue; - } - const a = JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(fr, fns))); - const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(c0, fns))); + session.edit(c0, [diffChange(base, edited)]); + const fc = freshP.parse(edited); + if (fc.errors.length > 0) withErrors++; + const a = JSON.stringify(objectify(freshP.tree, (fns) => freshP.visit(fc, fns))) + JSON.stringify(fc.errors); + const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(c0, fns))) + JSON.stringify(c0.errors); if (a === b) equal++; - else { mismatch++; if (failures.length < 5) failures.push(`glue «${edited.slice(0, 30)}»: tree diverges`); } + else { mismatch++; if (failures.length < 5) failures.push(`glue «${edited.slice(0, 30)}»: tree/errors diverge`); } } for (const f of FILES) { @@ -135,55 +142,31 @@ for (const f of FILES) { for (let k = 0; k < STEPS; k++) { const { next, edit } = mutate(text); steps++; - let freshRoot = -1, freshErr: string | null = null; + // parse/edit are TOTAL: syntax-breaking steps produce error trees compared + // exactly like valid ones (tree AND the errors field, byte-identical) const tf0 = performance.now(); - try { freshRoot = fresh.parse(next); } catch (e) { freshErr = (e as Error).message; } + const fc = freshP.parse(next); const tf1 = performance.now(); - let incErr: string | null 
= null; const ti0 = performance.now(); - try { session.edit(cst, [edit]); } catch (e) { incErr = (e as Error).message; } + session.edit(cst, [edit]); const ti1 = performance.now(); - if (freshErr !== null || incErr !== null) { - if ((freshErr === null) !== (incErr === null)) { - mismatch++; - if (failures.length < 5) failures.push(`${f.split('/').pop()} step ${k}: fresh ${freshErr ? 'reject' : 'accept'} / incremental ${incErr ? 'reject' : 'accept'}\n fresh: ${freshErr ?? '-'}\n inc: ${incErr ?? '-'}`); - } else bothReject++; - // REJECTED text: the handle stays on the previous tree, but the DOCUMENT - // advances (editor-buffer model — the buffer applied the change regardless, - // and the engine's docSrc tracks it). Model the editor's UNDO: revert via a - // diff edit in the rejected text's coordinates; it must be accepted and - // byte-identical to a fresh parse of the restored text. - try { - session.edit(cst, [diffChange(next, text)]); - const rfr = fresh.parse(text); - const ra = JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(rfr, fns))); - const rb = JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns))); - if (ra !== rb) { - mismatch++; - if (failures.length < 5) failures.push(`${f.split('/').pop()} step ${k}: REVERT tree diverges`); - } - } catch (e2) { - mismatch++; - if (failures.length < 5) failures.push(`${f.split('/').pop()} step ${k}: revert rejected: ${(e2 as Error).message.slice(0, 50)}`); - } - continue; - } + if (fc.errors.length > 0) withErrors++; tFresh += tf1 - tf0; tInc += ti1 - ti0; - const a = JSON.stringify(objectify(fresh.tree, (fns) => fresh.visit(freshRoot, fns))); - const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns))); + const a = JSON.stringify(objectify(freshP.tree, (fns) => freshP.visit(fc, fns))) + JSON.stringify(fc.errors); + const b = JSON.stringify(objectify(session.tree, (fns) => session.visit(cst, fns))) + JSON.stringify(cst.errors); if (a === b) equal++; else { 
mismatch++; if (failures.length < 5) { let i = 0; while (i < a.length && i < b.length && a[i] === b[i]) i++; - failures.push(`${f.split('/').pop()} step ${k}: tree diverges @${i}\n fresh: …${a.slice(Math.max(0, i - 50), i + 50)}…\n inc: …${b.slice(Math.max(0, i - 50), i + 50)}…`); + failures.push(`${f.split('/').pop()} step ${k}: tree/errors diverge @${i}\n fresh: …${a.slice(Math.max(0, i - 50), i + 50)}…\n inc: …${b.slice(Math.max(0, i - 50), i + 50)}…`); } } text = next; } } -console.log(`incremental ≡ fresh: ${equal} equal · ${bothReject} both-reject · ${mismatch} MISMATCH (${steps} steps over ${FILES.length} files)`); +console.log(`incremental ≡ fresh: ${equal} equal (${withErrors} recovered with errors) · ${mismatch} MISMATCH (${steps} steps over ${FILES.length} files)`); if (tInc > 0) console.log(`time: incremental ${tInc.toFixed(1)}ms vs fresh ${tFresh.toFixed(1)}ms → ${(tFresh / tInc).toFixed(2)}× faster on accepted edits`); for (const s of failures) console.log(' ✗ ' + s); if (mismatch > 0) { diff --git a/test/multi-doc.ts b/test/multi-doc.ts index d980cbb..f5af760 100644 --- a/test/multi-doc.ts +++ b/test/multi-doc.ts @@ -1,22 +1,22 @@ -// Gate: DOCUMENTS ARE ISOLATED. The handle API (createParser → parse/edit with -// explicit tree handles) keeps one document's state per parser instance behind a -// lazily-swapped register set — a missed swap field shows up as cross-document -// corruption. Two instances edit two different sources interleaved (plus the -// module-level default-doc API mixed in between); every edited tree must be -// byte-identical (toObject) to a fresh parse of the same text. Also pins the -// handle contract: stale and foreign handles throw instead of silently reading -// an in-place-mutated tree, and a REJECTED edit leaves the old handle valid. +// Gate: DOCUMENTS ARE ISOLATED and the handle API is TOTAL. 
Each parser instance +// keeps one document's state behind a lazily-swapped register set — a missed swap +// field shows up as cross-document corruption. Two instances edit two different +// sources interleaved (with the module-level default-doc API mixed in between); +// every edited tree AND its errors field must be byte-identical to a fresh handle +// parse of the same text — syntax-breaking edits included (parse/edit never throw +// on input; the strict→recovering two-pass produces the error tree). Also pins the +// handle contract: in-place edits, API misuse throws, re-opening invalidates. // // node test/multi-doc.ts -import { objectify } from './emitted-obj.ts'; import { writeFileSync } from 'node:fs'; import { emitParser } from '../src/emit-parser.ts'; +import { objectify } from './emitted-obj.ts'; const grammar = (await import('../typescript.ts')).default; const emPath = '/tmp/emitted-multidoc.mjs'; writeFileSync(emPath, emitParser(grammar)); type Edit = { start: number; end: number; text: string }; -type Cst = { root: number }; +type Cst = { root: number; errors: { offset: number; end: number; message: string }[] }; type Parser = { parse(s: string): Cst; edit(cst: Cst, edits: Edit[]): void; visit(cst: Cst, fns: object): void; tree: import('./emitted-obj.ts').TreeView }; type Em = { parse(s: string): number; createParser(): Parser }; const em = (await import(emPath + '?v=' + process.pid)) as Em; @@ -33,7 +33,7 @@ let textB = `(function () {\n${mk('beta', 300)}})();\n`; let seed = 0x51C0FFEE; const rand = () => ((seed = (seed * 48271) % 0x7fffffff) / 0x7fffffff); const randInt = (n: number) => Math.floor(rand() * n); -const INS = ['x', '1', ' + q', '.m', '(/*c*/)', '"s"']; +const INS = ['x', '1', ' + q', '.m', '(/*c*/)', '"s"', ';']; function mutate(text: string): { next: string; edit: Edit } { switch (randInt(3)) { case 0: { @@ -53,115 +53,87 @@ function mutate(text: string): { next: string; edit: Edit } { } } -function diffChange(a: string, b: string): 
Edit { - const minL = Math.min(a.length, b.length); - let s = 0; - while (s < minL && a.charCodeAt(s) === b.charCodeAt(s)) s++; - let e = 0; - while (e < minL - s && a.charCodeAt(a.length - 1 - e) === b.charCodeAt(b.length - 1 - e)) e++; - return { start: s, end: a.length - e, text: b.slice(s, b.length - e) }; -} - const p1 = em.createParser(); const p2 = em.createParser(); const f = em.createParser(); -let cstA = p1.parse(textA); -let cstB = p2.parse(textB); +const cstA = p1.parse(textA); +const cstB = p2.parse(textB); -let steps = 0, equal = 0, bothReject = 0, mismatch = 0, reverts = 0; +let steps = 0, equal = 0, withErrors = 0, mismatch = 0; const failures: string[] = []; for (let k = 0; k < 60; k++) { const onA = (k & 1) === 0; const text = onA ? textA : textB; const { next, edit } = mutate(text); steps++; - let fe: string | null = null, ie: string | null = null; - let fc: Cst | null = null; - try { fc = f.parse(next); } catch (e) { fe = (e as Error).message; } - try { (onA ? p1 : p2).edit(onA ? cstA : cstB, [edit]); } catch (e) { ie = (e as Error).message; } - if (fe !== null || ie !== null) { - if ((fe === null) !== (ie === null)) { mismatch++; if (failures.length < 5) failures.push(`step ${k} (${onA ? 'A' : 'B'}): fresh ${fe ? 'reject' : 'accept'} / edit ${ie ? 'reject' : 'accept'}`); } - else bothReject++; - // the DOCUMENT advances on reject (editor-buffer model): later coordinates - // are against the rejected text. Model the editor's UNDO: revert to the last - // good text via a diff edit in the rejected text's coordinates — it must be - // ACCEPTED and byte-identical to a fresh parse (the post-reject recovery path - // gets exercised every time a mutation breaks the document). - const good = onA ? textA : textB; - const rv = diffChange(next, good); - try { - (onA ? p1 : p2).edit(onA ? cstA : cstB, [rv]); - const fb = f.parse(good); - const ra = JSON.stringify(objectify(f.tree, (fns) => f.visit(fb, fns))); - const qq = onA ? 
p1 : p2; - const rb = JSON.stringify(objectify(qq.tree, (fns) => qq.visit(onA ? cstA : cstB, fns))); - if (ra === rb) reverts++; - else { mismatch++; if (failures.length < 5) failures.push(`step ${k} (${onA ? 'A' : 'B'}): REVERT tree diverges`); } - } catch (e2) { - mismatch++; - if (failures.length < 5) failures.push(`step ${k} (${onA ? 'A' : 'B'}): revert rejected: ${(e2 as Error).message.slice(0, 50)}`); - } - continue; - } + // parse/edit are TOTAL: syntax-breaking steps produce error trees compared + // exactly like valid ones (tree AND the errors field, byte-identical) + const fc = f.parse(next); + (onA ? p1 : p2).edit(onA ? cstA : cstB, [edit]); + if (fc.errors.length > 0) withErrors++; // mix the module-level default doc in between: it must not disturb either instance if (k % 5 === 0) em.parse('const mix = ' + k + ';'); - const a = JSON.stringify(objectify(f.tree, (fns) => f.visit(fc!, fns))); + const a = JSON.stringify(objectify(f.tree, (fns) => f.visit(fc, fns))) + JSON.stringify(fc.errors); const q = onA ? p1 : p2; - const b = JSON.stringify(objectify(q.tree, (fns) => q.visit(onA ? cstA : cstB, fns))); + const b = JSON.stringify(objectify(q.tree, (fns) => q.visit(onA ? cstA : cstB, fns))) + JSON.stringify((onA ? cstA : cstB).errors); if (a === b) equal++; else { mismatch++; if (failures.length < 5) { let i = 0; while (i < a.length && a[i] === b[i]) i++; - failures.push(`step ${k} (${onA ? 'A' : 'B'}): tree diverges @${i}`); + failures.push(`step ${k} (${onA ? 'A' : 'B'}): tree/errors diverge @${i}`); } } if (onA) textA = next; else textB = next; } -// handle contract: edit mutates the handle IN PLACE (no return — no clone illusion); -// only parse() re-opening the document invalidates old handles; rejects keep the tree. 
+// handle contract: edit mutates the handle IN PLACE and is TOTAL — invalid text +// produces an error tree plus cst.errors, never a throw; API MISUSE (no changes, +// foreign handles, out-of-range coordinates) still throws; re-opening via parse() +// invalidates prior handles regardless of outcome. let contract = 0; { const p = em.createParser(); const c1 = p.parse('const a = 1;'); const obj = (h: Cst) => JSON.stringify(objectify(p.tree, (fns) => p.visit(h, fns))); - const before = obj(c1); + if (c1.errors.length === 0) contract++; + else failures.push('valid parse reported errors'); p.edit(c1, [{ start: 7, end: 7, text: 'b' }]); // 'const a = 1;' -> 'const ab = 1;' const after = obj(c1); - if (after !== before && after.includes('"end":8')) contract++; // same handle, new tree + if (after.includes('"end":8') && c1.errors.length === 0) contract++; // same handle, new tree else failures.push('in-place edit did not update the handle'); try { p2.edit(c1, [{ start: 0, end: 1, text: 'q' }]); failures.push('foreign handle did not throw'); } catch { contract++; } - let rejected = false; - try { p.edit(c1, [{ start: 6, end: 8, text: ']' }]); } catch { rejected = true; } // 'const ab…' -> 'const ] = 1;' - if (rejected && obj(c1) === after) contract++; // reject keeps the tree - else failures.push('reject-then-read flow broke'); - // coordinates after a REJECT are against the editor's buffer (the rejected text): - // fixing the same spot in those coordinates must recover the session - let recovered = false; - try { p.edit(c1, [{ start: 6, end: 7, text: 'ab' }]); recovered = true; } catch { /* must not throw */ } - if (recovered && obj(c1).includes('"end":13')) contract++; // 'const ] = 1;' -> 'const ab = 1;' - else failures.push('post-reject coordinates did not track the document text'); - const c2 = p.parse('let q = 1;'); - try { obj(c1); failures.push('re-opened document: old handle did not throw'); } catch { contract++; } - // missing ranges: ONE usage only — edit() 
without ranges must throw, not - // silently fall back to O(file) diff scans + // an INVALID edit is total: error tree + diagnostics, handle stays live + p.edit(c1, [{ start: 6, end: 8, text: ']' }]); // 'const ab…' -> 'const ] = 1;' + if (c1.errors.length > 0 && obj(c1) !== after) contract++; + else failures.push('invalid edit did not surface errors'); + // fixing it in the editor's coordinates drains the errors + p.edit(c1, [{ start: 6, end: 7, text: 'ab' }]); // -> 'const ab = 1;' + if (c1.errors.length === 0 && obj(c1) === after) contract++; + else failures.push('fixing edit did not drain errors'); + // misuse still throws let needsRanges = false; - try { (p as unknown as { edit(c: Cst): void }).edit(c2); } catch { needsRanges = true; } + try { (p as unknown as { edit(c: Cst): void }).edit(c1); } catch { needsRanges = true; } if (needsRanges) contract++; else failures.push('edit() without changes did not throw'); - // a REJECTING parse() resets the arena too — it must invalidate prior handles - try { p.parse('const ] = ;'); } catch { /* expected reject */ } + let oob = false; + try { p.edit(c1, [{ start: 5, end: 99999, text: '' }]); } catch { oob = true; } + if (oob) contract++; + else failures.push('out-of-range change did not throw'); + // a REJECTING-grammar parse() is total too, and re-opening kills old handles + const c2 = p.parse('const ] = ;'); + if (c2.errors.length > 0) contract++; + else failures.push('invalid parse() reported no errors'); let dead = false; - try { obj(c2); } catch { dead = true; } + try { obj(c1); } catch { dead = true; } if (dead) contract++; - else failures.push('rejecting parse() left the old handle readable over a reset arena'); + else failures.push('re-opened document: old handle did not throw'); } -console.log(`multi-doc: ${equal} equal · ${bothReject} both-reject (${reverts} reverts verified) · ${mismatch} MISMATCH (${steps} interleaved steps) · contract ${contract}/7`); +console.log(`multi-doc: ${equal} equal (${withErrors} 
recovered with errors) · ${mismatch} MISMATCH (${steps} interleaved steps) · contract ${contract}/9`); for (const s of failures) console.log(' ✗ ' + s); -if (mismatch > 0 || contract !== 7 || failures.length > 0) { +if (mismatch > 0 || contract !== 9 || failures.length > 0) { console.error('✗ document isolation / handle contract violated'); process.exit(1); } -console.log('✓ documents are isolated; handles enforce the in-place-edit contract'); +console.log('✓ documents are isolated; the total in-place handle contract holds'); diff --git a/test/recovery-conformance.ts b/test/recovery-conformance.ts new file mode 100644 index 0000000..8f1f28c --- /dev/null +++ b/test/recovery-conformance.ts @@ -0,0 +1,78 @@ +// Error-recovery conformance: on every single-file conformance test that tsc's +// PARSER rejects, compare Monogram's total-parse diagnostics against tsc's +// parseDiagnostics (the live source of the .errors.txt syntax baselines), +// BIDIRECTIONALLY: +// recall — tsc diagnostics with a Monogram diagnostic within ±SLACK chars +// precision — Monogram diagnostics with a tsc diagnostic within ±SLACK chars +// first — files where the FIRST error positions agree within ±SLACK +// Diagnostic positions are parser-policy choices (where to blame a missing +// token), so the slack absorbs token-boundary differences; the metric is about +// reporting the same BREAKAGES, not byte-equal spans. 
+// +// node --max-old-space-size=4096 test/recovery-conformance.ts +import { writeFileSync, readFileSync } from 'node:fs'; +import { readdir } from 'fs/promises'; +import { join } from 'path'; +import { emitParser } from '../src/emit-parser.ts'; +import ts from 'typescript'; + +const grammar = (await import('../typescript.ts')).default; +const emPath = '/tmp/emitted-recovery-conf.mjs'; +writeFileSync(emPath, emitParser(grammar)); +type Cst = { root: number; errors: { offset: number; end: number; message: string }[] }; +const em = (await import(emPath + '?v=' + process.pid)) as { createParser(): { parse(s: string): Cst } }; +const p = em.createParser(); + +const baseDir = '/tmp/ts-repo/tests/cases/conformance'; +const SLACK = 8; + +async function allTsFiles(dir: string): Promise<string[]> { + const out: string[] = []; + for (const e of await readdir(dir, { withFileTypes: true })) { + const full = join(dir, e.name); + if (e.isDirectory()) out.push(...await allTsFiles(full)); + else if (e.name.endsWith('.ts') && !e.name.endsWith('.d.ts')) out.push(full); + } + return out; +} +const isMulti = (t: string) => /^\s*\/\/\s*@filename:/im.test(t); + +const files = (await allTsFiles(baseDir)).sort(); +let nFiles = 0, tTotal = 0, tHit = 0, mTotal = 0, mHit = 0, firstOK = 0, weSilent = 0, oracleCrash = 0; +const worst: { file: string; kind: string; at: number; msg: string }[] = []; + +for (const file of files) { + const code = readFileSync(file, 'utf-8'); + if (isMulti(code)) continue; + let sf; + try { + sf = ts.createSourceFile('t.ts', code, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS); + } catch { oracleCrash++; continue; } + const tDiags = (sf as unknown as { parseDiagnostics: { start: number }[] }).parseDiagnostics; + if (tDiags.length === 0) continue; // parser-valid: the accept/CST gates own it + const T = [...new Set(tDiags.map(d => d.start ?? 
0))].sort((a, b) => a - b); + const c = p.parse(code); + const M = [...new Set(c.errors.map(g => g.offset))].sort((a, b) => a - b); + nFiles++; + if (M.length === 0) { + weSilent++; + if (worst.length < 12) worst.push({ file: file.replace(baseDir + '/', ''), kind: 'WE-ACCEPT', at: T[0], msg: code.slice(Math.max(0, T[0] - 30), T[0] + 20).replace(/\n/g, '⏎') }); + } + const near = (xs: number[], x: number) => xs.some(y => Math.abs(y - x) <= SLACK); + tTotal += T.length; mTotal += M.length; + for (const t of T) if (near(M, t)) tHit++; else if (worst.length < 24 && M.length > 0) worst.push({ file: file.replace(baseDir + '/', ''), kind: 'MISSED', at: t, msg: code.slice(Math.max(0, t - 30), t + 20).replace(/\n/g, '⏎') }); + for (const m of M) if (near(T, m)) mHit++; + if (M.length > 0 && Math.abs(M[0] - T[0]) <= SLACK) firstOK++; +} + +const pct = (a: number, b: number) => b === 0 ? '—' : (100 * a / b).toFixed(2) + '%'; +console.log(`error-recovery conformance vs tsc parseDiagnostics (${baseDir}, slack ±${SLACK}):`); +console.log(` files tsc-parser-rejects (single-file): ${nFiles}${oracleCrash ? ` (+${oracleCrash} oracle crashes skipped)` : ''}`); +console.log(` recall (tsc errors we also report): ${tHit}/${tTotal} = ${pct(tHit, tTotal)}`); +console.log(` precision (our errors tsc also reports): ${mHit}/${mTotal} = ${pct(mHit, mTotal)}`); +console.log(` first-error agreement: ${firstOK}/${nFiles} = ${pct(firstOK, nFiles)}`); +console.log(` files we accept but tsc rejects: ${weSilent}`); +if (worst.length) { + console.log(`\n ===== sample divergences =====`); + for (const w of worst) console.log(` [${w.kind}] ${w.file} @${w.at} «${w.msg}»`); +} diff --git a/test/recovery.ts b/test/recovery.ts new file mode 100644 index 0000000..5e1d721 --- /dev/null +++ b/test/recovery.ts @@ -0,0 +1,166 @@ +// Gate: TOTAL PARSING (issue #39). The handle API never crashes on input — every +// text produces a tree plus cst.errors — under three hard invariants: +// +// 1. 
VALID texts parse byte-identically to the STRICT module-level parse with an +// empty errors field (the strict pass runs first and exclusively; recovery +// cannot perturb the valid path). +// 2. INVALID texts never throw, report errors exactly when strict rejects, parse +// deterministically (same input twice → identical tree + errors), and every +// diagnostic span stays inside the document. +// 3. A TYPING session through transiently-invalid states (the editor reality: +// char-by-char insertion makes most intermediate states invalid) keeps every +// intermediate edit byte-identical to a fresh handle parse — tree and errors. +// +// node test/recovery.ts +import { existsSync, readFileSync, writeFileSync } from 'node:fs'; +import { emitParser } from '../src/emit-parser.ts'; +import { objectify } from './emitted-obj.ts'; + +const grammar = (await import('../typescript.ts')).default; +const emPath = '/tmp/emitted-recovery.mjs'; +writeFileSync(emPath, emitParser(grammar)); +type Edit = { start: number; end: number; text: string }; +type Diag = { offset: number; end: number; message: string; related?: { offset: number; end: number; message: string } }; +type Cst = { root: number; errors: Diag[] }; +type Parser = { parse(s: string): Cst; edit(cst: Cst, edits: Edit[]): void; visit(cst: Cst, fns: object): void; tree: import('./emitted-obj.ts').TreeView }; +type Em = { + parse(s: string): number; + visit(entry: number, fns: object): void; + tree: import('./emitted-obj.ts').TreeView; + createParser(): Parser; +}; +const em = (await import(emPath + '?v=' + process.pid)) as Em; +const p = em.createParser(); +const q = em.createParser(); + +let fails = 0; +const bad = (msg: string) => { fails++; if (fails < 12) console.log(' ✗ ' + msg); }; +const objH = (pp: Parser, c: Cst) => JSON.stringify(objectify(pp.tree, (fns) => pp.visit(c, fns))); + +// ── 1. 
valid corpus: recovery-capable parse ≡ strict parse, errors empty ──
+const VALID: string[] = [
+  'const a = 1;\n',
+  'function f(a: number): string { return `${a}`; }\nclass C { m(x: T): T { return x; } }\n',
+  'const x = a < b ? c : d;\nfor (const k of ks) { if (k) break; }\n',
+];
+for (const f of [
+  '/tmp/ts-repo/tests/cases/conformance/fixSignatureCaching.ts',
+  // parserRealSource12 (not 7): #7 has `new TypeLink[]` which is a tsc PARSE ERROR — it
+  // only "passed" here by exploiting the mid-line opt(';') split that statement-ASI removes.
+  '/tmp/ts-repo/tests/cases/conformance/parser/ecmascript5/parserRealSource12.ts',
+]) if (existsSync(f)) VALID.push(readFileSync(f, 'utf-8'));
+let validN = 0;
+for (const text of VALID) {
+  const c = p.parse(text);
+  const strictRoot = em.parse(text);
+  const a = objH(p, c);
+  const b = JSON.stringify(objectify(em.tree, (fns) => em.visit(strictRoot, fns)));
+  if (a !== b) bad(`valid text: handle tree ≠ strict tree (${text.slice(0, 30)}…)`);
+  else if (c.errors.length !== 0) bad(`valid text reported ${c.errors.length} errors`);
+  else validN++;
+}
+
+// ── 2. invalid corpus: total, error-reporting, deterministic, spans in bounds ──
+const INVALID: string[] = [
+  'const ] = ;',
+  'const a = 1; ]] const b = 2;\n',
+  'function f( { return 1; }\n',
+  'class C { m( { } \n const after = 1;\n',
+  'const s = "unterminated\nconst t = 2;\n',
+  'const u = `tpl ${ x ;\n',
+  'const v = 1; \\ const w = 2;\n',
+  'if (a { b(); }\nconst tail = 3;\n',
+  '@@@@\n',
+  '}{)(\n',
+  // session-found shapes: bar-ladder degeneracies, lex-recovered docs, glued junk
+  'class za {" z',
+  'funtionzaaz( a z { }',
+  'function \\u{0} ( (aa ) { }',
+  'functio aa (z az x1<) { }',
+];
+let invalidN = 0;
+for (const text of INVALID) {
+  let strictRejects = false;
+  try { em.parse(text); } catch { strictRejects = true; }
+  let c: Cst;
+  try { c = p.parse(text); } catch (e) { bad(`THROWS on «${text.slice(0, 24)}»: ${(e as Error).message.slice(0, 40)}`); continue; }
+  if (strictRejects !== c.errors.length > 0) { bad(`errors(${c.errors.length}) vs strict ${strictRejects ? 'reject' : 'accept'} on «${text.slice(0, 24)}»`); continue; }
+  for (const g of c.errors) {
+    if (!(g.offset >= 0 && g.offset <= g.end && g.end <= text.length && g.message.length > 0)) {
+      bad(`malformed diagnostic ${JSON.stringify(g)} on «${text.slice(0, 24)}»`);
+    }
+  }
+  const first = objH(p, c) + JSON.stringify(c.errors);
+  const c2 = p.parse(text);
+  const second = objH(p, c2) + JSON.stringify(c2.errors);
+  if (first !== second) { bad(`nondeterministic parse on «${text.slice(0, 24)}»`); continue; }
+  invalidN++;
+}
+
+// ── 3. typing through invalid states: every keystroke ≡ fresh, tree AND errors ──
+const BASE = 'function g(a) {\n return a + 1;\n}\nconst tail = g(2);\n';
+const TYPED = 'const x = f(1, "s");';
+let typedOk = 0;
+{
+  const at = BASE.indexOf('}\n') + 2; // between the function and the tail stmt
+  const c = p.parse(BASE);
+  let text = BASE;
+  for (let i = 0; i < TYPED.length; i++) {
+    const ch = TYPED[i];
+    const pos = at + i;
+    p.edit(c, [{ start: pos, end: pos, text: ch }]);
+    text = text.slice(0, pos) + ch + text.slice(pos);
+    const fc = q.parse(text);
+    const a = objH(p, c) + JSON.stringify(c.errors);
+    const b = objH(q, fc) + JSON.stringify(fc.errors);
+    if (a !== b) { bad(`keystroke ${i} («${TYPED.slice(0, i + 1)}»): edit ≠ fresh`); break; }
+    typedOk++;
+  }
+  if (c.errors.length !== 0) bad('completed statement still reports errors');
+}
+
+// ── 4. missing-token synthesis: tsc-style "expected 'x'" diagnostics with the
+// structure PRESERVED (a zero-width $missing leaf closes the construct instead of
+// an $error absorbing the rest). Exact-match pins — quality must not regress to
+// absorption silently.
+const SYNTH: Array<[string, string[]]> = [
+  // viable-set messages: every listed literal is PROVABLY still accepted at the
+  // position (trailing comma is legal, so ',' joins ')' — tsc's single "')'
+  // expected" under-reports); the related info names the matched opener
+  ['const x = f(1, 2;', ["16:expected ')' @11:to match this '('"]],
+  ['function g() { return 1;', ["24:expected '}' @13:to match this '{'"]],
+  ['if (x { y(); }', ["6:expected ',' or ')' @3:to match this '('"]],
+  ['const y = [1, ;', ["14:expected ',' or ']' @10:to match this '['"]],
+  ['const t = obj[i;', ["15:expected ']' @13:to match this '['"]],
+  // missing NONTERMINALS (the tsc "Expression expected" analog): required rule
+  // refs failing inside the bar window mint a zero-width $missing carrying the
+  // rule identity — committed optionals ('= Expr' after the real '='), operator
+  // rhs, mixfix arms, and list elements after a real separator all synthesize
+  ['const a = ;', ['10:expected Expr']],
+  ['const x = a + ;', ['14:expected Expr']],
+  ['const a = -;', ['11:expected Expr']],
+  ['x ? y : ;', ['8:expected Expr']],
+  ['a, ;', ['3:expected Expr']],
+  ["f(1, ;", ["5:expected Expr", "5:expected ')' @1:to match this '('"]],
+];
+let synthN = 0;
+for (const [text, want] of SYNTH) {
+  const c = p.parse(text);
+  const got = c.errors.map((g) => g.offset + ':' + g.message +
+    (g.related ? ` @${g.related.offset}:${g.related.message}` : ''));
+  if (JSON.stringify(got) !== JSON.stringify(want)) {
+    bad(`synthesis on «${text}»: got ${JSON.stringify(got)}, want ${JSON.stringify(want)}`);
+    continue;
+  }
+  let missing = 0;
+  p.visit(c, { enter(id: number) { if (p.tree.ruleNameOf(id) === '$missing') missing++; } });
+  if (missing === 0) { bad(`synthesis on «${text}»: no $missing node in the tree`); continue; }
+  synthN++;
+}
+
+console.log(`recovery: valid ${validN}/${VALID.length} ≡ strict+clean · invalid ${invalidN}/${INVALID.length} total+deterministic · typing ${typedOk}/${TYPED.length} keystrokes ≡ fresh · synthesis ${synthN}/${SYNTH.length} exact`);
+if (fails > 0) {
+  console.error('✗ total-parsing contract violated');
+  process.exit(1);
+}
+console.log('✓ parse/edit are total: valid path byte-identical, errors field exact, typing sessions equivalent');
diff --git a/test/refactor-guard.ts b/test/refactor-guard.ts
index da2417f..712c71b 100644
--- a/test/refactor-guard.ts
+++ b/test/refactor-guard.ts
@@ -60,7 +60,8 @@ const should = {
   'tp in out': 'type T = A;',
   'tp out extends': 'type T = A;',
   'tp name-out': 'type T = out;', // `out` as the param NAME, not modifier
-  'tp name-in default': 'interface I {}',
+  'tp name-out default': 'interface I {}', // `out` (contextual) is a valid param NAME; `in` (reserved) is NOT — `` is a tsc parse error
+
   // declarations
   'decl class': 'class C {}',
   'decl abstract class': 'abstract class C {}',
diff --git a/test/verify-rejects.ts b/test/verify-rejects.ts
index e922f2c..bc97765 100644
--- a/test/verify-rejects.ts
+++ b/test/verify-rejects.ts
@@ -35,7 +35,7 @@ function ourReach(msg: string): number | null {
 }

 const files = (await allTsFiles(baseDir)).sort();
-let agree = 0, early = 0, unknown = 0;
+let agree = 0, early = 0, unknown = 0, oracleCrash = 0;
 const earlies: { file: string; ourReach: number; tsFirst: number; ctx: string }[] = [];

 for (const file of files) {
@@ -44,7 +44,15 @@ for (const file of files) {
   let msg = '';
   try { parse(code); continue; } catch (e: any) { msg = e.message; } // only files we FAIL

-  const sf = ts.createSourceFile('t.ts', code, ts.ScriptTarget.Latest, true, ts.ScriptKind.TS);
+  // the oracle itself can die on malformed input (e.g. a Debug.assert inside
+  // tsc's `await using` paths) — a crashed oracle has no verdict, count + skip
+  let sf;
+  try {
+    sf = ts.createSourceFile('t.ts', code, ts.ScriptTarget.Latest, true, ts.ScriptKind.TS);
+  } catch {
+    oracleCrash++;
+    continue;
+  }
   const diags = (sf as any).parseDiagnostics ?? [];
   if (diags.length === 0) continue; // that's a REAL gap, handled elsewhere

@@ -64,6 +72,7 @@ console.log(`Single-file error-tests we fail: ${agree + early + unknown}`);
 console.log(` AGREE (reach >= TS first error - ${SLACK}) : ${agree} ← rejected for the right reason`);
 console.log(` EARLY (bail before TS's error) : ${early} ← hidden gap: valid code we can't parse`);
 console.log(` UNKNOWN (no offset in our error) : ${unknown}`);
+if (oracleCrash > 0) console.log(` ORACLE-CRASH (tsc threw; no verdict) : ${oracleCrash}`);
 if (earlies.length) {
   console.log(`\n===== EARLY (hidden gaps) =====`);
   earlies.sort((a, b) => (a.tsFirst - a.ourReach) - (b.tsFirst - b.ourReach));
diff --git a/tree-sitter/javascript/grammar.js b/tree-sitter/javascript/grammar.js
index e03c6d0..4368c6f 100644
--- a/tree-sitter/javascript/grammar.js
+++ b/tree-sitter/javascript/grammar.js
@@ -33,6 +33,7 @@ module.exports = grammar({
     [$.stmt, $.decl],
     [$.expr, $.decl],
     [$.program, $.stmt],
+    [$.new_target],
     [$.expr, $.param],
     [$.expr, $.new_target],
     [$.expr, $.block],
@@ -91,7 +92,7 @@ module.exports = grammar({
       "null",
       "undefined",
       "this",
-      "super",
+      seq("super", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq(".", choice($.ident, $.private_field)), seq("[", $.expr, "]"))),
       $.ident,
       $.number,
       $.string,
@@ -106,14 +107,16 @@ module.exports = grammar({
       prec.left(18, seq($.expr, "instanceof", $.expr)),
prec.left(18, seq($.expr, "in", $.expr)), prec.left(18, seq($.expr, $.template)), - seq("new", $.new_target, optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), - seq("new", "class", field('name', $.ident), optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), - seq("new", "class", optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), + seq("new", ".", "target"), + seq("new", $.new_target, choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", field('name', $.ident), optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("[", repeat(seq(optional($.expr), ",")), optional($.expr), "]"), seq("{", optional(seq($.prop, repeat(seq(",", $.prop)), optional(","))), "}"), - seq(optional("async"), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.expr, $.block)), - seq("async", $.ident, "=>", choice($.expr, $.block)), - seq($.ident, "=>", choice($.expr, $.block)), + seq("async", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.block, $.expr)), + seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.block, $.expr)), + seq("async", $.ident, "=>", choice($.block, $.expr)), + seq($.ident, "=>", choice($.block, $.expr)), seq("yield", choice(seq("*", $.expr), optional($.expr))), seq("(", $.expr, repeat(seq(",", $.expr)), 
")"), seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"))), @@ -122,12 +125,15 @@ module.exports = grammar({ $.octal_number, $.binary_number, $.big_int, - seq(optional("async"), "function", optional("*"), optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("async", "function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("async", "function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq(optional($.decorator_expr), "class", field('name', $.ident), repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq(optional($.decorator_expr), "class", repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}") ), - prop: $ => choice(seq("...", $.expr), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", $.block), seq(optional("async"), optional("*"), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, choice(seq("=", $.expr), blank()))), + prop: $ => choice(seq("...", $.expr), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", $.block), seq("async", "*", 
$.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, choice(seq("=", $.expr), blank()))), member_name: $ => choice($.ident, $.private_field, $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("[", $.expr, "]")), @@ -153,13 +159,13 @@ module.exports = grammar({ param: $ => seq(optional($.decorator_expr), choice(seq($.ident, optional(seq("=", $.expr))), seq($.binding_pattern, optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional(seq("=", $.expr))))), - for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq(choice("in", "of"), $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, choice("in", "of"), $.expr)), + for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq("in", $.expr, repeat(seq(",", $.expr))), seq("of", $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", 
optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, "in", $.expr, repeat(seq(",", $.expr))), seq($.expr, "of", $.expr)), switch_case: $ => choice(seq("case", $.expr, repeat(seq(",", $.expr)), ":"), seq("default", ":"), $.stmt), - decl: $ => choice(seq(optional("async"), "function", optional("*"), field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq(repeat($.decorator_expr), "class", field('name', $.ident), repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), $.decl), seq("export", "default", choice(seq(optional("async"), "function", optional("*"), optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.expr, optional(";")))), seq("export", "*", choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), + decl: $ => choice(seq("function", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("function", "*", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", "*", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), 
seq(repeat($.decorator_expr), "class", field('name', $.ident), repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), $.decl), seq("export", "default", choice(seq("function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.expr, optional(";")))), seq("export", "*", choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), - class_member: $ => choice(";", $.decorator_expr, seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq("static", $.block), seq(repeat(choice("static", "accessor", "async")), choice(seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq($.member_name, 
choice(seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(optional(seq("=", $.expr)), optional(";")))))), seq($.member_name, optional(seq("=", $.expr)), optional(";")), seq($.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";"))), + class_member: $ => choice(";", seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq(repeat($.decorator_expr), repeat(choice(choice("static", "accessor"))), "static", $.block), seq(repeat($.decorator_expr), repeat(choice(choice("static", "accessor"))), choice(seq("async", repeat(choice(choice("static", "accessor"))), "*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq("async", repeat(choice(choice("static", "accessor"))), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq("async", repeat(choice(choice("static", "accessor"))), "static", $.block), seq("async", repeat(choice(choice("static", "accessor"))), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq($.member_name, choice(seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(optional(seq("=", $.expr)), choice(";", blank(), blank())))))), seq($.member_name, optional(seq("=", $.expr)), choice(";", blank(), blank())), seq($.member_name, "(", 
optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";"))), import_clause: $ => choice(seq($.ident, optional(seq(",", choice(seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident))))), seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident)), @@ -187,7 +193,7 @@ module.exports = grammar({ big_int: $ => token(/[0-9]+(?:_[0-9]+)*n/), - number: $ => token(/(?:[0-9]+(?:_[0-9]+)*(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), + number: $ => token(/(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), string: $ => token(/"(?:[^"\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*"|'(?:[^'\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*'/), diff --git a/tree-sitter/javascript/queries/highlights.scm b/tree-sitter/javascript/queries/highlights.scm index 62bce7a..36d17b7 100644 --- a/tree-sitter/javascript/queries/highlights.scm +++ b/tree-sitter/javascript/queries/highlights.scm @@ -56,9 +56,9 @@ ;; Keyword, operator, and punctuation literals. 
 [
-  "debugger" "accessor" "default" "extends" "switch" "export" "static" "const"
-  "using" "class" "async" "case" "with" "from" "meta" "let"
-  "var" "get" "set" "as"
+  "debugger" "accessor" "default" "extends" "switch" "export" "static" "target"
+  "const" "using" "class" "async" "case" "with" "from" "meta"
+  "let" "var" "get" "set" "as"
 ] @keyword
 [
   "constructor" "function" "=>"
diff --git a/tree-sitter/javascript/src/scanner.c b/tree-sitter/javascript/src/scanner.c
index 98bb10a..6ca2aea 100644
--- a/tree-sitter/javascript/src/scanner.c
+++ b/tree-sitter/javascript/src/scanner.c
@@ -50,7 +50,7 @@ static inline void skip(TSLexer *lexer) { lexer->advance(lexer, true); }
 // regex-vs-division decision is already made by the LR context. We only
 // need to scan the literal body here.
 //
-// Regex flag characters (derived from the token pattern): "gimsuydv"
+// Regex flag characters (derived from the token pattern): "gimsuyd"
 // Division-after texts (informational; LR ctx handles these): ) ] ++ -- this super true false null undefined
 // Regex-after keywords (informational): in of instanceof typeof delete void await yield throw return case do else new
 static bool scan_regex(TSLexer *lexer) {
@@ -69,7 +69,7 @@ static bool scan_regex(TSLexer *lexer) {
     advance(lexer);
   }
   // Trailing flag characters.
-  const char *flags = "gimsuydv";
+  const char *flags = "gimsuyd";
   while (lexer->lookahead != 0 && strchr(flags, (char)lexer->lookahead) != NULL) advance(lexer);
   lexer->result_symbol = REGEX_LITERAL;
   lexer->mark_end(lexer);
diff --git a/tree-sitter/javascriptreact/grammar.js b/tree-sitter/javascriptreact/grammar.js
index 52da7cc..1539384 100644
--- a/tree-sitter/javascriptreact/grammar.js
+++ b/tree-sitter/javascriptreact/grammar.js
@@ -33,6 +33,7 @@ module.exports = grammar({
     [$.stmt, $.decl],
     [$.expr, $.decl],
     [$.program, $.stmt],
+    [$.new_target],
     [$.expr, $.param],
     [$.expr, $.new_target],
     [$.expr, $.block],
@@ -93,7 +94,7 @@ module.exports = grammar({
       "null",
       "undefined",
       "this",
-      "super",
+      seq("super", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq(".", choice($.ident, $.private_field)), seq("[", $.expr, "]"))),
       $.ident,
       $.number,
       $.string,
@@ -108,14 +109,16 @@ module.exports = grammar({
       prec.left(18, seq($.expr, "instanceof", $.expr)),
       prec.left(18, seq($.expr, "in", $.expr)),
       prec.left(18, seq($.expr, $.template)),
-      seq("new", $.new_target, optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))),
-      seq("new", "class", field('name', $.ident), optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))),
-      seq("new", "class", optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))),
+      seq("new", ".", "target"),
+      seq("new", $.new_target, choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())),
+      seq("new", "class", field('name', $.ident), optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())),
+      seq("new",
"class", optional(seq("extends", $.class_heritage)), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("[", repeat(seq(optional($.expr), ",")), optional($.expr), "]"), seq("{", optional(seq($.prop, repeat(seq(",", $.prop)), optional(","))), "}"), - seq(optional("async"), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.expr, $.block)), - seq("async", $.ident, "=>", choice($.expr, $.block)), - seq($.ident, "=>", choice($.expr, $.block)), + seq("async", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.block, $.expr)), + seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", choice($.block, $.expr)), + seq("async", $.ident, "=>", choice($.block, $.expr)), + seq($.ident, "=>", choice($.block, $.expr)), seq("yield", choice(seq("*", $.expr), optional($.expr))), seq("(", $.expr, repeat(seq(",", $.expr)), ")"), seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"))), @@ -124,12 +127,15 @@ module.exports = grammar({ $.octal_number, $.binary_number, $.big_int, - seq(optional("async"), "function", optional("*"), optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("async", "function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), + seq("async", "function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq(optional($.decorator_expr), "class", field('name', $.ident), 
repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq(optional($.decorator_expr), "class", repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}") ), - prop: $ => choice(seq("...", $.expr), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", $.block), seq(optional("async"), optional("*"), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, choice(seq("=", $.expr), blank()))), + prop: $ => choice(seq("...", $.expr), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", $.block), seq("async", "*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.member_name, ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, choice(seq("=", $.expr), blank()))), member_name: $ => choice($.ident, $.private_field, $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("[", $.expr, "]")), @@ -155,13 +161,13 @@ module.exports = grammar({ param: $ => seq(optional($.decorator_expr), choice(seq($.ident, optional(seq("=", $.expr))), seq($.binding_pattern, optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), 
optional(seq("=", $.expr))))), - for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq(choice("in", "of"), $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, choice("in", "of"), $.expr)), + for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq("in", $.expr, repeat(seq(",", $.expr))), seq("of", $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, "in", $.expr, repeat(seq(",", $.expr))), seq($.expr, "of", $.expr)), switch_case: $ => choice(seq("case", $.expr, repeat(seq(",", $.expr)), ":"), seq("default", ":"), $.stmt), - decl: $ => choice(seq(optional("async"), "function", optional("*"), field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq(repeat($.decorator_expr), "class", field('name', $.ident), repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), $.decl), seq("export", "default", choice(seq(optional("async"), "function", optional("*"), optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.expr, optional(";")))), seq("export", "*", 
choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), + decl: $ => choice(seq("function", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("function", "*", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", "*", field('name', $.ident), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq(repeat($.decorator_expr), "class", field('name', $.ident), repeat(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(","))))), "{", repeat($.class_member), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), $.decl), seq("export", "default", choice(seq("function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq("async", "function", "*", optional(field('name', $.ident)), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block), seq($.expr, optional(";")))), seq("export", "*", 
choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), - class_member: $ => choice(";", $.decorator_expr, seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq("static", $.block), seq(repeat(choice("static", "accessor", "async")), choice(seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq($.member_name, choice(seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(optional(seq("=", $.expr)), optional(";")))))), seq($.member_name, optional(seq("=", $.expr)), optional(";")), seq($.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";"))), + class_member: $ => choice(";", seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq(repeat($.decorator_expr), repeat(choice(choice("static", "accessor"))), "static", $.block), seq(repeat($.decorator_expr), repeat(choice(choice("static", "accessor"))), choice(seq("async", repeat(choice(choice("static", "accessor"))), "*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq("async", repeat(choice(choice("static", "accessor"))), 
choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq("async", repeat(choice(choice("static", "accessor"))), "static", $.block), seq("async", repeat(choice(choice("static", "accessor"))), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq("*", $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional($.block), optional(";")), seq($.member_name, choice(seq("(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";")), seq(optional(seq("=", $.expr)), choice(";", blank(), blank())))))), seq($.member_name, optional(seq("=", $.expr)), choice(";", blank(), blank())), seq($.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional($.block), optional(";"))), import_clause: $ => choice(seq($.ident, optional(seq(",", choice(seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident))))), seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident)), @@ -201,7 +207,7 @@ module.exports = grammar({ big_int: $ => token(/[0-9]+(?:_[0-9]+)*n/), - number: $ => token(/(?:[0-9]+(?:_[0-9]+)*(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), + number: $ => token(/(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), string: $ => 
token(/"(?:[^"\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*"|'(?:[^'\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*'/), diff --git a/tree-sitter/javascriptreact/queries/highlights.scm b/tree-sitter/javascriptreact/queries/highlights.scm index 74c724d..00acec6 100644 --- a/tree-sitter/javascriptreact/queries/highlights.scm +++ b/tree-sitter/javascriptreact/queries/highlights.scm @@ -57,9 +57,9 @@ ;; Keyword, operator, and punctuation literals. [ - "debugger" "accessor" "default" "extends" "switch" "export" "static" "const" - "using" "class" "async" "case" "with" "from" "meta" "let" - "var" "get" "set" "as" + "debugger" "accessor" "default" "extends" "switch" "export" "static" "target" + "const" "using" "class" "async" "case" "with" "from" "meta" + "let" "var" "get" "set" "as" ] @keyword [ "constructor" "function" "=>" diff --git a/tree-sitter/javascriptreact/src/scanner.c b/tree-sitter/javascriptreact/src/scanner.c index 353eae9..dfc1e00 100644 --- a/tree-sitter/javascriptreact/src/scanner.c +++ b/tree-sitter/javascriptreact/src/scanner.c @@ -50,7 +50,7 @@ static inline void skip(TSLexer *lexer) { lexer->advance(lexer, true); } // regex-vs-division decision is already made by the LR context. We only // need to scan the literal body here. // -// Regex flag characters (derived from the token pattern): "gimsuydv" +// Regex flag characters (derived from the token pattern): "gimsuyd" // Division-after texts (informational; LR ctx handles these): ) ] ++ -- this super true false null undefined > } // Regex-after keywords (informational): in of instanceof typeof delete void await yield throw return case do else new static bool scan_regex(TSLexer *lexer) { @@ -69,7 +69,7 @@ static bool scan_regex(TSLexer *lexer) { advance(lexer); } // Trailing flag characters. 
- const char *flags = "gimsuydv"; + const char *flags = "gimsuyd"; while (lexer->lookahead != 0 && strchr(flags, (char)lexer->lookahead) != NULL) advance(lexer); lexer->result_symbol = REGEX_LITERAL; lexer->mark_end(lexer); diff --git a/tree-sitter/typescript/grammar.js b/tree-sitter/typescript/grammar.js index ee16223..5c81a38 100644 --- a/tree-sitter/typescript/grammar.js +++ b/tree-sitter/typescript/grammar.js @@ -34,6 +34,7 @@ module.exports = grammar({ [$.stmt, $.decl], [$.expr, $.decl], [$.program, $.stmt], + [$.new_target], [$.type, $.type_param], [$.type_param], [$.expr, $.param], @@ -109,9 +110,9 @@ module.exports = grammar({ rules: { program: $ => repeat(choice($.decl, $.stmt)), - type: $ => choice(seq($.ident, optional(seq("is", $.type))), seq($.type, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.type, "[", "]"), seq($.type, "|", $.type), seq($.type, "&", $.type), seq("|", $.type), seq("&", $.type), seq("keyof", $.type), seq("typeof", $.typeof_ref), seq("readonly", $.type), seq("(", $.type, ")"), seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq(optional("abstract"), "new", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq("[", repeat(seq(optional("..."), optional(seq($.ident, optional("?"), ":")), optional("..."), $.type, optional("?"), optional(","))), "]"), seq("{", repeat(seq($.type_member, optional(choice(";", ",")))), "}"), seq("asserts", $.ident, optional(seq("is", $.type))), seq($.type, "extends", $.type, "?", $.type, ":", $.type), seq("infer", $.ident, optional(seq("extends", $.type, optional(seq("?", $.type, ":", $.type))))), $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("-", choice($.number, $.big_int)), "true", "false", "null", "undefined", "void", "this", seq("unique", "symbol"), seq("import", "(", $.type, ")"), $.template, 
seq($.type, "[", $.type, "]"), seq($.type, ".", $.ident), seq($.type, ".", "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq("?", $.type), seq("!", $.type), "?", "*", seq("function", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq($.type, "?"), seq($.type, "!")), + type: $ => choice(seq($.ident, optional(seq("is", $.type))), seq($.type, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.type, "[", "]"), seq($.type, "|", $.type), seq($.type, "&", $.type), seq("|", $.type), seq("&", $.type), seq("keyof", $.type), seq("typeof", $.typeof_ref), seq("readonly", $.type), seq("(", $.type, ")"), seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq(optional("abstract"), "new", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq("[", repeat(seq(optional("..."), optional(seq($.ident, optional("?"), ":")), optional("..."), $.type, optional("?"), choice(",", blank()))), "]"), seq("{", repeat(seq($.type_member, choice(";", ",", blank(), blank()))), "}"), seq("asserts", $.ident, optional(seq("is", $.type))), seq($.type, "extends", $.type, "?", $.type, ":", $.type), seq("infer", $.ident, optional(seq("extends", $.type, optional(seq("?", $.type, ":", $.type))))), $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("-", choice($.number, $.big_int)), "true", "false", "null", "undefined", "void", "this", seq("unique", $.type), seq("import", "(", $.type, ")"), $.template, seq($.type, "[", $.type, "]"), seq($.type, ".", $.ident), seq($.type, ".", "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq("?", $.type), seq("!", $.type), "?", "*", seq("function", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq($.type, "?"), 
seq($.type, "!")), - type_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional(choice("+", "-")), optional("readonly"), "[", choice(seq($.ident, choice(seq("in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq(":", $.type, "]", optional(seq(":", $.type))))), seq($.expr, "]", optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq("]", optional(seq(":", $.type))))), seq("readonly", $.ident, optional("?"), ":", $.type), seq(choice($.ident, $.number, $.string, $.private_field), optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type))))), + type_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional(choice("+", "-")), optional("readonly"), "[", choice(seq($.ident, choice(seq("in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq(":", $.type, optional(","), "]", optional(seq(":", $.type))))), seq($.expr, "]", optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq("]", optional(seq(":", $.type))))), seq("readonly", $.ident, optional("?"), ":", $.type), seq(choice($.ident, $.number, $.string, $.private_field), optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type))))), decorator_expr: $ => 
choice(seq($.decorator, repeat(choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), "!", seq(".", choice($.ident, $.private_field)), seq("?.", choice($.ident, $.private_field, seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"))), $.template))), seq("@new", $.new_target, optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), @@ -140,7 +141,7 @@ module.exports = grammar({ "null", "undefined", "this", - "super", + seq("super", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq(".", choice($.ident, $.private_field)), seq("[", $.expr, "]"))), $.ident, $.number, $.string, @@ -151,7 +152,7 @@ module.exports = grammar({ prec.left(18, seq($.expr, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">")), prec.left(18, seq($.expr, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), prec.left(18, seq($.expr, ".", choice($.ident, $.private_field))), - prec.left(18, seq($.expr, "?.", choice($.ident, $.private_field, seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"), $.template))), + prec.left(18, seq($.expr, "?.", choice($.ident, $.private_field, seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"), $.template, seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), prec.left(18, seq($.expr, "[", $.expr, "]")), prec.left(18, seq($.expr, "!")), prec.left(18, seq($.expr, "?", $.expr, ":", $.expr)), @@ -159,38 +160,43 @@ module.exports = grammar({ prec.left(18, seq($.expr, "instanceof", $.expr)), prec.left(18, seq($.expr, "in", $.expr)), 
prec.left(18, seq($.expr, $.template)), - seq("new", $.new_target, optional(choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), - seq("new", "class", field('name', $.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), - seq("new", "class", optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), + seq("new", ".", "target"), + seq("new", $.new_target, choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", field('name', $.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("[", repeat(seq(optional($.expr), ",")), 
optional($.expr), "]"), seq("{", optional(seq($.prop, repeat(seq(",", $.prop)), optional(","))), "}"), - seq(optional("async"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.expr, $.block)), - seq("async", $.ident, "=>", choice($.expr, $.block)), - seq($.ident, "=>", choice($.expr, $.block)), + seq("async", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.block, $.expr)), + seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.block, $.expr)), + seq("async", $.ident, "=>", choice($.block, $.expr)), + seq($.ident, "=>", choice($.block, $.expr)), seq("yield", choice(seq("*", $.expr), optional($.expr))), seq("(", $.expr, repeat(seq(",", $.expr)), ")"), prec.left(18, seq($.expr, "satisfies", $.type)), - seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"))), + seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"), seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))))), $.private_field, $.hex_number, $.octal_number, $.binary_number, $.big_int, - seq(optional("async"), "function", optional("*"), optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("async", "function", optional($.ident), 
optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("async", "function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(repeat($.decorator_expr), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("<", $.type, ">", $.expr) ), - prop: $ => choice(seq("...", $.expr), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), $.block), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(optional("async"), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", 
$.type)), $.block), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), $.member_name, optional("?"), optional("!"), ":", $.expr), seq($.member_name, optional("?"), optional("!"), ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, optional("?"), optional("!"), optional(seq("=", $.expr)))), + prop: $ => choice(seq("...", $.expr), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block)), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), "*", $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block)), seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), $.member_name, optional("?"), 
optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq("*", $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq($.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), $.member_name, optional("?"), optional("!"), ":", $.expr), seq($.member_name, optional("?"), optional("!"), ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, optional("?"), optional("!"), optional(seq("=", $.expr)))), member_name: $ => choice($.ident, $.private_field, $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("[", $.expr, "]")), - new_target: $ => choice($.ident, seq($.new_target, ".", $.ident), seq($.new_target, "[", $.expr, "]"), seq("(", $.expr, ")")), + new_target: $ => choice($.ident, seq("new", $.new_target, optional(choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), seq($.new_target, ".", $.ident), seq($.new_target, "[", $.expr, "]"), seq("(", $.expr, ")")), - class_heritage: $ => choice($.ident, $.number, $.string, "true", "false", "null", "undefined", seq("(", $.expr, ")"), seq("class", optional($.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}"), 
seq($.class_heritage, ".", $.ident), seq($.class_heritage, "?.", $.ident), seq($.class_heritage, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.class_heritage, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), + class_heritage: $ => choice($.number, $.string, "true", "false", "null", "undefined", $.ident, seq("(", $.expr, ")"), seq("class", optional($.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}"), seq($.class_heritage, ".", $.ident), seq($.class_heritage, "?.", $.ident), seq($.class_heritage, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.class_heritage, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), - stmt: $ => choice($.block, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), optional(";")), seq("if", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt, optional(seq("else", $.stmt))), seq("for", optional("await"), "(", $.for_head, ")", $.stmt), seq("while", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt), seq("do", $.stmt, "while", "(", $.expr, repeat(seq(",", $.expr)), ")", optional(";")), seq("switch", "(", $.expr, repeat(seq(",", $.expr)), ")", "{", repeat($.switch_case), "}"), seq("return", optional(seq($.expr, repeat(seq(",", $.expr)))), optional(";")), seq("throw", $.expr, repeat(seq(",", $.expr)), optional(";")), seq("break", optional($.ident), optional(";")), seq("continue", optional($.ident), optional(";")), seq("try", $.block, optional(seq("catch", optional(seq("(", choice($.param, $.binding_pattern), ")")), $.block)), optional(seq("finally", $.block))), seq($.ident, ":", $.stmt), ";", seq("debugger", optional(";")), seq("with", "(", $.expr, ")", $.stmt), seq(optional("await"), "using", optional(seq($.binding, 
repeat(seq(",", $.binding)), optional(","))), optional(";")), $.decl, seq($.expr, repeat(seq(",", $.expr)), optional(";"))), + stmt: $ => choice($.block, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq("if", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt, optional(seq("else", $.stmt))), seq("for", optional("await"), "(", $.for_head, ")", $.stmt), seq("while", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt), seq("do", $.stmt, "while", "(", $.expr, repeat(seq(",", $.expr)), ")", optional(";")), seq("switch", "(", $.expr, repeat(seq(",", $.expr)), ")", "{", repeat($.switch_case), "}"), seq("return", optional(seq($.expr, repeat(seq(",", $.expr)))), choice(";", blank(), blank())), seq("throw", $.expr, repeat(seq(",", $.expr)), choice(";", blank(), blank())), seq("break", optional($.ident), choice(";", blank(), blank())), seq("continue", optional($.ident), choice(";", blank(), blank())), seq("try", $.block, optional(seq("catch", optional(seq("(", choice($.param, $.binding_pattern), ")")), $.block)), optional(seq("finally", $.block))), seq($.ident, ":", $.stmt), ";", seq("debugger", choice(";", blank(), blank())), seq("with", "(", $.expr, ")", $.stmt), seq(optional("await"), "using", optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), $.decl, seq($.expr, repeat(seq(",", $.expr)), choice(";", blank(), blank()))), block: $ => seq("{", repeat($.stmt), "}"), @@ -206,29 +212,29 @@ module.exports = grammar({ for_binding: $ => seq(choice(seq($.ident, optional("!")), $.binding_pattern), optional(seq(":", $.type)), optional(seq("=", $.expr))), - param: $ => choice(seq("this", ":", $.type), seq(optional($.decorator_expr), repeat1(choice("public", "private", "protected", "readonly", "override", "static", "abstract", "accessor", "async", "export", "declare", "in", "out")), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), 
optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))))), seq(optional($.decorator_expr), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)))))), + param: $ => choice(seq("this", optional(seq(":", $.type))), seq(optional($.decorator_expr), repeat1(choice("public", "private", "protected", "readonly", "override", "static", "abstract", "accessor", "async", "export", "declare", "in", "out")), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))))), seq(optional($.decorator_expr), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)))))), - for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq(choice("in", "of"), $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, choice("in", "of"), 
$.expr)), + for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq("in", $.expr, repeat(seq(",", $.expr))), seq("of", $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, "in", $.expr, repeat(seq(",", $.expr))), seq($.expr, "of", $.expr)), switch_case: $ => choice(seq("case", $.expr, repeat(seq(",", $.expr)), ":"), seq("default", ":"), $.stmt), type_params: $ => seq("<", optional(seq($.type_param, repeat(seq(",", $.type_param)), optional(","))), ">"), - type_param: $ => choice(seq(repeat1(choice("const", "in", "out", "public", "private", "protected", "readonly")), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), seq(choice("const", "in", "out", "public", "private", "protected", "readonly"), choice($.ident, "in", "out"), optional(seq("extends", $.type)), optional(seq("=", $.type))), seq(choice($.ident, "in", "out"), optional(seq("extends", $.type)), optional(seq("=", $.type)))), + type_param: $ => choice(seq(repeat1(choice("const", "in", "out", "public", "private", "protected", "readonly")), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), seq(choice("const", "in", "out", "public", "private", "protected", "readonly"), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), seq($.ident, optional(seq("extends", $.type)), optional(seq("=", $.type)))), - decl: $ => choice(seq(optional("async"), "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("interface", field('name', $.ident), 
optional($.type_params), optional(seq("extends", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq("type", field('name', $.ident), optional($.type_params), "=", $.type, optional(";")), seq(repeat($.decorator_expr), optional("abstract"), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), optional("abstract"), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("enum", field('name', $.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("declare", "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional(";")), seq("declare", choice($.decl, $.stmt)), seq("namespace", field('name', $.ident), repeat(seq(".", $.ident)), "{", repeat($.stmt), "}"), seq("module", choice(seq($.ident, repeat(seq(".", $.ident))), $.string), "{", repeat($.stmt), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), optional(";")), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("export", repeat($.decorator_expr), "default", choice(seq(optional("async"), 
"function", optional("*"), optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("abstract", "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("abstract", "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq($.expr, optional(";")))), seq("export", "*", choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("export", "=", $.expr, optional(";")), seq("export", "type", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("const", "enum", field('name', $.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq("type", $.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), + decl: $ => choice(seq("function", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", 
optional(seq(":", $.type)), choice($.block, optional(";"))), seq("function", "*", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", "*", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("interface", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq("type", field('name', $.ident), optional($.type_params), "=", $.type, optional(";")), seq(repeat($.decorator_expr), optional("abstract"), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), optional("abstract"), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("enum", field('name', $.ident), "{", 
optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("declare", "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional(";")), seq("declare", "module", $.string, optional(";")), seq("declare", "global", "{", repeat($.stmt), "}"), seq("declare", choice($.decl, $.stmt)), seq(repeat1(choice("abstract", "public", "private", "protected", "readonly", "static", "override", "accessor")), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("async", $.decl), seq("namespace", field('name', $.ident), repeat(seq(".", $.ident)), "{", repeat($.stmt), "}"), seq("module", choice(seq($.ident, repeat(seq(".", $.ident))), $.string), "{", repeat($.stmt), "}"), seq("export", "as", "namespace", field('name', $.ident), optional(";")), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("export", repeat($.decorator_expr), "default", choice(seq("function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", 
optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("abstract", "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("abstract", "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("interface", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq($.expr, optional(";")))), seq("export", optional("type"), "*", choice(seq("from", $.string, optional(";")), seq("as", choice($.ident, $.string), "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("export", "=", $.expr, optional(";")), seq("export", "type", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("const", "enum", field('name', $.ident), "{", optional(seq($.enum_member, 
repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq("type", $.import_clause, "from", $.string, optional(";")), seq("type", field('name', $.ident), "=", $.expr, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), interface_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(choice("get", "set"), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional("static"), optional(choice("+", "-")), optional("readonly"), "[", $.ident, "in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq("readonly", $.member_name, optional("?"), ":", $.type), seq($.member_name, optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq(optional("static"), optional("readonly"), "[", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), "]", optional(seq(":", $.type)))), - class_member: $ => choice(";", $.decorator_expr, seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq("static", $.block), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async")), choice(seq("*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", 
optional(seq(":", $.type)), optional($.block), optional(";")), seq("[", $.ident, ":", $.type, "]", ":", $.type, optional(";")), seq($.member_name, choice(seq(optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), optional(";")))))), seq($.member_name, optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), optional(";")), seq($.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";"))), + class_member: $ => choice(";", seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq(repeat($.decorator_expr), repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), "static", $.block), seq(repeat($.decorator_expr), repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), choice(seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), "*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), choice("get", "set"), $.member_name, optional($.type_params), "(", optional(optional(seq($.param, repeat(seq(",", $.param)), 
optional(",")))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), "static", $.block), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, optional($.type_params), "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("[", $.ident, ":", $.type, optional(","), "]", optional(seq(":", $.type)), choice(";", blank(), blank())), seq("constructor", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq($.member_name, choice(seq(optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), choice(";", blank(), blank())))))), seq($.member_name, optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), choice(";", blank(), blank())), seq($.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), 
optional($.block), optional(";"))), enum_member: $ => seq($.member_name, optional(seq("=", $.expr))), import_clause: $ => choice(seq("defer", "*", "as", $.ident), seq($.ident, optional(seq(",", choice(seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident))))), seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident)), - import_specifier: $ => choice(seq($.ident, optional(seq("as", $.ident))), seq($.string, "as", $.ident)), + import_specifier: $ => choice(seq(optional("type"), $.ident, optional(seq("as", $.ident))), seq(optional("type"), $.string, "as", $.ident)), - export_specifier: $ => seq(choice($.ident, $.string), optional(seq("as", choice($.ident, $.string)))), + export_specifier: $ => choice(seq("type", choice($.ident, $.string), optional(seq("as", choice($.ident, $.string)))), seq("type", "as", choice(blank(), seq("as", choice($.ident, $.string)))), seq(choice($.ident, $.string), optional(seq("as", choice($.ident, $.string))))), shebang: $ => token(/#![^\n]*/), @@ -250,7 +256,7 @@ module.exports = grammar({ big_int: $ => token(/[0-9]+(?:_[0-9]+)*n/), - number: $ => token(/(?:[0-9]+(?:_[0-9]+)*(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), + number: $ => token(/(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), string: $ => token(/"(?:[^"\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*"|'(?:[^'\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*'/), diff --git a/tree-sitter/typescript/queries/highlights.scm b/tree-sitter/typescript/queries/highlights.scm index 92d92f5..de7d11a 100644 --- a/tree-sitter/typescript/queries/highlights.scm +++ b/tree-sitter/typescript/queries/highlights.scm @@ -62,15 +62,15 @@ ;; Builtin / global / 
constant identifier names.
 ((ident) @variable.builtin
- (#any-of? @variable.builtin "console" "window" "document" "process" "require" "exports" "global" "globalThis"))
+ (#any-of? @variable.builtin "console" "window" "document" "process" "require" "exports" "globalThis"))
 ;; Keyword, operator, and punctuation literals.
 [
 "implements" "interface" "namespace" "protected" "debugger" "readonly" "abstract" "override"
 "accessor" "default" "private" "declare" "extends" "switch" "export" "module"
- "public" "static" "unique" "const" "using" "class" "async" "case"
- "with" "from" "type" "enum" "@new" "meta" "let" "var"
- "get" "set" "out"
+ "public" "static" "unique" "target" "const" "using" "class" "async"
+ "case" "with" "from" "type" "enum" "@new" "meta" "let"
+ "var" "get" "set" "out"
 ] @keyword
 [
 "constructor" "function" "=>"
@@ -94,12 +94,11 @@
 "instanceof" "satisfies" "asserts" "typeof" "delete" "keyof" "infer" "void"
 "new" "as" "is"
 ] @keyword.operator
-"symbol" @type.builtin
 [
 "undefined" "false" "true" "null"
 ] @constant.builtin
 [
- "super" "this"
+ "global" "super" "this"
 ] @variable.builtin
 [
 ">>>=" "**=" "<<=" ">>=" "??=" "||=" "&&=" "==="
diff --git a/tree-sitter/typescript/src/scanner.c b/tree-sitter/typescript/src/scanner.c
index 4656509..9c4d854 100644
--- a/tree-sitter/typescript/src/scanner.c
+++ b/tree-sitter/typescript/src/scanner.c
@@ -50,7 +50,7 @@ static inline void skip(TSLexer *lexer) { lexer->advance(lexer, true); }
 // regex-vs-division decision is already made by the LR context. We only
 // need to scan the literal body here.
// -// Regex flag characters (derived from the token pattern): "gimsuydv" +// Regex flag characters (derived from the token pattern): "gimsuyd" // Division-after texts (informational; LR ctx handles these): ) ] ++ -- this super true false null undefined // Regex-after keywords (informational): in of instanceof typeof delete void await yield throw return case do else new static bool scan_regex(TSLexer *lexer) { @@ -69,7 +69,7 @@ static bool scan_regex(TSLexer *lexer) { advance(lexer); } // Trailing flag characters. - const char *flags = "gimsuydv"; + const char *flags = "gimsuyd"; while (lexer->lookahead != 0 && strchr(flags, (char)lexer->lookahead) != NULL) advance(lexer); lexer->result_symbol = REGEX_LITERAL; lexer->mark_end(lexer); diff --git a/tree-sitter/typescriptreact/grammar.js b/tree-sitter/typescriptreact/grammar.js index f0d68db..f728184 100644 --- a/tree-sitter/typescriptreact/grammar.js +++ b/tree-sitter/typescriptreact/grammar.js @@ -34,6 +34,7 @@ module.exports = grammar({ [$.stmt, $.decl], [$.expr, $.decl], [$.program, $.stmt], + [$.new_target], [$.type, $.type_param], [$.type_param], [$.expr, $.param], @@ -111,9 +112,9 @@ module.exports = grammar({ rules: { program: $ => repeat(choice($.decl, $.stmt)), - type: $ => choice(seq($.ident, optional(seq("is", $.type))), seq($.type, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.type, "[", "]"), seq($.type, "|", $.type), seq($.type, "&", $.type), seq("|", $.type), seq("&", $.type), seq("keyof", $.type), seq("typeof", $.typeof_ref), seq("readonly", $.type), seq("(", $.type, ")"), seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq(optional("abstract"), "new", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq("[", repeat(seq(optional("..."), optional(seq($.ident, optional("?"), ":")), optional("..."), $.type, optional("?"), 
optional(","))), "]"), seq("{", repeat(seq($.type_member, optional(choice(";", ",")))), "}"), seq("asserts", $.ident, optional(seq("is", $.type))), seq($.type, "extends", $.type, "?", $.type, ":", $.type), seq("infer", $.ident, optional(seq("extends", $.type, optional(seq("?", $.type, ":", $.type))))), $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("-", choice($.number, $.big_int)), "true", "false", "null", "undefined", "void", "this", seq("unique", "symbol"), seq("import", "(", $.type, ")"), $.template, seq($.type, "[", $.type, "]"), seq($.type, ".", $.ident), seq($.type, ".", "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq("?", $.type), seq("!", $.type), "?", "*", seq("function", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq($.type, "?"), seq($.type, "!")), + type: $ => choice(seq($.ident, optional(seq("is", $.type))), seq($.type, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.type, "[", "]"), seq($.type, "|", $.type), seq($.type, "&", $.type), seq("|", $.type), seq("&", $.type), seq("keyof", $.type), seq("typeof", $.typeof_ref), seq("readonly", $.type), seq("(", $.type, ")"), seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq(optional("abstract"), "new", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", "=>", $.type), seq("[", repeat(seq(optional("..."), optional(seq($.ident, optional("?"), ":")), optional("..."), $.type, optional("?"), choice(",", blank()))), "]"), seq("{", repeat(seq($.type_member, choice(";", ",", blank(), blank()))), "}"), seq("asserts", $.ident, optional(seq("is", $.type))), seq($.type, "extends", $.type, "?", $.type, ":", $.type), seq("infer", $.ident, optional(seq("extends", $.type, optional(seq("?", $.type, ":", $.type))))), $.string, $.number, 
$.hex_number, $.octal_number, $.binary_number, $.big_int, seq("-", choice($.number, $.big_int)), "true", "false", "null", "undefined", "void", "this", seq("unique", $.type), seq("import", "(", $.type, ")"), $.template, seq($.type, "[", $.type, "]"), seq($.type, ".", $.ident), seq($.type, ".", "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq("?", $.type), seq("!", $.type), "?", "*", seq("function", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq($.type, "?"), seq($.type, "!")), - type_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional(choice("+", "-")), optional("readonly"), "[", choice(seq($.ident, choice(seq("in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq(":", $.type, "]", optional(seq(":", $.type))))), seq($.expr, "]", optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq("]", optional(seq(":", $.type))))), seq("readonly", $.ident, optional("?"), ":", $.type), seq(choice($.ident, $.number, $.string, $.private_field), optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type))))), + type_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional(choice("+", "-")), optional("readonly"), "[", choice(seq($.ident, choice(seq("in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq(":", $.type, optional(","), "]", optional(seq(":", $.type))))), seq($.expr, "]", 
optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq("]", optional(seq(":", $.type))))), seq("readonly", $.ident, optional("?"), ":", $.type), seq(choice($.ident, $.number, $.string, $.private_field), optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type))))), decorator_expr: $ => choice(seq($.decorator, repeat(choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), "!", seq(".", choice($.ident, $.private_field)), seq("?.", choice($.ident, $.private_field, seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"))), $.template))), seq("@new", $.new_target, optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), @@ -143,7 +144,7 @@ module.exports = grammar({ "null", "undefined", "this", - "super", + seq("super", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq(".", choice($.ident, $.private_field)), seq("[", $.expr, "]"))), $.ident, $.number, $.string, @@ -154,7 +155,7 @@ module.exports = grammar({ prec.left(18, seq($.expr, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">")), prec.left(18, seq($.expr, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), prec.left(18, seq($.expr, ".", choice($.ident, $.private_field))), - prec.left(18, seq($.expr, "?.", choice($.ident, $.private_field, seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"), $.template))), + prec.left(18, seq($.expr, "?.", choice($.ident, $.private_field, 
seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), seq("[", $.expr, "]"), $.template, seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), prec.left(18, seq($.expr, "[", $.expr, "]")), prec.left(18, seq($.expr, "!")), prec.left(18, seq($.expr, "?", $.expr, ":", $.expr)), @@ -162,37 +163,42 @@ module.exports = grammar({ prec.left(18, seq($.expr, "instanceof", $.expr)), prec.left(18, seq($.expr, "in", $.expr)), prec.left(18, seq($.expr, $.template)), - seq("new", $.new_target, optional(choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), - seq("new", "class", field('name', $.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), - seq("new", "class", optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), + seq("new", ".", "target"), + seq("new", $.new_target, choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", field('name', $.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", 
optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), + seq("new", "class", optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}", choice(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"), blank())), seq("[", repeat(seq(optional($.expr), ",")), optional($.expr), "]"), seq("{", optional(seq($.prop, repeat(seq(",", $.prop)), optional(","))), "}"), - seq(optional("async"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.expr, $.block)), - seq("async", $.ident, "=>", choice($.expr, $.block)), - seq($.ident, "=>", choice($.expr, $.block)), + seq("async", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.block, $.expr)), + seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), "=>", choice($.block, $.expr)), + seq("async", $.ident, "=>", choice($.block, $.expr)), + seq($.ident, "=>", choice($.block, $.expr)), seq("yield", choice(seq("*", $.expr), optional($.expr))), seq("(", $.expr, repeat(seq(",", $.expr)), ")"), prec.left(18, seq($.expr, "satisfies", $.type)), - seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"))), + seq("import", choice(seq("(", $.expr, ")"), seq(".", "meta"), seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))))), $.private_field, $.hex_number, $.octal_number, $.binary_number, $.big_int, - seq(optional("async"), "function", optional("*"), optional($.ident), 
optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("async", "function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), + seq("async", "function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(repeat($.decorator_expr), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}") ), - prop: $ => choice(seq("...", $.expr), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), $.block), seq(repeat1(choice("public", "private", 
"protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(optional("async"), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async", "export", "declare", "in", "out")), $.member_name, optional("?"), optional("!"), ":", $.expr), seq($.member_name, optional("?"), optional("!"), ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, optional("?"), optional("!"), optional(seq("=", $.expr)))), + prop: $ => choice(seq("...", $.expr), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block)), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), optional("*"), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), "*", $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), 
seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block)), seq("async", repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq("*", $.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq($.member_name, optional("?"), optional("!"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), $.block), seq(repeat1(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "export", "declare", "in", "out")), $.member_name, optional("?"), optional("!"), ":", $.expr), seq($.member_name, optional("?"), optional("!"), ":", $.expr), seq("[", $.expr, repeat(seq(",", $.expr)), "]", ":", $.expr), seq($.ident, optional("?"), optional("!"), optional(seq("=", $.expr)))), member_name: $ => choice($.ident, $.private_field, $.string, $.number, $.hex_number, $.octal_number, $.binary_number, $.big_int, seq("[", $.expr, "]")), - new_target: $ => choice($.ident, seq($.new_target, ".", $.ident), seq($.new_target, "[", $.expr, "]"), seq("(", $.expr, ")")), + new_target: $ => choice($.ident, seq("new", $.new_target, optional(choice(seq("<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">", optional(seq("(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")"))), seq("(", 
optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")))), seq($.new_target, ".", $.ident), seq($.new_target, "[", $.expr, "]"), seq("(", $.expr, ")")), - class_heritage: $ => choice($.ident, $.number, $.string, "true", "false", "null", "undefined", seq("(", $.expr, ")"), seq("class", optional($.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}"), seq($.class_heritage, ".", $.ident), seq($.class_heritage, "?.", $.ident), seq($.class_heritage, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.class_heritage, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), + class_heritage: $ => choice($.number, $.string, "true", "false", "null", "undefined", $.ident, seq("(", $.expr, ")"), seq("class", optional($.ident), optional($.type_params), optional(seq("extends", $.class_heritage)), optional(seq("implements", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat($.class_member), "}"), seq($.class_heritage, ".", $.ident), seq($.class_heritage, "?.", $.ident), seq($.class_heritage, "<", optional(seq($.type, repeat(seq(",", $.type)), optional(","))), ">"), seq($.class_heritage, "(", optional(seq($.expr, repeat(seq(",", $.expr)), optional(","))), ")")), - stmt: $ => choice($.block, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), optional(";")), seq("if", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt, optional(seq("else", $.stmt))), seq("for", optional("await"), "(", $.for_head, ")", $.stmt), seq("while", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt), seq("do", $.stmt, "while", "(", $.expr, repeat(seq(",", $.expr)), ")", optional(";")), seq("switch", "(", $.expr, repeat(seq(",", $.expr)), ")", "{", repeat($.switch_case), "}"), seq("return", optional(seq($.expr, 
repeat(seq(",", $.expr)))), optional(";")), seq("throw", $.expr, repeat(seq(",", $.expr)), optional(";")), seq("break", optional($.ident), optional(";")), seq("continue", optional($.ident), optional(";")), seq("try", $.block, optional(seq("catch", optional(seq("(", choice($.param, $.binding_pattern), ")")), $.block)), optional(seq("finally", $.block))), seq($.ident, ":", $.stmt), ";", seq("debugger", optional(";")), seq("with", "(", $.expr, ")", $.stmt), seq(optional("await"), "using", optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), optional(";")), $.decl, seq($.expr, repeat(seq(",", $.expr)), optional(";"))), + stmt: $ => choice($.block, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq("if", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt, optional(seq("else", $.stmt))), seq("for", optional("await"), "(", $.for_head, ")", $.stmt), seq("while", "(", $.expr, repeat(seq(",", $.expr)), ")", $.stmt), seq("do", $.stmt, "while", "(", $.expr, repeat(seq(",", $.expr)), ")", optional(";")), seq("switch", "(", $.expr, repeat(seq(",", $.expr)), ")", "{", repeat($.switch_case), "}"), seq("return", optional(seq($.expr, repeat(seq(",", $.expr)))), choice(";", blank(), blank())), seq("throw", $.expr, repeat(seq(",", $.expr)), choice(";", blank(), blank())), seq("break", optional($.ident), choice(";", blank(), blank())), seq("continue", optional($.ident), choice(";", blank(), blank())), seq("try", $.block, optional(seq("catch", optional(seq("(", choice($.param, $.binding_pattern), ")")), $.block)), optional(seq("finally", $.block))), seq($.ident, ":", $.stmt), ";", seq("debugger", choice(";", blank(), blank())), seq("with", "(", $.expr, ")", $.stmt), seq(optional("await"), "using", optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), $.decl, seq($.expr, repeat(seq(",", $.expr)), choice(";", blank(), blank()))), block: 
$ => seq("{", repeat($.stmt), "}"), @@ -208,29 +214,29 @@ module.exports = grammar({ for_binding: $ => seq(choice(seq($.ident, optional("!")), $.binding_pattern), optional(seq(":", $.type)), optional(seq("=", $.expr))), - param: $ => choice(seq("this", ":", $.type), seq(optional($.decorator_expr), repeat1(choice("public", "private", "protected", "readonly", "override", "static", "abstract", "accessor", "async", "export", "declare", "in", "out")), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))))), seq(optional($.decorator_expr), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)))))), + param: $ => choice(seq("this", optional(seq(":", $.type))), seq(optional($.decorator_expr), repeat1(choice("public", "private", "protected", "readonly", "override", "static", "abstract", "accessor", "async", "export", "declare", "in", "out")), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))))), seq(optional($.decorator_expr), choice(seq($.ident, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq($.binding_pattern, optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr))), seq("...", choice($.ident, $.binding_pattern), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)))))), - 
for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq(choice("in", "of"), $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, choice("in", "of"), $.expr)), + for_head: $ => choice(seq(choice("let", "const", "var", "using", seq("await", "using")), optional(seq($.for_binding, repeat(seq(",", $.for_binding)), optional(","))), choice(seq(";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq("in", $.expr, repeat(seq(",", $.expr))), seq("of", $.expr))), seq(optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr)))), ";", optional(seq($.expr, repeat(seq(",", $.expr))))), seq($.expr, "in", $.expr, repeat(seq(",", $.expr))), seq($.expr, "of", $.expr)), switch_case: $ => choice(seq("case", $.expr, repeat(seq(",", $.expr)), ":"), seq("default", ":"), $.stmt), type_params: $ => seq("<", optional(seq($.type_param, repeat(seq(",", $.type_param)), optional(","))), ">"), - type_param: $ => choice(seq(repeat1(choice("const", "in", "out", "public", "private", "protected", "readonly")), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), seq(choice("const", "in", "out", "public", "private", "protected", "readonly"), choice($.ident, "in", "out"), optional(seq("extends", $.type)), optional(seq("=", $.type))), seq(choice($.ident, "in", "out"), optional(seq("extends", $.type)), optional(seq("=", $.type)))), + type_param: $ => choice(seq(repeat1(choice("const", "in", "out", "public", "private", "protected", "readonly")), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), 
seq(choice("const", "in", "out", "public", "private", "protected", "readonly"), $.ident, optional(seq("extends", $.type)), optional(seq("=", $.type))), seq($.ident, optional(seq("extends", $.type)), optional(seq("=", $.type)))), - decl: $ => choice(seq(optional("async"), "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("interface", field('name', $.ident), optional($.type_params), optional(seq("extends", optional(seq($.type, repeat(seq(",", $.type)), optional(","))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq("type", field('name', $.ident), optional($.type_params), "=", $.type, optional(";")), seq(repeat($.decorator_expr), optional("abstract"), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), optional("abstract"), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("enum", field('name', $.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("declare", "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional(";")), seq("declare", choice($.decl, $.stmt)), seq("namespace", field('name', $.ident), 
repeat(seq(".", $.ident)), "{", repeat($.stmt), "}"), seq("module", choice(seq($.ident, repeat(seq(".", $.ident))), $.string), "{", repeat($.stmt), "}"), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), optional(";")), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("export", repeat($.decorator_expr), "default", choice(seq(optional("async"), "function", optional("*"), optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("abstract", "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("abstract", "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq($.expr, optional(";")))), seq("export", "*", choice(seq("from", $.string, optional(";")), seq("as", $.ident, "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("export", "=", $.expr, optional(";")), seq("export", "type", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("const", "enum", field('name', 
$.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq("type", $.import_clause, "from", $.string, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), + decl: $ => choice(seq("function", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("function", "*", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", "*", field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("interface", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq("type", field('name', $.ident), optional($.type_params), "=", $.type, optional(";")), seq(repeat($.decorator_expr), optional("abstract"), "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", 
optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq(repeat($.decorator_expr), optional("abstract"), "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("enum", field('name', $.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("declare", "function", optional("*"), field('name', $.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional(";")), seq("declare", "module", $.string, optional(";")), seq("declare", "global", "{", repeat($.stmt), "}"), seq("declare", choice($.decl, $.stmt)), seq(repeat1(choice("abstract", "public", "private", "protected", "readonly", "static", "override", "accessor")), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("async", $.decl), seq("namespace", field('name', $.ident), repeat(seq(".", $.ident)), "{", repeat($.stmt), "}"), seq("module", choice(seq($.ident, repeat(seq(".", $.ident))), $.string), "{", repeat($.stmt), "}"), seq("export", "as", "namespace", field('name', $.ident), optional(";")), seq("export", choice($.decl, $.stmt)), seq(repeat1($.decorator_expr), choice($.decl, seq(choice("let", "const", "var"), optional(seq($.binding, repeat(seq(",", $.binding)), optional(","))), choice(";", blank(), blank())), seq(optional("await"), "using", $.binding, repeat(seq(",", $.binding)), optional(";")))), seq("export", repeat($.decorator_expr), "default", 
choice(seq("function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("async", "function", "*", optional($.ident), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), choice($.block, optional(";"))), seq("abstract", "class", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("abstract", "class", optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat($.class_member), "}"), seq("interface", field('name', $.ident), optional($.type_params), repeat(choice(seq("extends", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))), seq("implements", optional(seq(choice($.class_heritage), repeat(seq(",", choice($.class_heritage))), optional(",")))))), "{", repeat(seq($.interface_member, optional(choice(";", ",")))), "}"), seq($.expr, optional(";")))), seq("export", optional("type"), "*", choice(seq("from", 
$.string, optional(";")), seq("as", choice($.ident, $.string), "from", $.string, optional(";")))), seq("export", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("export", "=", $.expr, optional(";")), seq("export", "type", "{", optional(seq($.export_specifier, repeat(seq(",", $.export_specifier)), optional(","))), "}", optional(seq("from", $.string)), optional(";")), seq("const", "enum", field('name', $.ident), "{", optional(seq($.enum_member, repeat(seq(",", $.enum_member)), optional(","))), "}"), seq("import", choice(seq($.import_clause, "from", $.string, optional(";")), seq("type", $.import_clause, "from", $.string, optional(";")), seq("type", field('name', $.ident), "=", $.expr, optional(";")), seq($.ident, "=", $.expr, optional(";")), seq($.string, optional(";")))), seq(repeat($.decorator_expr), "export", choice($.decl, $.stmt))), interface_member: $ => choice(seq(optional("new"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(choice("get", "set"), $.member_name, "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), seq(optional("static"), optional(choice("+", "-")), optional("readonly"), "[", $.ident, "in", $.type, optional(seq("as", $.type)), "]", optional(choice("+", "-")), optional("?"), ":", $.type), seq("readonly", $.member_name, optional("?"), ":", $.type), seq($.member_name, optional("?"), choice(seq(optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type))), optional(seq(":", $.type)))), seq(optional("static"), optional("readonly"), "[", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), "]", optional(seq(":", $.type)))), - class_member: $ => choice(";", $.decorator_expr, seq("constructor", "(", optional(seq($.param, repeat(seq(",", 
$.param)), optional(","))), ")", $.block, optional(";")), seq("static", $.block), seq(repeat(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "async")), choice(seq("*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("[", $.ident, ":", $.type, "]", ":", $.type, optional(";")), seq($.member_name, choice(seq(optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), optional(";")))))), seq($.member_name, optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), optional(";")), seq($.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";"))), + class_member: $ => choice(";", seq("constructor", "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", $.block, optional(";")), seq(repeat($.decorator_expr), repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), "static", $.block), seq(repeat($.decorator_expr), repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), choice(seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", 
"declare", "export", "in", "out", "const"))), "*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), choice("get", "set"), $.member_name, optional($.type_params), "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), "static", $.block), seq("async", repeat(choice(choice("public", "private", "protected", "static", "abstract", "readonly", "override", "accessor", "declare", "export", "in", "out", "const"))), $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("*", $.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(choice("get", "set"), $.member_name, optional($.type_params), "(", optional(optional(seq($.param, repeat(seq(",", $.param)), optional(",")))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq("[", $.ident, ":", $.type, optional(","), "]", optional(seq(":", $.type)), choice(";", blank(), blank())), seq("constructor", optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq($.member_name, choice(seq(optional("?"), optional($.type_params), "(", optional(seq($.param, 
repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";")), seq(optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), choice(";", blank(), blank())))))), seq($.member_name, optional("!"), optional("?"), optional(seq(":", $.type)), optional(seq("=", $.expr)), choice(";", blank(), blank())), seq($.member_name, optional("?"), optional($.type_params), "(", optional(seq($.param, repeat(seq(",", $.param)), optional(","))), ")", optional(seq(":", $.type)), optional($.block), optional(";"))), enum_member: $ => seq($.member_name, optional(seq("=", $.expr))), import_clause: $ => choice(seq("defer", "*", "as", $.ident), seq($.ident, optional(seq(",", choice(seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident))))), seq("{", optional(seq($.import_specifier, repeat(seq(",", $.import_specifier)), optional(","))), "}"), seq("*", "as", $.ident)), - import_specifier: $ => choice(seq($.ident, optional(seq("as", $.ident))), seq($.string, "as", $.ident)), + import_specifier: $ => choice(seq(optional("type"), $.ident, optional(seq("as", $.ident))), seq(optional("type"), $.string, "as", $.ident)), - export_specifier: $ => seq(choice($.ident, $.string), optional(seq("as", choice($.ident, $.string)))), + export_specifier: $ => choice(seq("type", choice($.ident, $.string), optional(seq("as", choice($.ident, $.string)))), seq("type", "as", choice(blank(), seq("as", choice($.ident, $.string)))), seq(choice($.ident, $.string), optional(seq("as", choice($.ident, $.string))))), jsxtag_name: $ => seq($.ident, repeat(choice(seq(".", $.ident), seq(":", $.ident), seq("-", $.ident)))), @@ -264,7 +270,7 @@ module.exports = grammar({ big_int: $ => token(/[0-9]+(?:_[0-9]+)*n/), - number: $ => token(/(?:[0-9]+(?:_[0-9]+)*(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), + number: $ => 
token(/(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\.[0-9]*(?:_[0-9]+)*)?|\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\-]?[0-9]+(?:_[0-9]+)*)?/), string: $ => token(/"(?:[^"\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*"|'(?:[^'\\]|\\(?:u\{0*(?:[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4})\}|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|[^ux]))*'/), diff --git a/tree-sitter/typescriptreact/queries/highlights.scm b/tree-sitter/typescriptreact/queries/highlights.scm index f9f05f0..903ce20 100644 --- a/tree-sitter/typescriptreact/queries/highlights.scm +++ b/tree-sitter/typescriptreact/queries/highlights.scm @@ -63,15 +63,15 @@ ;; Builtin / global / constant identifier names. ((ident) @variable.builtin - (#any-of? @variable.builtin "console" "window" "document" "process" "require" "exports" "global" "globalThis")) + (#any-of? @variable.builtin "console" "window" "document" "process" "require" "exports" "globalThis")) ;; Keyword, operator, and punctuation literals. [ "implements" "interface" "namespace" "protected" "debugger" "readonly" "abstract" "override" "accessor" "default" "private" "declare" "extends" "switch" "export" "module" - "public" "static" "unique" "const" "using" "class" "async" "case" - "with" "from" "type" "enum" "@new" "meta" "let" "var" - "get" "set" "out" + "public" "static" "unique" "target" "const" "using" "class" "async" + "case" "with" "from" "type" "enum" "@new" "meta" "let" + "var" "get" "set" "out" ] @keyword [ "constructor" "function" "=>" @@ -95,12 +95,11 @@ "instanceof" "satisfies" "asserts" "typeof" "delete" "keyof" "infer" "void" "new" "as" "is" ] @keyword.operator -"symbol" @type.builtin [ "undefined" "false" "true" "null" ] @constant.builtin [ - "super" "this" + "global" "super" "this" ] @variable.builtin [ ">>>=" "**=" "<<=" ">>=" "??=" "||=" "&&=" "===" diff --git a/tree-sitter/typescriptreact/src/scanner.c b/tree-sitter/typescriptreact/src/scanner.c index a76ba0d..34265e5 100644 --- a/tree-sitter/typescriptreact/src/scanner.c +++ 
b/tree-sitter/typescriptreact/src/scanner.c @@ -50,7 +50,7 @@ static inline void skip(TSLexer *lexer) { lexer->advance(lexer, true); } // regex-vs-division decision is already made by the LR context. We only // need to scan the literal body here. // -// Regex flag characters (derived from the token pattern): "gimsuydv" +// Regex flag characters (derived from the token pattern): "gimsuyd" // Division-after texts (informational; LR ctx handles these): ) ] ++ -- this super true false null undefined > } // Regex-after keywords (informational): in of instanceof typeof delete void await yield throw return case do else new static bool scan_regex(TSLexer *lexer) { @@ -69,7 +69,7 @@ static bool scan_regex(TSLexer *lexer) { advance(lexer); } // Trailing flag characters. - const char *flags = "gimsuydv"; + const char *flags = "gimsuyd"; while (lexer->lookahead != 0 && strchr(flags, (char)lexer->lookahead) != NULL) advance(lexer); lexer->result_symbol = REGEX_LITERAL; lexer->mark_end(lexer); diff --git a/typescript.monarch.json b/typescript.monarch.json index 95fbf28..f0e9e6a 100644 --- a/typescript.monarch.json +++ b/typescript.monarch.json @@ -356,10 +356,11 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": "operator", "keyof": "operator", "typeof": "operator", "readonly": "keyword", + "this": "keyword", + "is": "operator", "abstract": "keyword", "new": "operator", "asserts": "operator", @@ -370,9 +371,7 @@ "null": "keyword", "undefined": "keyword", "void": "operator", - "this": "keyword", "unique": "keyword", - "symbol": "keyword", "import": "keyword", "function": "keyword", "in": "keyword", @@ -380,6 +379,7 @@ "@new": "keyword", "super": "keyword", "instanceof": "operator", + "target": "keyword", "class": "keyword", "implements": "keyword", "async": "keyword", @@ -423,8 +423,8 @@ "interface": "keyword", "type": "keyword", "enum": "keyword", - "namespace": "keyword", "module": 
"keyword", + "namespace": "keyword", "from": "keyword", "constructor": "keyword", "defer": "keyword", @@ -433,6 +433,7 @@ "number": "keyword", "boolean": "keyword", "object": "keyword", + "symbol": "keyword", "bigint": "keyword", "any": "keyword", "unknown": "keyword", @@ -550,7 +551,7 @@ } ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", { "token": "number", "switchTo": "@value" @@ -574,10 +575,6 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": { - "token": "operator", - "switchTo": "@root" - }, "keyof": { "token": "operator", "switchTo": "@root" @@ -590,6 +587,14 @@ "token": "keyword", "switchTo": "@root" }, + "this": { + "token": "keyword", + "switchTo": "@value" + }, + "is": { + "token": "operator", + "switchTo": "@root" + }, "abstract": { "token": "keyword", "switchTo": "@root" @@ -630,18 +635,10 @@ "token": "operator", "switchTo": "@root" }, - "this": { - "token": "keyword", - "switchTo": "@value" - }, "unique": { "token": "keyword", "switchTo": "@root" }, - "symbol": { - "token": "keyword", - "switchTo": "@value" - }, "import": { "token": "keyword", "switchTo": "@root" @@ -670,6 +667,10 @@ "token": "operator", "switchTo": "@root" }, + "target": { + "token": "keyword", + "switchTo": "@root" + }, "class": { "token": "keyword", "switchTo": "@root" @@ -842,11 +843,15 @@ "token": "keyword", "switchTo": "@root" }, - "namespace": { + "module": { "token": "keyword", "switchTo": "@root" }, - "module": { + "global": { + "token": "variable", + "switchTo": "@value" + }, + "namespace": { "token": "keyword", "switchTo": "@root" }, @@ -882,6 +887,10 @@ "token": "keyword", "switchTo": "@value" }, + "symbol": { + "token": "keyword", + "switchTo": 
"@value" + }, "bigint": { "token": "keyword", "switchTo": "@value" @@ -970,10 +979,6 @@ "token": "variable", "switchTo": "@value" }, - "global": { - "token": "variable", - "switchTo": "@value" - }, "globalThis": { "token": "variable", "switchTo": "@value" @@ -1047,7 +1052,7 @@ "include": "@exprBody" }, [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "token": "regexp", "switchTo": "@value" @@ -1105,7 +1110,7 @@ "number" ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", "number" ], [ @@ -1120,10 +1125,11 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": "operator", "keyof": "operator", "typeof": "operator", "readonly": "keyword", + "this": "keyword", + "is": "operator", "abstract": "keyword", "new": "operator", "asserts": "operator", @@ -1134,9 +1140,7 @@ "null": "keyword", "undefined": "keyword", "void": "operator", - "this": "keyword", "unique": "keyword", - "symbol": "keyword", "import": "keyword", "function": "keyword", "in": "keyword", @@ -1144,6 +1148,7 @@ "@new": "keyword", "super": "keyword", "instanceof": "operator", + "target": "keyword", "class": "keyword", "implements": "keyword", "async": "keyword", @@ -1187,8 +1192,9 @@ "interface": "keyword", "type": "keyword", "enum": "keyword", - "namespace": "keyword", "module": "keyword", + "global": "variable", + "namespace": "keyword", "from": "keyword", 
"constructor": "keyword", "defer": "keyword", @@ -1197,6 +1203,7 @@ "number": "keyword", "boolean": "keyword", "object": "keyword", + "symbol": "keyword", "bigint": "keyword", "any": "keyword", "unknown": "keyword", @@ -1219,14 +1226,13 @@ "process": "variable", "require": "variable", "exports": "variable", - "global": "variable", "globalThis": "variable", "@default": "identifier" } } ], [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", "regexp" ], [ diff --git a/typescript.tmLanguage.json b/typescript.tmLanguage.json index 0ad2c4e..e762f51 100644 --- a/typescript.tmLanguage.json +++ b/typescript.tmLanguage.json @@ -118,14 +118,17 @@ "include": "#import-default-binding" }, { - "include": "#type-predicate-operator" + "include": "#keyof-typekw" }, { - "include": "#keyof-typekw" + "include": "#type-predicate-operator" }, { "include": "#extends-typekw" }, + { + "include": "#unique-typekw" + }, { "include": "#as-typekw" }, @@ -154,10 +157,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -165,9 +168,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -177,9 +177,6 @@ { "include": "#scope-keyword-control-loop" }, - { - "include": "#scope-keyword-control-loop-of" - }, { "include": "#scope-keyword-control-flow" 
}, @@ -204,9 +201,6 @@ { "include": "#scope-storage-modifier" }, - { - "include": "#scope-storage-modifier-accessibility" - }, { "include": "#scope-keyword-other-extends" }, @@ -250,10 +244,10 @@ "include": "#scope-constant-language-null" }, { - "include": "#scope-support-type-primitive" + "include": "#this-literal" }, { - "include": "#this-literal" + "include": "#scope-support-type-primitive" }, { "include": "#super-literal" @@ -453,7 +447,7 @@ }, "regex-literal-prefix-ops": { "name": "string.regexp.ts", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bis)|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\busing)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": 
"(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bis)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bunique)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bconstructor)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "keyword.operator.logical.prefix.ts" @@ -465,7 +459,7 @@ "name": "punctuation.definition.string.begin.regexp.ts" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.ts" @@ -1205,7 +1199,7 @@ }, "number": { "name": "constant.numeric.decimal.ts", - "match": "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" + "match": "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" }, "template": { "name": "string.quoted.other.template.ts", @@ -2400,7 +2394,7 @@ "name": "keyword.operator.expression.keyof.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": 
"(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2415,7 +2409,22 @@ "name": "keyword.other.extends.extends.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "patterns": [ + { + "include": "#type" + } + ] + }, + "unique-typekw": { + "name": "meta.type.unique.ts", + "begin": "\\b(unique)\\b", + "beginCaptures": { + "1": { + "name": "keyword.other.unique.ts" + } + }, + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2424,13 +2433,13 @@ }, "as-typekw": { "name": "meta.type.as.ts", - "begin": "\\b(as)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "begin": "\\b(as)\\b", "beginCaptures": { "1": { "name": "keyword.operator.expression.as.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": 
"(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2445,7 +2454,7 @@ "name": "keyword.other.extends.implements.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2460,7 +2469,7 @@ "name": "keyword.operator.expression.satisfies.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2504,17 +2513,17 @@ ] }, "scope-keyword-operator-expression": { - "match": "\\b(typeof|new|void|instanceof|delete)\\b", - "name": "keyword.operator.expression.ts" - }, - "scope-keyword-operator-expression-is": { - "match": "\\b(is)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "match": "\\b(typeof|new|void|as|instanceof|delete)\\b", "name": 
"keyword.operator.expression.ts" }, "scope-keyword-operator-expression-keyof": { "match": "\\b(keyof)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.ts" }, + "scope-keyword-operator-expression-is": { + "match": "\\b(is)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "name": "keyword.operator.expression.ts" + }, "scope-keyword-operator-expression-asserts": { "match": "\\b(asserts)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.ts" @@ -2523,20 +2532,12 @@ "match": "\\b(infer)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.ts" }, - "scope-keyword-operator-expression-as": { - "match": "\\b(as)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", - "name": "keyword.operator.expression.ts" - }, "scope-keyword-operator-expression-satisfies": { "match": "\\b(satisfies)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.ts" }, "scope-storage-modifier": { - "match": "\\b(readonly|async|static|declare)\\b", - "name": "storage.modifier.ts" - }, - "scope-storage-modifier-accessibility": { - "match": "\\b(abstract|public|private|protected|override|accessor)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", + "match": "\\b(readonly|abstract|async|public|private|protected|static|override|accessor|declare)\\b", "name": "storage.modifier.ts" }, "scope-keyword-other-extends": { @@ -2556,11 +2557,11 @@ "name": "constant.language.null.ts" }, "scope-support-type-primitive": { - "match": "\\b(void|symbol|string|number|boolean|object|bigint|any|unknown|never)\\b", + "match": "\\b(void|string|number|boolean|object|symbol|bigint|any|unknown|never)\\b", "name": "support.type.primitive.ts" }, "scope-keyword-other": { - "match": "\\b(unique|@new|meta|out)\\b", + "match": "\\b(unique|@new|target|meta|out)\\b", "name": "keyword.other.ts" }, "scope-keyword-control-import": { @@ -2568,15 +2569,11 @@ "name": 
"keyword.control.import.ts" }, "scope-storage-type-function": { - "match": "\\b(function)\\b", + "match": "\\b(function|constructor)\\b", "name": "storage.type.function.ts" }, "scope-keyword-control-loop": { - "match": "\\b(in|for|while|do|break|continue)\\b", - "name": "keyword.control.loop.ts" - }, - "scope-keyword-control-loop-of": { - "match": "\\b(of)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$|\\s*[({\\[\"`/\\-])", + "match": "\\b(in|for|while|do|break|continue|of)\\b", "name": "keyword.control.loop.ts" }, "scope-storage-type-class": { @@ -2628,11 +2625,11 @@ "name": "storage.type.enum.ts" }, "scope-storage-type-namespace": { - "match": "\\b(namespace|module)\\b", + "match": "\\b(module|namespace)\\b", "name": "storage.type.namespace.ts" }, "scope-support-variable": { - "match": "\\b(module|console|window|document|process|require|exports|global|globalThis)\\b", + "match": "\\b(module|global|console|window|document|process|require|exports|globalThis)\\b", "name": "support.variable.ts" }, "scope-keyword-control-from-from": { @@ -2913,10 +2910,10 @@ "include": "#import-default-binding" }, { - "include": "#type-predicate-operator" + "include": "#keyof-typekw" }, { - "include": "#keyof-typekw" + "include": "#type-predicate-operator" }, { "include": "#extends-typekw" @@ -2949,10 +2946,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -2960,9 +2957,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -3009,10 +3003,10 @@ "include": "#scope-constant-language-null" }, { - "include": "#scope-support-type-primitive" + "include": 
"#this-literal" }, { - "include": "#this-literal" + "include": "#scope-support-type-primitive" }, { "include": "#super-literal" @@ -3145,10 +3139,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -3156,9 +3150,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -3246,7 +3237,7 @@ }, "regex": { "name": "string.regexp.ts", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bis)|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\busing)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": 
"(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bis)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bunique)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bconstructor)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "comment.block.ts" @@ -3255,7 +3246,7 @@ "name": "punctuation.definition.string.begin.regexp.ts" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.ts" @@ -3390,7 +3381,7 @@ "include": "$self" } ], - "while": "^(?=\\s*(?:[<,\\[|&(...?:{;\\-.!*]|(?:is|keyof|typeof|readonly|abstract|new|asserts|extends|infer|true|false|null|undefined|void|this|unique|symbol)\\b|//|/\\*|[>\\])}](?:\\s*[>\\])}])*\\s*(?=[<,\\[|&(...?:{;\\-.!*=])|(?!(?:if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|let|const|var|using|function|constructor|class|interface|type|enum|namespace|module|public|private|protected|static|override|declare|async|accessor|get|set)\\b)(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*\\b(?!\\s*[.(])))" + "while": 
"^(?=\\s*(?:[<,\\[|&(...?:{;\\-.!*]|(?:keyof|typeof|readonly|this|is|abstract|new|asserts|extends|infer|true|false|null|undefined|void|unique)\\b|//|/\\*|[>\\])}](?:\\s*[>\\])}])*\\s*(?=[<,\\[|&(...?:{;\\-.!*=])|(?!(?:if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|let|const|var|using|function|constructor|class|interface|type|enum|namespace|module|public|private|protected|static|override|declare|async|accessor|get|set)\\b)(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*\\b(?!\\s*[.(])))" }, "type-object": { "name": "meta.object-type.ts", @@ -3474,7 +3465,7 @@ "name": "keyword.operator.expression.is.ts" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" diff --git a/typescript.ts b/typescript.ts index 105c79b..26a71b2 100644 --- a/typescript.ts +++ b/typescript.ts @@ -1,8 +1,29 @@ import { rule, defineGrammar, op, prefix, postfix, sameLine, - sep, opt, many, many1, alt, exclude, not, + sep, opt, many, many1, alt, exclude, not, tsRelax, capExpr, notLeftLeaf, + awaitCtx, yieldCtx, asyncGenCtx, resetCtx, } from './src/api.ts'; + +// Build the four async×generator arms of a TypeScript `function` form, routing each +// arm's params and body to its [Await]/[Yield] family (plain resets, generator -> +// yield, async -> await, 
async-generator -> both). Type params and the return-type +// annotation are NOT [Await]/[Yield]-parameterized, so they stay plain. `nameParts` +// is spread in after `function` (and `*`); `body` is the function body element. +// Param/Block/Type/TypeParams resolve at thunk-eval time (defined below). +function tsFnArms(nameParts, body) { + return [ + ['function', ...nameParts, opt(TypeParams), '(', sep(Param, ','), ')', opt(":", ReturnType), resetCtx(body)], + ['function', '*', ...nameParts, opt(TypeParams), '(', sep(yieldCtx(Param), ','), ')', opt(":", ReturnType), yieldCtx(body)], + ['async', 'function', ...nameParts, opt(TypeParams), '(', sep(awaitCtx(Param), ','), ')', opt(":", ReturnType), awaitCtx(body)], + ['async', 'function', '*', ...nameParts, opt(TypeParams), '(', sep(asyncGenCtx(Param), ','), ')', opt(":", ReturnType), asyncGenCtx(body)], + ]; +} + +// Statement ASI terminator: a `;`, OR a line-terminator before the next token (newline +// ASI), OR the next token is `}` (block end). A same-line non-`;`/`}` token can NOT end +// the statement, so a mid-line split (`var x = a[]`) stays one statement (tsc-shaped). +const asi = () => alt([';'], [not(sameLine)], [not(not('}'))]); // JavaScript is the SUBSET / base of the ECMAScript family; TypeScript is the // SUPERSET (JS + a type layer). The shared, type-free vocabulary — token consts, // the `notReserved`/`notReservedExpr` reserved-word guards, the precedence ladder @@ -45,7 +66,7 @@ const DecoratorExpr = rule($ => [ // optional chain: ?.y | ?.#y | ?.(args) | ?.[i] — unlike plain element access, // `?.[` is unambiguous (a computed class member never starts with `?.`), so tsc // parses it in decorator position and we mirror. 
- ['?.', alt(Ident, PrivateField, ['(', sep(Expr, ','), ')'], ['[', Expr, ']'])], + ['?.', alt(Ident, PrivateField, ['(', sep(Expr, ','), ')'], ['[', Expr, ']'])], // `?.#y` is valid current ES (see Expr `?.` below) Template, // tagged template: @x`…` ))], // `@new x` — the decorator expression is a NewExpression. The lexer maximal-munches @@ -57,7 +78,7 @@ const DecoratorExpr = rule($ => [ // ── Types ── const TypeMember = rule($ => { - const callSig = [opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type)]; // `( … ): Ret` + const callSig = [opt(TypeParams), '(', sep(Param, ','), ')', opt(":", ReturnType)]; // `( … ): Ret` const propOrMethod = alt(callSig, [opt(':', Type)]); // after a name: method (callSig) | property return [ // call / construct signature (no member name): a construct sig is just a @@ -68,7 +89,7 @@ const TypeMember = rule($ => { [opt(alt('+', '-')), opt('readonly'), '[', alt( [Ident, alt( ['in', Type, opt('as', Type), ']', opt(alt('+', '-')), opt('?'), ':', Type], // mapped: K in T (as U)? - [':', Type, ']', opt(':', Type)], // index: k: T + [':', Type, opt(','), ']', opt(':', Type)], // index: k: T (trailing comma tolerated) )], [Expr, ']', opt('?'), propOrMethod], // computed: expr [']', opt(':', Type)], // empty index sig: [] / []: T @@ -81,10 +102,18 @@ const TypeMember = rule($ => { }); const Type = rule($ => { - const fnType = [opt(TypeParams), '(', sep(Param, ','), ')', '=>', $]; // (a: T) => R / (…) => R + const fnType = [opt(TypeParams), '(', sep(Param, ','), ')', '=>', ReturnType]; // (a: T) => R / (…) => R (the return may be a type predicate) return [ - [Ident, opt('is', $)], // T | type predicate `x is T` - [$, '<', sep($, ','), '>'], + // A bare type reference / entity name. 
The type-predicate `x is T` is NOT here for the + // PARSER: tsc's parser accepts `x is T` ONLY in a function RETURN-TYPE position (see + // ReturnType below), so a predicate in any other type slot (var/param/property + // annotation, cast, type argument, union member, …) is a parse error. (`asserts x` is + // different — tsc's parser accepts it in EVERY position, so it stays in this general + // Type, below.) tsRelax: tree-sitter KEEPS the predicate in the general type (its + // status-quo shape, GLR-cheap), since a highlighter may over-accept a stray predicate + // — adding the return-only ReturnType to ~18 slots for tree-sitter inflates its table. + tsRelax(Ident, [Ident, opt('is', Type)]), + [$, sameLine, '<', sep($, ','), '>'], // type-arg application T — `<` must be on the same line (no ASI), like the postfix `[`/`!` arms below [$, sameLine, '[', ']'], // array type T[] — `[` must be on the same line (no ASI) [$, '|', $], [$, '&', $], @@ -99,8 +128,16 @@ const Type = rule($ => { // tuple element: `...`? (name `?`? `:`)? `...`? Type `?`? — the second `...` // covers a named rest member `n: ...T[]` (TS: RestType after the label); the // trailing `?` covers optional members `n: T?` / `T?` (TS: OptionalType). - ['[', many(opt('...'), opt(Ident, opt('?'), ':'), opt('...'), $, opt('?'), opt(',')), ']'], - ['{', many(TypeMember, opt(alt(';', ','))), '}'], + // Elements are comma-SEPARATED: a `,` is required between elements (`[A B]` and + // even `[A\n B]` are tsc's "',' expected" — unlike object types, a newline does NOT + // separate tuple members), while the LAST element needs none (`]`-ahead). Trailing + // comma is covered by the `,` arm before the closing-`]` iteration fails to start. 
+ ['[', many(opt('...'), opt(Ident, opt('?'), ':'), opt('...'), $, opt('?'), alt([','], [not(not(']'))])), ']'], + // object type literal: members are SEPARATED by `;` / `,` / a newline (the type + // analog of statement ASI) — two members on one line with no separator reject + // (`{ a: T b: U }` is tsc's "';' expected"). The `}`-ahead arm lets the last member + // need no trailing separator; `;`/`,` also cover an explicit trailing delimiter. + ['{', many(TypeMember, alt([';'], [','], [not(sameLine)], [not(not('}'))])), '}'], ['asserts', Ident, opt('is', $)], [$, 'extends', $, '?', $, ':', $], // infer U | infer U extends T | infer U extends T ? X : Y (conditional binds to the infer) @@ -110,15 +147,23 @@ const Type = rule($ => { HexNumber, OctalNumber, BinaryNumber, BigInt_, ['-', alt(Number_, BigInt_)], 'true', 'false', 'null', 'undefined', 'void', 'this', - ['unique', 'symbol'], + ['unique', $], // `unique` is a general prefix type operator (tsc parses `unique `); `unique symbol` is the checker-valid case ['import', '(', $, ')'], Template, [$, sameLine, '[', $, ']'], // indexed access T[K] — `[` must be on the same line (no ASI) - [$, '.', Ident], + // qualified type name `A.B`: a TypeName's root is an IdentifierReference, so the + // keyword/literal types `void`/`null`/`true`/`false`/`this` are NOT `.`-qualifiable + // (`void.x` has no parse tree — tsc rejects; @babel/parser is lenient but the spec + // PRODUCTIONS make it underivable). `undefined`/`number`/`string`/… are identifier-rooted + // and stay qualifiable. `notLeftLeaf(...)` gates the arm on the LEFT node's head leaf; it is + // zero-width, so tree-sitter DROPS it (the derived GLR grammar keeps the unconstrained `.` + // LED — a left-leaf predicate is not expressible in GLR, and a stray `void.x` is harmless for + // a highlighter). No tsRelax wrapper is needed: the marker is itself the relaxation point. 
+ [notLeftLeaf('void', 'null', 'true', 'false', 'this'), $, '.', Ident], // ── JSDoc types — tsc parses these in NORMAL TS type positions (the checker // rejects them with "JSDoc types can only be used inside documentation // comments"), so the parse surface must accept them. ── - [$, '.', '<', sep($, ','), '>'], // dotted type arguments: Array. + [notLeftLeaf('void', 'null', 'true', 'false', 'this'), $, '.', '<', sep($, ','), '>'], // dotted type arguments: Array. ['?', $], // prefix nullable: ?number ['!', $], // prefix non-nullable: !string '?', // JSDocUnknownType: a bare `?` (when no type follows) @@ -137,21 +182,32 @@ const Type = rule($ => { // ── Expressions ── const Prop = rule($ => { - const method = ['(', sep(Param, ','), ')', opt(':', Type), Block]; // ( … ): T { … } + // ( … ): T { … }, params+body routed to a [Await]/[Yield] family (see memTail); the + // MemberName and return type stay outside it (a computed key inherits the enclosing + // context, type positions are not parameterized). + const propTail = (ctx) => ['(', sep(ctx(Param), ','), ')', opt(":", ReturnType), ctx(Block)]; // tsc parses a full modifier soup before ANY object-literal member and a `?` then // `!` after its name (`{ static m() {} }`, `{ export p: 1 }`, `{ a! }`, `{ a?() {} }` // are all parse-clean — rejecting them is the checker's job). `const`/`default` are // NOT parsed as modifiers there (tsc parse errors), so they stay out of the soup. // The soup arms are many1 + a plain fallback arm, so a member NAMED like a modifier // (`{ static: 1 }`, `{ async }`) falls through to the plain shapes. - const propMod = alt('public', 'private', 'protected', 'static', 'abstract', 'readonly', 'override', 'accessor', 'async', 'export', 'declare', 'in', 'out'); + // `async` is pulled out of the soup into the dedicated async method arms below (so the + // body gets its [Await] context); `static`/`get`/… stay lenient modifiers. 
+ const propMod = alt('public', 'private', 'protected', 'static', 'abstract', 'readonly', 'override', 'accessor', 'export', 'declare', 'in', 'out'); return [ ['...', Expr], // spread - // accessor (get/set), with any modifier soup (lenient, tsc-shaped) - [many(propMod), alt('get', 'set'), MemberName, '(', opt(sep(Param, ',')), ')', opt(':', Type), Block], + // accessor (get/set), with any modifier soup (lenient, tsc-shaped) — body resets + [many(propMod), alt('get', 'set'), MemberName, '(', opt(sep(resetCtx(Param), ',')), ')', opt(":", ReturnType), opt(resetCtx(Block))], // body optional: `{ get foo() }` is a tsc-clean (error-recovery) parse // method: modifiers?/generator?, any member name (incl `#x`, computed `[e]`), then ( … ) { … } - [many1(propMod), opt('*'), MemberName, opt('?'), opt('!'), opt(TypeParams), ...method], - [opt('async'), opt('*'), MemberName, opt('?'), opt('!'), opt(TypeParams), ...method], + [many1(propMod), opt('*'), MemberName, opt('?'), opt('!'), opt(TypeParams), ...propTail(resetCtx)], + // async/generator method, 4-way split (each routes params+body to its family). + // async carries its own modifier run (order-free, like the class member arms). 
+ ['async', many(propMod), '*', MemberName, opt('?'), opt('!'), opt(TypeParams), ...propTail(asyncGenCtx)], + ['async', many(propMod), alt('get', 'set'), MemberName, '(', opt(sep(awaitCtx(Param), ',')), ')', opt(":", ReturnType), opt(awaitCtx(Block))], // async accessor (semantic error; parses) + ['async', many(propMod), MemberName, opt('?'), opt('!'), opt(TypeParams), ...propTail(awaitCtx)], + ['*', MemberName, opt('?'), opt('!'), opt(TypeParams), ...propTail(yieldCtx)], + [MemberName, opt('?'), opt('!'), opt(TypeParams), ...propTail(resetCtx)], // value property — any member name incl computed `[e]: v` (MemberName covers `[Expr]`) [many1(propMod), MemberName, opt('?'), opt('!'), ':', Expr], [MemberName, opt('?'), opt('!'), ':', Expr], @@ -164,12 +220,34 @@ const Prop = rule($ => { ]; }); +// A function/method/accessor/arrow/fn-type RETURN type. Beyond an ordinary Type it may be +// a TYPE PREDICATE `x is T` / `this is T` — a narrowing guard tsc's parser accepts ONLY in +// return position. The SUBJECT is a bare identifier or `this` (a number/string/qualified/ +// parenthesized subject rejects); `await`/`yield` are accepted as ordinary-identifier +// subjects. The `is` TARGET is an ordinary (non-predicate) Type, so `x is y is z` rejects. +// `asserts` predicates are NOT here — they live in the general Type (tsc parses them in any +// position), and a return type written `asserts x` falls through to the Type arm below. +// A function/method/accessor/arrow/fn-type RETURN type. For the PARSER it adds the type +// predicate `x is T` / `this is T` (subject = identifier or `this`; target = an ordinary +// non-predicate Type, so `x is y is z` rejects) on top of an ordinary Type — and the +// predicate appears ONLY here (return position), nowhere else. It stays TRANSPARENT (the +// strict side is a plain `alt`, not a rule), so a normal return is a bare `Type` node — +// identical CST shape to a pre-predicate return slot, leaving AST lowering / cst-match +// unaffected. 
tsRelax: tree-sitter renders just `Type` here (the predicate lives in its +// general type instead), so adding ReturnType to ~18 slots doesn't inflate its GLR table. +const ReturnType = tsRelax(alt([alt(Ident, 'this'), 'is', Type], Type), Type); + const ClassHeritage = rule($ => [ - Ident, - // (leds below also cover `A?.B` — tsc parses optional chains in heritage cleanly) // Non-constructor primaries: tsc PARSES `extends undefined/true/42/"x"` cleanly - // (rejecting them is the CHECKER's job), so the heritage grammar must too. + // (rejecting them is the CHECKER's job), so the heritage grammar must too. The + // identifier-reference head is reserved-guarded (notReservedExpr, the same guard the + // expression NUD uses): a prefix-operator / statement keyword with NO bare-expression + // role — `void`, `typeof`, `delete`, `enum`, `case`, `throw`, … — is not a valid base + // (tsc parses `extends void {}` as "Expression expected"), while `this`/`await`/`yield`/ + // `async`/plain identifiers are. Literals stay listed first so they keep their leaf scope. + // (leds below also cover `A?.B` — tsc parses optional chains in heritage cleanly) Number_, String_, 'true', 'false', 'null', 'undefined', + [notReservedExpr, Ident], // The heritage clause is a LeftHandSideExpression, not just a dotted name: a // parenthesized expression (`extends (B)`, `extends (cond ? A : B)`) and a class // EXPRESSION (`extends class {}`, `extends class Q extends P {}`) are both valid @@ -194,6 +272,9 @@ const heritageClauses = many(alt( const NewTarget = rule($ => [ Ident, + // a `new` expression is itself a valid new-target (NewExpression : `new` NewExpression), + // so `new new Foo()()` / `new new f` chain — mirrors the Expr `new` arm but recurses here. 
+ ['new', not('<'), $, opt(alt(['<', sep(Type, ','), '>', opt('(', sep(Expr, ','), ')')], ['(', sep(Expr, ','), ')']))], [$, '.', Ident], [$, '[', Expr, ']'], ['(', Expr, ')'], @@ -209,8 +290,18 @@ const Expr = rule($ => [ // (both are one token) goes to the first-listed alternative, so listing the literals // first makes `this`/`true`/… arrive as $keyword leaves — the tree records what the // word IS instead of the bare-identifier fallback winning the tie and stamping Ident. - 'true', 'false', 'null', 'undefined', 'this', 'super', - [notReservedExpr, Ident], + 'true', 'false', 'null', 'undefined', 'this', + // `super` is a CONSTRAINED primary (mirrors tsc's parseSuperExpression): it MUST be + // immediately followed by a call `(args)`, a member `.name`/`.#priv`, or an element + // `[expr]` access. Bare `super`, `super()`, `super?.x`, a super-tagged-template, and + // `super = …` are all parse errors. Modeling super as a bare atom would let the generic + // LEDs (type-arg call, optional chain, tagged template, assignment) attach and re-open + // that whole class; further access chains off the RESULT normally (`super.x()`). + ['super', alt(['(', sep($, ','), ')'], ['.', alt(Ident, PrivateField)], ['[', $, ']'])], + // bare-identifier NUD — excludes `super` AND `new` (reserved one-token text matches + // handled by their own arms above; without these guards a failed `super`/`new` arm would + // slide the keyword in here as an Ident — e.g. `new Foo()` reparsing as `(new < T) > Foo()`). + [not('super'), not('new'), notReservedExpr, Ident], Number_, String_, Template, @@ -233,37 +324,64 @@ const Expr = rule($ => [ [$, '<', sep(Type, ','), '>', not(Expr)], [$, '(', sep($, ','), ')'], [$, '.', alt(Ident, PrivateField)], - // optional chaining: ?.x | ?.#x | ?.(args) | ?.[i] | ?.`…` - [$, '?.', alt(Ident, PrivateField, ['(', sep($, ','), ')'], ['[', $, ']'], Template)], + // optional chaining: ?.x | ?.#x | ?.(args) | ?.[i] | ?.`…` | ?.(args). 
A private member + // `a?.#x` IS valid current ECMAScript (V8 + Babel accept; tsc's lone parse rejection is a bug + // being removed in TS#60263), so PrivateField stays — the CST producer models the syntax, not + // a tsc-only restriction. Any "no private in optional chain" rule, were it real, would be a + // Static-Semantics check in a CST consumer, never a parse-level exclusion here. + [$, '?.', alt(Ident, PrivateField, ['(', sep($, ','), ')'], ['[', $, ']'], Template, ['<', sep(Type, ','), '>', '(', sep($, ','), ')'])], // optional typed call `a?.(args)` [$, '[', $, ']'], - [$, '!'], // TS non-null assertion — a LHS-chain tail (access can follow: `x!.y`, `x!()`), unlike update `++`/`--` + [$, sameLine, '!'], // TS non-null assertion — RESTRICTED (no line break before `!`, like postfix ++/--); a LHS-chain tail (access can follow: `x!.y`, `x!()`) [$, '?', $, ':', $], [$, 'as', Type], [$, 'instanceof', $], [$, 'in', $], [$, Template], - // new T | new T(args) | new T | new T(args) - ['new', NewTarget, opt(alt( - ['<', sep(Type, ','), '>', opt('(', sep($, ','), ')')], + // `new.target` meta-property — the ONLY form where `new` is not followed by a target. + // Listed before the `new T` arm and matched by the dedicated `new` arms (NOT the bare + // identifier nud, which excludes `new`), so `new Foo()` — where the `new T` arm fails + // on the leading `<` — can no longer fall through to `new` as an identifier and reparse + // as the comparison `(new < T) > Foo()` (tsc: "Expression expected"). + ['new', '.', 'target'], + // new T | new T(args) | new T | new T(args). An optional chain may NOT follow a bare + // `new` (no Arguments): a NewExpression is not a valid `?.` base (the base must be a + // MemberExpression / CallExpression — i.e. a `new` WITH `( )`), so `new a?.b`, `new a?.b`, + // `new class{}?.x`, `new new a()?.x` have no parse tree (tsc + V8 + babel all reject). 
The + // `not('?.')` guards exactly the no-call exits; `new a()?.b` (Arguments consumed) chains via + // the outer `?.` LED unchanged. + ['new', NewTarget, alt( + ['<', sep(Type, ','), '>', alt(['(', sep($, ','), ')'], not('?.'))], ['(', sep($, ','), ')'], - ))], - ['new', 'class', notReserved, Ident, opt(TypeParams), opt('extends', ClassHeritage), opt('implements', sep(Type, ',')), '{', many(ClassMember), '}', opt('(', sep($, ','), ')')], - ['new', 'class', opt(TypeParams), opt('extends', ClassHeritage), opt('implements', sep(Type, ',')), '{', many(ClassMember), '}', opt('(', sep($, ','), ')')], + not('?.'), + )], + ['new', 'class', notReserved, Ident, opt(TypeParams), opt('extends', ClassHeritage), opt('implements', sep(Type, ',')), '{', many(ClassMember), '}', alt(['(', sep($, ','), ')'], not('?.'))], + ['new', 'class', opt(TypeParams), opt('extends', ClassHeritage), opt('implements', sep(Type, ',')), '{', many(ClassMember), '}', alt(['(', sep($, ','), ')'], not('?.'))], ['[', many(opt($), ','), opt($), ']'], ['{', sep(Prop, ','), '}'], - [opt('async'), opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), '=>', alt($, Block)], + // Arrow functions, async/non-async SPLIT so the [Await] grammar parameter routes + // each arm's params + body to the right rule family (await-yield-fork.ts): an async + // arrow's params and body are await-context (`async (a = await) =>` rejects), a + // plain arrow's body resets. Type params/annotations stay PLAIN (not await-context). + // capExpr('?'): an ArrowFunction is the LOWEST-precedence AssignmentExpression — neither a + // binary/logical/conditional operand nor an assignment target — so each arm is capped BELOW + // the conditional `?`: it parses only at an assignment-or-looser minBp and admits no led once + // parsed (`() => {} || a` rejects, NOT `(() => {}) || a`); a `||`/`?:` INSIDE an expression + // body (`() => a || b`) is unaffected. 
Body `alt(Block, $)` (Block FIRST) = the spec's + // ConciseBody `[lookahead ≠ {] AssignmentExpression | { FunctionBody }`. + capExpr('?', 'async', opt(TypeParams), '(', sep(awaitCtx(Param), ','), ')', opt(":", ReturnType), '=>', awaitCtx(alt(Block, $))), + capExpr('?', opt(TypeParams), '(', sep(Param, ','), ')', opt(":", ReturnType), '=>', resetCtx(alt(Block, $))), // async arrow with a BARE parameter: `async err => …`. tsc requires async and the // parameter on the same line (`async\nx => …` is `async;` then a plain arrow — ASI). // Without this arm the bare form only "parsed" by splitting into two statements. - ['async', sameLine, Ident, '=>', alt($, Block)], - [Ident, '=>', alt($, Block)], + capExpr('?', 'async', sameLine, awaitCtx(notReservedExpr, Ident), '=>', awaitCtx(alt(Block, $))), + capExpr('?', notReservedExpr, Ident, '=>', resetCtx(alt(Block, $))), ['yield', alt(['*', $], [opt($)])], // yield e | yield* e (delegate) | yield ['(', $, many(',', $), ')'], [$, 'satisfies', Type], - ['import', alt(['(', $, ')'], ['.', 'meta'])], + ['import', alt(['(', $, ')'], ['.', 'meta'], ['<', sep(Type, ','), '>', opt('(', sep($, ','), ')')])], // import(e) | import.meta | import(args) (instantiation-expression; checker rejects) PrivateField, HexNumber, OctalNumber, BinaryNumber, BigInt_, - [opt('async'), 'function', opt('*'), opt(notReserved, Ident), opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), Block], + ...tsFnArms([opt(notReserved, Ident)], Block), // named vs anonymous kept separate (greedy opt(Ident) would eat a leading // `extends`/`implements`); decorator dimension is a `many` (a class expression may // carry ≥2 decorators, `x = @d @d class C {}`, like the declaration arm below). @@ -321,22 +439,28 @@ const ForBinding = rule($ => [ [alt([notReserved, Ident, opt('!')], BindingPattern), opt(':', Type), opt('=', exclude('in', Expr))], ]); + const Param = rule($ => { const tail = [opt('?'), opt(':', Type), opt('=', Expr)]; // ? 
: T = E const body = alt( - // NOTE: a plain parameter name is NOT reserved-guarded — `this` is a valid first - // parameter even without an annotation (`function f(this, a)`: the implicit-any - // `this`-param), and `this` is an always-reserved word; guarding here would reject - // that valid form. (A truly reserved param name like `function f(while)` stays an - // accepted over-accept; it's out of this gap's scope.) - [Ident, ...tail], + // The plain-name arm EXCLUDES `this`: tsc's parser treats `this` as a special + // parameter form accepting ONLY bare `this` or `this: T` (the dedicated arm below) + // — `this?`, `this = 1`, `this: T = 1`, and any decorated/modified `this` + // (`@dec this`, `public this`) are parse errors there. Letting `this` match as a + // plain Ident here would re-open that whole class via the tail/decorator/modifier + // paths. (A truly reserved param name like `function f(while)` stays an accepted + // over-accept; it's out of this gap's scope.) + [not('this'), Ident, ...tail], [BindingPattern, ...tail], // a rest element, by contrast, can never validly be a reserved word (`...while`), // and `...this` is invalid too, so guarding the rest name is FN-safe. ['...', alt([notReserved, Ident], BindingPattern), opt('?'), opt(':', Type), opt('=', Expr)], // rest (`?`/initializer are CHECKER errors in tsc, not parse errors) ); return [ - ['this', ':', Type], + // `this`-param: bare `this` or `this: T` ONLY — no `?`, no default, no decorator, + // no modifier (tsc's parser rejects all of those). This is the SOLE way `this` + // reaches param position; the plain-name arm above excludes it. + ['this', opt(':', Type)], // optional decorators + optional parameter modifiers, then the binding. // many1 → with modifiers; the no-modifier branch also catches a param NAMED // like a modifier (`public: T`), which many() would otherwise eat. 
tsc parses @@ -353,16 +477,25 @@ const ForHead = rule($ => { return [ // declared head: `let/const/var/using/await using ` then C-style or in/of. // ForBinding gives a no-`in` initializer so `for (var a = 1 in xs)` parses. - [alt('let', 'const', 'var', 'using', ['await', 'using']), sep(ForBinding, ','), alt( + // `for (using of of …)` has no parse tree: the spec's `[lookahead != using of]` on the + // `using` ForDeclaration arm suppresses the using-DECL reading, and `using` as an + // identifier then fails (`using of of` reads as two for-of keywords). Guard the exact + // triple only — `for (using of ;…)` (C-style, binding named `of`) and `for (await using + // of of …)` (the await-using arm) stay valid. + [not(['using', 'of', 'of']), alt('let', 'const', 'var', 'using', ['await', 'using']), sep(ForBinding, ','), alt( cTail, - [alt('in', 'of'), Expr], + // the for-in OBJECT is a full Expression (comma included: `for (a in b, c)`); + // for-of takes an AssignmentExpression - no comma (tsc rejects `for (x of a, b)`) + ['in', Expr, many(',', Expr)], + ['of', Expr], )], [opt(Expr, many(',', Expr)), ...cTail], // C-style, no declaration: `for (i=0; …; …)` / `for (;;)` // for-in/of, no declaration: `for (x of xs)`. The target Expr parses in a no-`in` // context (same exclude as binding initializers): the `in` belongs to the for-head, // not to an in-LED inside the target — without it `for (key in obj)` swallowed the // `in`, the arm failed, and the statement fell back to a CALL parse `for(...)`. 
- [exclude('in', Expr), alt('in', 'of'), Expr], + [exclude('in', Expr), 'in', Expr, many(',', Expr)], + [exclude('in', Expr), 'of', Expr], ]; }); @@ -374,25 +507,30 @@ const SwitchCase = rule($ => [ const Stmt = rule($ => [ Block, - [alt('let', 'const', 'var'), sep(Binding, ','), opt(';')], + [alt('let', 'const', 'var'), sep(Binding, ','), asi()], ['if', '(', Expr, many(',', Expr), ')', $, opt('else', $)], ['for', opt('await'), '(', ForHead, ')', $], ['while', '(', Expr, many(',', Expr), ')', $], ['do', $, 'while', '(', Expr, many(',', Expr), ')', opt(';')], ['switch', '(', Expr, many(',', Expr), ')', '{', many(SwitchCase), '}'], - ['return', opt(Expr, many(',', Expr)), opt(';')], - ['throw', Expr, many(',', Expr), opt(';')], + ['return', opt(Expr, many(',', Expr)), asi()], + ['throw', Expr, many(',', Expr), asi()], // The label is a RESTRICTED production (`break [no LineTerminator here] Label`) // and a label can't be a reserved word — without both, `break` ⏎ `case "X":` // inside a switch eats `case` as the label and the whole switch cascades. - ['break', opt(sameLine, notReserved, Ident), opt(';')], - ['continue', opt(sameLine, notReserved, Ident), opt(';')], + ['break', opt(sameLine, notReserved, Ident), asi()], + ['continue', opt(sameLine, notReserved, Ident), asi()], ['try', Block, opt('catch', opt('(', alt(Param, BindingPattern), ')'), Block), opt('finally', Block)], - [Ident, ':', $], + [notReserved, Ident, ':', $], ';', - ['debugger', opt(';')], + ['debugger', asi()], ['with', '(', Expr, ')', $], - [opt('await'), 'using', sep(Binding, ','), opt(';')], + // A `using` / `await using` declaration binding is a BindingIdentifier — NOT a pattern. 
The + // `not(alt('[','{'))` routes a `[`/`{` start to the expression arm instead: `using [a] = b` + // is `using[a] = b` (element-assignment on the identifier `using`) and stays valid, while + // `using {a} = b` / `await using [a] = null` (no derivation — V8 + babel reject; tsc is + // lenient on the `{` form) correctly fail. (Guards the first binding; see ForHead for for-of.) + [opt('await'), 'using', not(alt('[', '{')), sep(Binding, ','), asi()], Decl, // ExpressionStatement lookahead restriction (ES2023 §14.5): a statement may not // begin with `function` / `async function` — those are declarations at statement @@ -404,7 +542,7 @@ const Stmt = rule($ => [ // (extends-expression heritage, bare `;` class elements, decorator placements), so // 31 tsc-valid corpus files still rely on the class-EXPRESSION fallback — widen the // declaration arm first, then guard. - [not(alt('function', 'class', ['async', 'function'])), Expr, many(',', Expr), opt(';')], + [not(alt('function', 'class', ['async', 'function'], ['let', '['])), Expr, many(',', Expr), asi()], ]); // ── Type Parameters ── @@ -417,11 +555,16 @@ const TypeParam = rule($ => { // second is the name). Longest-match picks among: const tail = [opt('extends', Type), opt('=', Type)]; const mod = alt('const', 'in', 'out', 'public', 'private', 'protected', 'readonly'); - const name = alt(Ident, 'in', 'out'); // a name may itself be a contextual variance keyword + // The type-param NAME is `notReserved, Ident`: `in` LEXES as an Ident, so an un-guarded + // Ident would wrongly accept it as the name — but `in` is a reserved word there (tsc + // rejects ``/``/`` "'in' is a reserved word"). `notReserved` forbids + // `in` while allowing `out` and the other contextual keywords; `in` stays a variance + // modifier (``/``/`` parse). Guards arm 1's name too. 
+ const name = [notReserved, Ident]; return [ - [many1(mod), Ident, ...tail], // modifier soup + real-ident name: ``, `` - [mod, name, ...tail], // single modifier + in/out-named param: ``, `` - [name, ...tail], // bare name, incl. ``, ``: ``, `` + [many1(mod), ...name, ...tail], // modifier soup + name: ``, ``, `` + [mod, ...name, ...tail], // single modifier + name: ``, `` + [...name, ...tail], // bare name: ``, `` (NOT ``) ]; }); @@ -432,13 +575,13 @@ const TypeParams = rule($ => [ // ── Declarations ── const InterfaceMember = rule($ => { - const callSig = [opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type)]; // `( … ): Ret` + const callSig = [opt(TypeParams), '(', sep(Param, ','), ')', opt(":", ReturnType)]; // `( … ): Ret` const propOrMethod = alt(callSig, [opt(':', Type)]); // after a name: method | property (bare = implicit any) return [ // call / construct signature (construct = call sig with a leading `new`) [opt('new'), ...callSig], // getter / setter (`get`/`set` as a member NAME falls through to the named branch) - [alt('get', 'set'), MemberName, '(', sep(Param, ','), ')', opt(':', Type)], + [alt('get', 'set'), MemberName, '(', sep(Param, ','), ')', opt(":", ReturnType)], // mapped type: static? (+/-)? readonly? [ K in T (as U)? ] (+/-)? ?? : T [opt('static'), opt(alt('+', '-')), opt('readonly'), '[', Ident, 'in', Type, opt('as', Type), ']', opt(alt('+', '-')), opt('?'), ':', Type], // readonly property (readonly index sig is the bracketed branch below) @@ -470,47 +613,114 @@ const MemberName = rule($ => [ // member's shared `modifiers …` prefix isn't re-parsed per alternative. Inner // alt() is first-match, so branches are ordered specific-before-general // (generator/accessor/index-sig before the MemberName method/field split). 
-const Modifier = alt('public', 'private', 'protected', 'static', 'abstract', 'readonly', 'override', 'accessor', 'async'); -const callTail = ['(', sep(Param, ','), ')', opt(':', Type), opt(Block), opt(';')] as const; +// A modifier KEYWORD counts as a modifier only when what follows can still be a +// member (tsc's disambiguation): followed by '('/'='/':'/';'/'?'/'!'/'<'/'{'/'}' +// it is the member NAME instead ('public() {}', 'static = 1'). 'declare' is a real +// class modifier; 'export'/'in'/'out' are parse-tolerated by tsc (semantic errors). +// `async` is NOT a generic class-member modifier here: it leads the async/async-generator +// method arms below (which give the body its [Await] context), so the modifier soup must +// not swallow it into a plain method (the class analog of the Decl modifier-prefix fix). +const Modifier = alt([alt('public', 'private', 'protected', 'static', 'abstract', 'readonly', 'override', 'accessor', 'declare', 'export', 'in', 'out', 'const'), not(alt('(', '=', ':', ';', '?', '!', '<', '{', '}'))]); +// A class-member modifier run allows AT MOST ONE `static` — this is SYNTAX, not a deferred +// duplicate-modifier check: ECMAScript's ClassElement production has a single `static` slot, +// and `static static x` is rejected by BOTH tsc AND babel (the only valid reading of a second +// `static` is a member NAME — `static static(){}` / `static static = 1` parse — so once the +// name slot is taken, a trailing field name has no production). Two static MODIFIERS is simply +// not a grammar-sanctioned tree. (Duplicate NON-static modifiers like `public public` are a +// different matter — tsc parses them as a checker error, babel parse-rejects them; we follow +// tsc and keep them in the run as a faithful CST, leaving the duplicate as a downstream +// semantic check.) So the run is: non-static modifiers, then OPTIONALLY one `static` followed +// by more non-static modifiers. 
(The second `many` sits INSIDE the opt — two adjacent +// delimiter-less `many`s would be ambiguous.) This precise shape DOUBLES the modifier-vs- +// member-name decision boundaries against the member alt, which explodes tree-sitter's GLR +// table — so it is wrapped in tsRelax with plain `many(Modifier)` as the relaxed rendering: a +// legitimate CAPABILITY bridge (GLR cannot express the at-most-one-static refinement cheaply), +// and a highlighter over-accepting `static static` is harmless and measured. +const NonStaticMod = alt([alt('public', 'private', 'protected', 'abstract', 'readonly', 'override', 'accessor', 'declare', 'export', 'in', 'out', 'const'), not(alt('(', '=', ':', ';', '?', '!', '<', '{', '}'))]); +const modRun = tsRelax([many(NonStaticMod), opt('static', many(NonStaticMod))], many(Modifier)); +const callTail = ['(', sep(Param, ','), ')', opt(":", ReturnType), opt(Block), opt(';')] as const; +// Class member ( params ): T body, params+body routed to a [Await]/[Yield] family: +// plain methods reset (a method body has its OWN, non-inherited context — the spec's +// implicit function boundary), generators yield, async await, async-generators both. +// MemberName, type params, and the return type stay OUTSIDE the family (a computed key +// `[e]` is evaluated in the ENCLOSING context, and type positions are not parameterized). 
+const memTail = (ctx) => ['(', sep(ctx(Param), ','), ')', opt(":", ReturnType), opt(ctx(Block)), opt(';')]; const ClassMember = rule($ => [ ';', // tsc's SemicolonClassElement: `class C { ; }` is parse-clean - DecoratorExpr, - ['constructor', '(', sep(Param, ','), ')', Block, opt(';')], - ['static', Block], + ['constructor', '(', sep(resetCtx(Param), ','), ')', resetCtx(Block), opt(';')], + [many(DecoratorExpr), many(Modifier), 'static', awaitCtx(Block)], // static block body is [+Await] (await reserved); decorators/modifiers parse (SEMANTIC errors) + // decorators PREFIX a member, before any modifier — tsc parse-rejects + // `public @dec method()` ("Decorators are not valid here") and an orphan + // `@dec` with no member, which a standalone sibling alternative tolerated [ - many(Modifier), + many(DecoratorExpr), + modRun, alt( - ['*', MemberName, opt('?'), opt(TypeParams), ...callTail], // generator method - [alt('get', 'set'), MemberName, '(', opt(sep(Param, ',')), ')', opt(':', Type), opt(Block), opt(';')], // accessor - ['[', Ident, ':', Type, ']', ':', Type, opt(';')], // index signature - [MemberName, alt( - [opt('?'), opt(TypeParams), ...callTail], // method (requires `(`) - [opt('!'), opt('?'), opt(':', Type), opt('=', Expr), opt(';')], // field (all-optional → catch-all) + // `async` is order-free among modifiers (tsc parses any order; the checker + // validates), so it carries its own inner modifier run and an async member's + // body is [+Await]/[+Await,+Yield]. 
+ ['async', many(Modifier), '*', MemberName, opt('?'), opt(TypeParams), ...memTail(asyncGenCtx)], // async generator method + ['async', many(Modifier), alt('get', 'set'), MemberName, opt(TypeParams), '(', opt(sep(awaitCtx(Param), ',')), ')', opt(":", ReturnType), opt(awaitCtx(Block)), opt(';')], // async accessor (semantic error; parses) + ['async', many(Modifier), 'static', awaitCtx(Block)], // `async static { }` (semantic error; parses) + ['async', many(Modifier), MemberName, opt('?'), opt(TypeParams), ...memTail(awaitCtx)], // async method + ['*', MemberName, opt('?'), opt(TypeParams), ...memTail(yieldCtx)], // generator method + [alt('get', 'set'), MemberName, opt(TypeParams), '(', opt(sep(resetCtx(Param), ',')), ')', opt(":", ReturnType), opt(resetCtx(Block)), opt(';')], // accessor (type params parse; semantic error) + ['[', Ident, ':', Type, opt(','), ']', opt(':', Type), asi()], // index signature; member separator = ; / newline / } + // a bare identifier `constructor` member MUST be a call signature — tsc rejects a + // `constructor` field/property ("'(' expected"): `constructor;`, `constructor = 1`, + // `constructor: T`, even modified (`public constructor;`). TypeParams parse; `?`/`!` + // do not. A string / #private / computed name `constructor` is NOT the identifier, + // so it stays a valid field (the `not('constructor')` generic arm below covers it). + ['constructor', opt(TypeParams), ...memTail(resetCtx)], + [not('constructor'), MemberName, alt( + [opt('?'), opt(TypeParams), ...memTail(resetCtx)], // method (requires `(`) + // field (all-optional → catch-all). 
A field NOT ended by ';' must not be + // followed by a SAME-LINE decorator: tsc reads that '@' as belonging to + // THIS property ("Decorators must precede the name and all keywords") — + // `x @dec y()` and `x = 1 @dec y()` reject, `x; @dec` and newline accept + [opt('!'), opt('?'), opt(':', Type), opt('=', resetCtx(Expr)), alt([';'], [not(sameLine)], [not(not('}'))])], )], ), ], // Fallbacks for a member NAMED like a modifier (`static = 1`, `get = 1`, `async() {}`): // many(Modifier) would eat the name, so the member kind alt fails and we land here. - [MemberName, opt('!'), opt('?'), opt(':', Type), opt('=', Expr), opt(';')], - [MemberName, opt('?'), opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), opt(Block), opt(';')], + [not('constructor'), MemberName, opt('!'), opt('?'), opt(':', Type), opt('=', resetCtx(Expr)), alt([';'], [not(sameLine)], [not(not('}'))])], + // `constructor` excluded here too (`constructor?()`/`constructor!()` are tsc parse + // errors): every VALID `constructor(…)` is caught by the dedicated arms above, so a + // `constructor` reaching this method fallback is always a malformed form. + [not('constructor'), MemberName, opt('?'), opt(TypeParams), '(', sep(resetCtx(Param), ','), ')', opt(":", ReturnType), opt(resetCtx(Block)), opt(';')], ]); const EnumMember = rule($ => [ [MemberName, opt('=', Expr)], ]); +// Per-specifier `type` modifier (`import { type A }`, `export { type A as B }`). A LONE +// `type` is the specifier NAME (`{ type }`, `{ type as B }`, `{ type, x }`), so the +// modifier reading fires only when a real binding name follows on the same line — the +// not(',', '}', 'as') guard keeps the bare-name reading reachable. +const typeMod = () => opt('type', sameLine, not(alt(',', '}', 'as'))); const ImportSpecifier = rule($ => [ - [Ident, opt('as', Ident)], + [typeMod(), Ident, opt('as', Ident)], // arbitrary module namespace identifier (ES2022): `import { "str" as x }`. 
The // string form REQUIRES the rename (`{ "a" }` / `{ "a" as "b" }` are tsc parse // errors on the import side — the local binding must be an identifier). - [String_, 'as', Ident], + [typeMod(), String_, 'as', Ident], ]); // Export specifiers are WIDER than import ones: a ModuleExportName (identifier or // string) is valid on BOTH sides and may stand alone (`export { x as "s" }`, // `export { "a" as "b" } from "m"`, `export { "a" }` — all tsc parse-clean). const ExportSpecifier = rule($ => [ + // `type` modifier disambiguation (tsc's multi-token lookahead). `type` is the modifier + // when followed by a real name that ISN'T `as` (arm 1), or by `as` that is itself the + // name — `{ type as }`, no rename target after (arm 2). Otherwise `type` is the name: + // `{ type }`, `{ type as B }` (renamed), `{ type, x }` all take arm 3. + ['type', sameLine, not('as'), not(alt(',', '}')), alt(Ident, String_), opt('as', alt(Ident, String_))], + // name is `as`: `{ type as }` (no rename) or `{ type as as Y }` (DOUBLE as = rename). + // A single `{ type as Y }` is NOT this arm — the not(Ident/String) / second-`as` guard + // rejects it so it falls to arm 3 as name=`type` renamed to Y. + ['type', sameLine, 'as', alt([not(alt(Ident, String_))], ['as', alt(Ident, String_)])], [alt(Ident, String_), opt('as', alt(Ident, String_))], ]); @@ -531,12 +741,15 @@ const Decl = rule($ => [ // leading `function` is preferred as a declaration over an IIFE expression- // statement: Program tries Decl before Stmt, so `function f(){}\n()=>{}` parses // as a declaration + arrow rather than longest-matching `function f(){}()` (IIFE). 
- [opt('async'), 'function', opt('*'), notReserved, Ident, opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), alt(Block, opt(';'))], + ...tsFnArms([notReserved, Ident], alt(Block, [not('{'), opt(';')])), // The declaration NAME slots below carry `notReserved` (same guard as the type-alias // name): a reserved word is not a legal declaration name (`interface void {}`, // `class while {}`, `enum for {}`, `namespace debugger {}` — all TS errors), while a // contextual keyword name (`interface any`, `class string`, `enum number`) stays valid. - ['interface', notReserved, Ident, opt(TypeParams), opt('extends', sep(Type, ',')), '{', many(InterfaceMember, opt(alt(';', ','))), '}'], + // tsc parses REPEATED `extends` clauses on an interface (`interface I extends A + // extends B`) — the parser accepts them and the checker reports the duplicate; + // mirror with many() rather than a single opt() clause. + ['interface', notReserved, Ident, opt(TypeParams), heritageClauses, '{', many(InterfaceMember, opt(alt(';', ','))), '}'], // shared heritage: repeated/order-free extends+implements, `extends Foo?.Bar`, empty `extends {` ['type', notReserved, Ident, opt(TypeParams), '=', Type, opt(';')], // type-alias name can't be a reserved word (`type void = …`); contextual type keywords (`string`/`any`/…) stay valid // class decl: optional decorators + optional `abstract`. gen-tm expands the // opt()/many() to recover the `class Ident … { … }` shape for highlighting. @@ -546,10 +759,36 @@ const Decl = rule($ => [ // Named/anonymous are separate arms, mirroring the class-expression pair above. 
[many(DecoratorExpr), opt('abstract'), 'class', opt(TypeParams), heritageClauses, '{', many(ClassMember), '}'], ['enum', notReserved, Ident, '{', sep(EnumMember, ','), '}'], - ['declare', 'function', opt('*'), notReserved, Ident, opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), opt(';')], + ['declare', 'function', opt('*'), notReserved, Ident, opt(TypeParams), '(', sep(Param, ','), ')', opt(":", ReturnType), opt(';')], + // ambient module shorthand `declare module "foo";` (no body — the module arm below + // requires `{…}`) and `declare global { … }` (global-scope augmentation; `global` + // is a contextual-keyword block, not a namespace name). tsc accepts both. + ['declare', 'module', String_, opt(';')], + ['declare', 'global', '{', many(Stmt), '}'], ['declare', alt($, Stmt)], + // A leading `async`/`abstract` modifier before any declaration: tsc's parser + // accepts it (the checker rejects invalid combinations like `async class`); the + // dedicated arms above (function's async arm, class's opt('abstract')) match + // valid combinations first and keep their flat shape, so only otherwise-invalid + // pairings fall to this modifier-prefix arm. `async` is split out with a + // `not('function')` guard: `async function` MUST take the async-function arm so + // its params/body carry the [Await] context — otherwise this lenient prefix would + // catch the async arm's await-context rejections (e.g. `async function f(a=await)`) + // and re-accept them as a plain function with a stray `async` modifier. + // A leading modifier soup before a declaration — mirrors the decorator-prefix arm + // below (var/let/const/using are Stmt-level forms `$`=Decl alone can't reach). tsc + // parses the soup before any of these (`accessor var x`, `public using y`); invalid + // combinations are the checker's line. Restricted to Decl + var/let/const + using — + // NOT an arbitrary expression statement (`public someExpr;` must stay a reject). 
+ [many1(alt('abstract', 'public', 'private', 'protected', 'readonly', 'static', 'override', 'accessor')), alt( + $, + [alt('let', 'const', 'var'), sep(Binding, ','), asi()], + [opt('await'), 'using', not(alt('[', '{')), Binding, many(',', Binding), opt(';')], + )], + ['async', not('function'), $], ['namespace', notReserved, Ident, many('.', Ident), '{', many(Stmt), '}'], // dotted name: `namespace A.B.C { … }` ['module', alt([notReserved, Ident, many('.', Ident)], String_), '{', many(Stmt), '}'], // `module A.B.C { … }` | `module "x" { … }` + ['export', 'as', 'namespace', notReserved, Ident, opt(';')], // UMD NamespaceExportDeclaration — BEFORE the lenient `export alt($, Stmt)` (else `as` wraps as an expr-statement) ['export', alt($, Stmt)], // decorators before export/default/etc. — tsc allows either order. The variable- // statement alternates mirror tsc's parseDeclaration surface: after decorators it @@ -558,21 +797,22 @@ const Decl = rule($ => [ // statements (`@dec if (…)` is a tsc parse error). [many1(DecoratorExpr), alt( $, - [alt('let', 'const', 'var'), sep(Binding, ','), opt(';')], + [alt('let', 'const', 'var'), sep(Binding, ','), asi()], // `using` requires a real binding here: `@dec using x` is parse-clean but // `using 1` is a tsc parse error (zero-binding `var;` by contrast is clean, // so the var/let/const alternative above keeps the lenient sep()). - [opt('await'), 'using', Binding, many(',', Binding), opt(';')], + [opt('await'), 'using', not(alt('[', '{')), Binding, many(',', Binding), opt(';')], )], // decorators may also sit BETWEEN `export` and `default` (`export @dec default // class C {}` — tsc parses the soup in either spot; ordering is a checker error). 
['export', many(DecoratorExpr), 'default', alt( - [opt('async'), 'function', opt('*'), opt(notReserved, Ident), opt(TypeParams), '(', sep(Param, ','), ')', opt(':', Type), alt(Block, opt(';'))], // function + ...tsFnArms([opt(notReserved, Ident)], alt(Block, [not('{'), opt(';')])), // function ['abstract', 'class', notReserved, Ident, opt(TypeParams), heritageClauses, '{', many(ClassMember), '}'], // named abstract class ['abstract', 'class', opt(TypeParams), heritageClauses, '{', many(ClassMember), '}'], // anonymous abstract class + ['interface', notReserved, Ident, opt(TypeParams), heritageClauses, '{', many(InterfaceMember, opt(alt(';', ','))), '}'], // export default interface (interface is not an Expr) [Expr, opt(';')], // catch-all: export default )], - ['export', '*', alt(['from', String_, opt(';')], ['as', Ident, 'from', String_, opt(';')])], + ['export', opt('type'), '*', alt(['from', String_, opt(';')], ['as', alt(Ident, String_), 'from', String_, opt(';')])], // export (type)? * (as ns)? 
from "m" — alias is a ModuleExportName ['export', '{', sep(ExportSpecifier, ','), '}', opt('from', String_), opt(';')], ['export', '=', Expr, opt(';')], ['export', 'type', '{', sep(ExportSpecifier, ','), '}', opt('from', String_), opt(';')], @@ -580,7 +820,8 @@ const Decl = rule($ => [ ['import', alt( [ImportClause, 'from', String_, opt(';')], // import X from "m" (also `import type from "m"` = default named `type`) ['type', ImportClause, 'from', String_, opt(';')], // import type X from "m" - [Ident, '=', Expr, opt(';')], // import x = expr + ['type', Ident, '=', Expr, opt(';')], // import type X = require(…) / = ns.Foo (type-only import-equals) + [Ident, '=', Expr, opt(';')], // import x = expr (also `import type = …` where `type` is the binding name) [String_, opt(';')], // import "m" )], [many(DecoratorExpr), 'export', alt($, Stmt)], diff --git a/typescriptreact.monarch.json b/typescriptreact.monarch.json index 26748e8..2411f77 100644 --- a/typescriptreact.monarch.json +++ b/typescriptreact.monarch.json @@ -356,10 +356,11 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": "operator", "keyof": "operator", "typeof": "operator", "readonly": "keyword", + "this": "keyword", + "is": "operator", "abstract": "keyword", "new": "operator", "asserts": "operator", @@ -370,9 +371,7 @@ "null": "keyword", "undefined": "keyword", "void": "operator", - "this": "keyword", "unique": "keyword", - "symbol": "keyword", "import": "keyword", "function": "keyword", "in": "keyword", @@ -380,6 +379,7 @@ "@new": "keyword", "super": "keyword", "instanceof": "operator", + "target": "keyword", "class": "keyword", "implements": "keyword", "async": "keyword", @@ -423,8 +423,8 @@ "interface": "keyword", "type": "keyword", "enum": "keyword", - "namespace": "keyword", "module": "keyword", + "namespace": "keyword", "from": "keyword", "constructor": "keyword", "defer": "keyword", @@ -433,6 +433,7 @@ 
"number": "keyword", "boolean": "keyword", "object": "keyword", + "symbol": "keyword", "bigint": "keyword", "any": "keyword", "unknown": "keyword", @@ -550,7 +551,7 @@ } ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", { "token": "number", "switchTo": "@value" @@ -588,10 +589,6 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": { - "token": "operator", - "switchTo": "@root" - }, "keyof": { "token": "operator", "switchTo": "@root" @@ -604,6 +601,14 @@ "token": "keyword", "switchTo": "@root" }, + "this": { + "token": "keyword", + "switchTo": "@value" + }, + "is": { + "token": "operator", + "switchTo": "@root" + }, "abstract": { "token": "keyword", "switchTo": "@root" @@ -644,18 +649,10 @@ "token": "operator", "switchTo": "@root" }, - "this": { - "token": "keyword", - "switchTo": "@value" - }, "unique": { "token": "keyword", "switchTo": "@root" }, - "symbol": { - "token": "keyword", - "switchTo": "@value" - }, "import": { "token": "keyword", "switchTo": "@root" @@ -684,6 +681,10 @@ "token": "operator", "switchTo": "@root" }, + "target": { + "token": "keyword", + "switchTo": "@root" + }, "class": { "token": "keyword", "switchTo": "@root" @@ -856,11 +857,15 @@ "token": "keyword", "switchTo": "@root" }, - "namespace": { + "module": { "token": "keyword", "switchTo": "@root" }, - "module": { + "global": { + "token": "variable", + "switchTo": "@value" + }, + "namespace": { "token": "keyword", "switchTo": "@root" }, @@ -896,6 +901,10 @@ "token": "keyword", "switchTo": "@value" }, + "symbol": { + "token": "keyword", + "switchTo": "@value" + }, "bigint": { "token": "keyword", "switchTo": "@value" @@ -984,10 +993,6 @@ "token": "variable", "switchTo": 
"@value" }, - "global": { - "token": "variable", - "switchTo": "@value" - }, "globalThis": { "token": "variable", "switchTo": "@value" @@ -1061,7 +1066,7 @@ "include": "@exprBody" }, [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "token": "regexp", "switchTo": "@value" @@ -1119,7 +1124,7 @@ "number" ], [ - "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", + "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])", "number" ], [ @@ -1142,10 +1147,11 @@ "(?:[a-zA-Z_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", { "cases": { - "is": "operator", "keyof": "operator", "typeof": "operator", "readonly": "keyword", + "this": "keyword", + "is": "operator", "abstract": "keyword", "new": "operator", "asserts": "operator", @@ -1156,9 +1162,7 @@ "null": "keyword", "undefined": "keyword", "void": "operator", - "this": "keyword", "unique": "keyword", - "symbol": "keyword", "import": "keyword", "function": "keyword", "in": "keyword", @@ -1166,6 +1170,7 @@ "@new": "keyword", "super": "keyword", "instanceof": "operator", + "target": "keyword", "class": "keyword", "implements": "keyword", "async": "keyword", @@ -1209,8 +1214,9 @@ "interface": "keyword", "type": "keyword", "enum": "keyword", - "namespace": "keyword", "module": "keyword", + "global": "variable", + "namespace": "keyword", "from": "keyword", "constructor": "keyword", "defer": "keyword", @@ -1219,6 +1225,7 @@ "number": "keyword", "boolean": "keyword", "object": 
"keyword", + "symbol": "keyword", "bigint": "keyword", "any": "keyword", "unknown": "keyword", @@ -1241,14 +1248,13 @@ "process": "variable", "require": "variable", "exports": "variable", - "global": "variable", "globalThis": "variable", "@default": "identifier" } } ], [ - "/(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])+/[gimsuydv]*", + "/(?:[^/\\\\\\[*\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])(?:[^/\\\\\\[\\n]|\\\\[^\\n\\r\\u2028\\u2029]|\\[(?:[^\\]\\\\\\n]|\\\\[^\\n\\r\\u2028\\u2029])*\\])*/(?:[a-zA-Z0-9_$]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*", "regexp" ], [ diff --git a/typescriptreact.tmLanguage.json b/typescriptreact.tmLanguage.json index 4c4b529..6f4b263 100644 --- a/typescriptreact.tmLanguage.json +++ b/typescriptreact.tmLanguage.json @@ -124,14 +124,17 @@ "include": "#import-default-binding" }, { - "include": "#type-predicate-operator" + "include": "#keyof-typekw" }, { - "include": "#keyof-typekw" + "include": "#type-predicate-operator" }, { "include": "#extends-typekw" }, + { + "include": "#unique-typekw" + }, { "include": "#as-typekw" }, @@ -160,10 +163,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -171,9 +174,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -183,9 +183,6 @@ { "include": "#scope-keyword-control-loop" }, - { - "include": "#scope-keyword-control-loop-of" - }, { "include": "#scope-keyword-control-flow" }, @@ -210,9 +207,6 @@ { "include": "#scope-storage-modifier" }, - { - "include": 
"#scope-storage-modifier-accessibility" - }, { "include": "#scope-keyword-other-extends" }, @@ -256,10 +250,10 @@ "include": "#scope-constant-language-null" }, { - "include": "#scope-support-type-primitive" + "include": "#this-literal" }, { - "include": "#this-literal" + "include": "#scope-support-type-primitive" }, { "include": "#super-literal" @@ -958,7 +952,7 @@ }, "regex-literal-prefix-ops": { "name": "string.regexp.tsx", - "begin": "(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bis)|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\busing)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": "(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bis)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bunique)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bconstructor)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*([!](?:\\s*[!])*)\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", 
"beginCaptures": { "1": { "name": "keyword.operator.logical.prefix.tsx" @@ -970,7 +964,7 @@ "name": "punctuation.definition.string.begin.regexp.tsx" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.tsx" @@ -1710,7 +1704,7 @@ }, "number": { "name": "constant.numeric.decimal.tsx", - "match": "(?:[0-9]+(?:_[0-9]+)*(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" + "match": "(?:(?:0|[1-9][0-9]*(?:_[0-9]+)*)(?:\\.[0-9]*(?:_[0-9]+)*)?|\\.[0-9]+(?:_[0-9]+)*)(?:[eE][+\\-]?[0-9]+(?:_[0-9]+)*)?(?![0-9A-Za-z_$\\\\])" }, "template": { "name": "string.quoted.other.template.tsx", @@ -2905,7 +2899,7 @@ "name": "keyword.operator.expression.keyof.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2920,7 +2914,22 @@ "name": "keyword.other.extends.extends.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + 
"patterns": [ + { + "include": "#type" + } + ] + }, + "unique-typekw": { + "name": "meta.type.unique.tsx", + "begin": "\\b(unique)\\b", + "beginCaptures": { + "1": { + "name": "keyword.other.unique.tsx" + } + }, + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2929,13 +2938,13 @@ }, "as-typekw": { "name": "meta.type.as.tsx", - "begin": "\\b(as)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "begin": "\\b(as)\\b", "beginCaptures": { "1": { "name": "keyword.operator.expression.as.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -2950,7 +2959,7 @@ "name": "keyword.other.extends.implements.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { 
"include": "#type" @@ -2965,7 +2974,7 @@ "name": "keyword.operator.expression.satisfies.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": "(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type" @@ -3009,17 +3018,17 @@ ] }, "scope-keyword-operator-expression": { - "match": "\\b(typeof|new|void|instanceof|delete)\\b", - "name": "keyword.operator.expression.tsx" - }, - "scope-keyword-operator-expression-is": { - "match": "\\b(is)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "match": "\\b(typeof|new|void|as|instanceof|delete)\\b", "name": "keyword.operator.expression.tsx" }, "scope-keyword-operator-expression-keyof": { "match": "\\b(keyof)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.tsx" }, + "scope-keyword-operator-expression-is": { + "match": "\\b(is)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", + "name": "keyword.operator.expression.tsx" + }, "scope-keyword-operator-expression-asserts": { "match": "\\b(asserts)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.tsx" @@ -3028,20 +3037,12 @@ "match": "\\b(infer)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.tsx" }, - "scope-keyword-operator-expression-as": { - "match": "\\b(as)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", - "name": "keyword.operator.expression.tsx" - }, "scope-keyword-operator-expression-satisfies": { "match": 
"\\b(satisfies)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$)", "name": "keyword.operator.expression.tsx" }, "scope-storage-modifier": { - "match": "\\b(readonly|async|static|declare)\\b", - "name": "storage.modifier.tsx" - }, - "scope-storage-modifier-accessibility": { - "match": "\\b(abstract|public|private|protected|override|accessor)\\b(?=\\s+(?:\\.\\.\\.|[[:alpha:]_$\\[*#{\"'0-9]))", + "match": "\\b(readonly|abstract|async|public|private|protected|static|override|accessor|declare)\\b", "name": "storage.modifier.tsx" }, "scope-keyword-other-extends": { @@ -3061,11 +3062,11 @@ "name": "constant.language.null.tsx" }, "scope-support-type-primitive": { - "match": "\\b(void|symbol|string|number|boolean|object|bigint|any|unknown|never)\\b", + "match": "\\b(void|string|number|boolean|object|symbol|bigint|any|unknown|never)\\b", "name": "support.type.primitive.tsx" }, "scope-keyword-other": { - "match": "\\b(unique|@new|meta|out)\\b", + "match": "\\b(unique|@new|target|meta|out)\\b", "name": "keyword.other.tsx" }, "scope-keyword-control-import": { @@ -3073,15 +3074,11 @@ "name": "keyword.control.import.tsx" }, "scope-storage-type-function": { - "match": "\\b(function)\\b", + "match": "\\b(function|constructor)\\b", "name": "storage.type.function.tsx" }, "scope-keyword-control-loop": { - "match": "\\b(in|for|while|do|break|continue)\\b", - "name": "keyword.control.loop.tsx" - }, - "scope-keyword-control-loop-of": { - "match": "\\b(of)\\b(?=\\s+[[:alpha:][:digit:]_$\"`({\\[\\-]|\\s*$|\\s*[({\\[\"`/\\-])", + "match": "\\b(in|for|while|do|break|continue|of)\\b", "name": "keyword.control.loop.tsx" }, "scope-storage-type-class": { @@ -3133,11 +3130,11 @@ "name": "storage.type.enum.tsx" }, "scope-storage-type-namespace": { - "match": "\\b(namespace|module)\\b", + "match": "\\b(module|namespace)\\b", "name": "storage.type.namespace.tsx" }, "scope-support-variable": { - "match": "\\b(module|console|window|document|process|require|exports|global|globalThis)\\b", + "match": 
"\\b(module|global|console|window|document|process|require|exports|globalThis)\\b", "name": "support.variable.tsx" }, "scope-keyword-control-from-from": { @@ -3424,10 +3421,10 @@ "include": "#import-default-binding" }, { - "include": "#type-predicate-operator" + "include": "#keyof-typekw" }, { - "include": "#keyof-typekw" + "include": "#type-predicate-operator" }, { "include": "#extends-typekw" @@ -3460,10 +3457,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -3471,9 +3468,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -3520,10 +3514,10 @@ "include": "#scope-constant-language-null" }, { - "include": "#scope-support-type-primitive" + "include": "#this-literal" }, { - "include": "#this-literal" + "include": "#scope-support-type-primitive" }, { "include": "#super-literal" @@ -3656,10 +3650,10 @@ "include": "#scope-keyword-operator-expression" }, { - "include": "#scope-keyword-operator-expression-is" + "include": "#scope-keyword-operator-expression-keyof" }, { - "include": "#scope-keyword-operator-expression-keyof" + "include": "#scope-keyword-operator-expression-is" }, { "include": "#scope-keyword-operator-expression-asserts" @@ -3667,9 +3661,6 @@ { "include": "#scope-keyword-operator-expression-infer" }, - { - "include": "#scope-keyword-operator-expression-as" - }, { "include": "#scope-keyword-operator-expression-satisfies" }, @@ -3757,7 +3748,7 @@ }, "regex": { "name": "string.regexp.tsx", - "begin": 
"(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bis)|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\busing)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", + "begin": "(?:(?<=[=|\\^&<>+\\-*%~,\\[(?:{;.])|(?<=\\bkeyof)|(?<=\\btypeof)|(?<=\\breadonly)|(?<=\\bis)|(?<=\\bnew)|(?<=\\bextends)|(?<=\\bunique)|(?<=\\bin)|(?<=\\bas)|(?<=\\b@new)|(?<=\\binstanceof)|(?<=\\bclass)|(?<=\\basync)|(?<=\\byield)|(?<=\\bsatisfies)|(?<=\\bfunction)|(?<=\\bget)|(?<=\\bset)|(?<=\\bpublic)|(?<=\\bprivate)|(?<=\\bprotected)|(?<=\\bstatic)|(?<=\\babstract)|(?<=\\boverride)|(?<=\\baccessor)|(?<=\\bexport)|(?<=\\bdeclare)|(?<=\\bout)|(?<=\\belse)|(?<=\\bdo)|(?<=\\breturn)|(?<=\\bthrow)|(?<=\\btry)|(?<=\\bfinally)|(?<=\\bcatch)|(?<=\\bof)|(?<=\\bcase)|(?<=\\bdefault)|(?<=\\bimport)|(?<=\\btype)|(?<=\\bconstructor)|(?<=\\bvoid)|(?<=\\bdelete)|(?<=\\bawait)|(?<=^))\\s*(?:((?:/\\*\\*(?!/)[\\s\\S]*?\\*/|/\\*[\\s\\S]*?\\*/)\\s*))?(/)(?![*/])", "beginCaptures": { "1": { "name": "comment.block.tsx" @@ -3766,7 +3757,7 @@ "name": "punctuation.definition.string.begin.regexp.tsx" } }, - "end": "(/)([gimsuydv]*)", + "end": "(/)([a-z]*)", "endCaptures": { "1": { "name": "punctuation.definition.string.end.regexp.tsx" @@ -3901,7 +3892,7 @@ "include": "$self" } ], - "while": 
"^(?=\\s*(?:[<,\\[|&(...?:{;\\-.!*]|(?:is|keyof|typeof|readonly|abstract|new|asserts|extends|infer|true|false|null|undefined|void|this|unique|symbol)\\b|//|/\\*|[>\\])}](?:\\s*[>\\])}])*\\s*(?=[<,\\[|&(...?:{;\\-.!*=])|(?!(?:if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|let|const|var|using|function|constructor|class|interface|type|enum|namespace|module|public|private|protected|static|override|declare|async|accessor|get|set)\\b)(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*\\b(?!\\s*[.(])))" + "while": "^(?=\\s*(?:[<,\\[|&(...?:{;\\-.!*]|(?:keyof|typeof|readonly|this|is|abstract|new|asserts|extends|infer|true|false|null|undefined|void|unique)\\b|//|/\\*|[>\\])}](?:\\s*[>\\])}])*\\s*(?=[<,\\[|&(...?:{;\\-.!*=])|(?!(?:if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|let|const|var|using|function|constructor|class|interface|type|enum|namespace|module|public|private|protected|static|override|declare|async|accessor|get|set)\\b)(?:[a-zA-Z_$\\p{L}\\p{Nl}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})(?:[a-zA-Z0-9_$\\p{L}\\p{Nl}\\p{Nd}\\p{Mn}\\p{Mc}\\p{Pc}]|\\\\u[0-9A-Fa-f]{4}|\\\\u\\{[0-9A-Fa-f]+\\})*\\b(?!\\s*[.(])))" }, "type-object": { "name": "meta.object-type.tsx", @@ -3965,7 +3956,7 @@ "name": "keyword.operator.expression.is.tsx" } }, - "end": "(?=[)}{\\],;=>]|\\b(?:is|keyof|extends|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", + "end": 
"(?=[)}{\\],;=>]|\\b(?:keyof|is|extends|unique|as|implements|satisfies|if|else|switch|case|default|for|while|do|in|of|break|continue|return|await|yield|try|catch|finally|throw|debugger|with|import|defer|export|from|typeof|instanceof|new|delete|void|asserts|infer)\\b)", "patterns": [ { "include": "#type"