Total parse/edit: never crash on input, errors as data, equivalence gated (closes #39) #40
Merged
…lence class parse/edit on the handle API never crash on input: the STRICT pass runs first (valid path byte-identical, full PEG arm exploration - gated by test/recovery.ts section 1 and the untouched parity suite), and only a strict reject re-parses under the recovery machinery:
- Repetition recovery at spine-shaped loops (ref / alt-of-refs elements; deep-FIRST hooks measured 273-error cascades from arm probing and were reverted): a failing element absorbs tokens into an $error row up to the element FIRST set / the enclosing seq's follower literal / EOF.
- BAR DISCIPLINE keeps recovery equivalence-safe and arm-blind: it fires only where parsing is STUCK AT a strict-proven fail point (pos <= bar <= maxPos <= bar+2, stateless so losing arms cannot consume bars); failures past the bars abort the attempt and mint the next bar (a 32-attempt cap degrades to deterministic free-fire). The runParse safety net obeys the same discipline.
- The lexer recovers under the same flag (error tokens + structured diagnostics; window truncation keeps the LEX_RETRY regrow path).
- Diagnostics are DERIVED, not collected: $error rows are found by descending the structurally-propagated rowRM spine (per-pass candidate lists double-counted under stateless re-adoption); lexer diagnostics live as structured entries formatted at settle time (stored message strings would embed stale offsets), maintained by the window splice and shifted by surgery.
- Recovered streams break two strict-era invariants, both fixed: windowed relexing must anchor BELOW the earliest lexer diagnostic before the damage (a dangling quote pairs with a later edit - backward coupling; forward coupling is already guarded by resync equality), and rows built during a recovering pass may under-record their probe watermark when any arm fired recovery (recFires stamping refuses them to strict adoption; relocate-path surgery also normalizes copied prefix rels - an end-relative value below the remapped rowNF boundary would drift on every later length update).
- '>' splits disable adoption for the rest of the parse (the frozen damage mapping is invalid after a mid-parse token-index shift).
Gates: incremental-verify reworked to total semantics (every step compares tree+errors against a fresh recovering handle, 128 steps 0 mismatch), multi-doc reworked (60 interleaved steps incl. broken text, contract 9/9), 31/31 suite, strict parity 0 mismatches.
KNOWN RESIDUAL (test/recovery.ts, not yet registered): the typing-through-invalid session diverges at 1 of 20 keystrokes - a strict pass-1 edit ADOPTING over a post-recovery tree drops one Pratt wrap layer vs a fresh strict parse (single-keystroke repro in the gate; suspected adoption interplay with LED chains on recovering-built substrate).
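The bar gate and the attempt loop described above fit in a few lines. This is a minimal sketch, not the engine's code. The names stuckAtBar, runAttempts and AttemptResult are hypothetical, and the attempt callback stands in for a whole recovering parse.

```typescript
// Hypothetical names throughout: stuckAtBar, runAttempts, AttemptResult.

/** True iff some bar satisfies pos <= bar <= maxPos <= bar + 2, i.e. the parse
 *  is stuck AT a strict-proven fail point. The check reads no mutable recovery
 *  state, so a losing arm cannot "use up" a bar. */
function stuckAtBar(bars: readonly number[], pos: number, maxPos: number): boolean {
  return bars.some((bar) => pos <= bar && bar <= maxPos && maxPos <= bar + 2);
}

type AttemptResult = { ok: true } | { ok: false; failPos: number };

/** Re-parse under a growing bar list. A failure past the current bars aborts
 *  the attempt, and its fail position becomes the next bar. After `cap`
 *  attempts, one final run recovers in deterministic free-fire mode. */
function runAttempts(
  attempt: (bars: readonly number[], freeFire: boolean) => AttemptResult,
  cap = 32,
): { bars: number[]; freeFire: boolean } {
  const bars: number[] = [];
  for (let i = 0; i < cap; i++) {
    const r = attempt(bars, false);
    if (r.ok) return { bars, freeFire: false };
    bars.push(r.failPos); // the strict-fail watermark is minted as the next bar
  }
  attempt(bars, true); // free-fire: recovery no longer gated by bars
  return { bars, freeFire: true };
}
```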
…valence gated
The residual typing-session divergence traced to a watermark contract
violation that PREDATES recovery and was latent in strict incremental
parsing: a Pratt rule's winning row is finishNode'd BEFORE its failed
LED extension arms run (the NUD/shorter candidate survives the longest
match), so rowExt under-records the rule's true probe extent. The memo
watermark (maxPos at parseRuleEntry exit) was always correct - but the
memo dies with its generation, and ADOPTION reads the row. An edit
landing inside a failed arm's reads then kept a stale row alive ('const
x = f' adopted with ext=4 while typing ')' at token 20 turns the failed
call arm into a successful one). Strict sessions never caught it
because the texts that exercise it (unclosed calls) REJECT, and the
reject was the firewall; total parsing keeps such trees alive.
Fix: write the rule-level watermark back to the row at memo-store time
(rowExt[result] = max(rowExt, maxPos - start)). This subsumes the
recFires mode stamp (removed - rowRM is purely structural again for the
diagnostics walk), restoring broad strict adoption over recovered
substrates: broken-state keystrokes on 9MB dropped from ~1.6s to the
~0.3s bar-iteration cost (valid-state keystrokes stay at 0.05ms).
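Here is a minimal sketch of the write-back. It assumes a typed-array row column and a string-keyed memo. storeMemo and adoptable are hypothetical names, and the adoption window rule is a simplification of whatever the engine actually checks.

```typescript
// Hypothetical column store: rowExt[row] is the row's recorded probe extent
// relative to its start token; the memo maps (rule, start) to a row index.
function storeMemo(
  rowExt: Int32Array,
  memo: Map<string, number>,
  ruleId: number,
  start: number,
  result: number,
  maxPos: number, // rule-level watermark at parseRuleEntry exit
): void {
  // The winning (shorter) row was finished before the failed LED arms ran, so
  // its own extent under-records; fold the rule-level watermark back in.
  rowExt[result] = Math.max(rowExt[result], maxPos - start);
  memo.set(`${ruleId}:${start}`, result);
}

// Assumed adoption rule for illustration: a row that read tokens
// [start, start + ext] survives only if the damage lies outside that window.
function adoptable(start: number, ext: number, damageAt: number): boolean {
  return damageAt < start || damageAt > start + ext;
}
```

Before the write-back, a row with extent 4 would have been adopted across damage 5 tokens past its start, even though a failed arm had read that far.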
test/recovery.ts now fully green and REGISTERED (32/32): valid corpus
byte-identical to strict with empty errors, invalid corpus total and
deterministic, the char-by-char typing session 20/20 keystrokes
equivalent to fresh parses (tree AND errors). The interpreter gains
parseTotal/edit parity (no recovery machinery: degrades to a zero-width
$error root with the strict diagnostic).
incremental-verify 128 steps 0 mismatch, multi-doc 60 steps contract
9/9, strict parity 0 mismatches, lexer streams byte-identical, batch in
band (11.2x), agnostic 9/9.
The seeded mutation lists never inserted a bare ';' — splitting an existing expression's structure (f(a;, b) / (a +; b) / obj.m(;1).n) was covered only by the general machinery, not exercised. Both gates' INSERT pools gain ';' and the glue list gains three explicit break-then-compare pairs; verified break ≡ fresh and restore ≡ original byte-identically (tree and errors) before pinning. Observation for the conformance backlog: several of these broken shapes parse with ZERO errors - the strict grammar itself accepts them (over-accept surface, identical on both engines), not a recovery artifact.
…onsistency
The incremental/recovery gates were TypeScript-only while every grammar shares the emitted runtime - the non-TS incremental behavior (markup lexer modes, the fallback-lexer path, other token algebras) was ungated. test/incremental-grammars.ts closes that: generative inputs (grammar-gen) per grammar x seeded char-level edit sessions, each step checking (1) edited tree + errors byte-identical to a fresh handle parse, (2) tree self-consistency - every span inside its ancestors (the engine-internal invariant an external compare misses when both sides share a corruption; the aggressiveChecks idea), and (3) totality. It immediately found three real holes, all fixed:
- totalNet pushed its diagnostic into the VIEW layer, which the next settle rebuild wiped on exactly one side (now a kind-4 source entry formatted at settle - verbatim engine message).
- The fallback-lexer full-relex path never cleared persisted docLex, so a totality-net diagnostic outlived the edit that fixed the text.
- The window resync retracted the duplicated token push (tokN--) but kept the lexer diagnostic emitted FOR that token: the persisted entry survived via the suffix shift AND the window's copy stayed - the same character double-reported. Retraction now pops the window's own entries at/after the retracted token (lexDiagBase floor).
672/672 steps across typescript/javascript/typescriptreact/javascriptreact/yaml/html/vue (489 exercising recovery). 33/33 suite, lexer streams byte-identical, parser parity 0 mismatches, batch in band.
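Check (2) needs no second engine. Containment is transitive, so one walk that compares each span with its parent is enough. Below is a sketch over a hypothetical object-shaped node; the engine itself stores spans in columns.

```typescript
// Hypothetical node shape; the engine stores spans in columns, not objects.
interface SpanNode {
  start: number;
  end: number;
  kids: SpanNode[];
}

/** Returns the first node whose span escapes its parent (or is inverted), else null.
 *  Containment is transitive, so checking each kid against its parent guarantees
 *  every span lies inside all of its ancestors. */
function findEscapingSpan(root: SpanNode): SpanNode | null {
  if (root.start > root.end) return root;
  const stack: SpanNode[] = [root];
  while (stack.length > 0) {
    const parent = stack.pop()!;
    for (const kid of parent.kids) {
      if (kid.start > kid.end || kid.start < parent.start || kid.end > parent.end) return kid;
      stack.push(kid);
    }
  }
  return null;
}
```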
…erved
Required token matchers in recovering mode now synthesize a zero-width
$missing leaf (expected identity in rowStart, LIT_NAMES/K_NAMES inverse for
the message) instead of failing, so 'const x = f(1, 2;' keeps its Call shape
and reports "expected ')'", and 'function g() { return 1;' closes the body
with "expected '}'". Synthesis is budget-free and position-pure: it fires
iff a recovery bar lies in [pos, pos+2] (missAt), never under probing
(not()/optional/separator probes) and never in free-fire.
Zero-width success is a synthesis-only artifact (a strict zero-width element
would never terminate its loop), so every loop discards it: plain reps break
on pos===before alone (restoring scn), hooked reps discard + recoverSkip,
leftRec continuations and Pratt LEDs refuse zero-width wraps. A rule can
still re-enter ITSELF at the same position through a synthesized leading
token — an unbounded recursion no grammar shape rules out — so recovering
runs keep a (rule, pos) in-progress set and fail the re-entry (PEG cycle
semantics; recRunning, zero strict-path cost). That sentinel also dissolved
the bar +1 ladders the recursion crashes were minting: broken-doc recovery
drops ~9x in the incremental gate (10.7s -> 1.2s).
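The (rule, pos) sentinel amounts to an in-progress set that only recovering runs consult. The sketch below works under that assumption. CycleGuard and its string keys are hypothetical, and the entry serials that later drive taint are not shown.

```typescript
// Hypothetical sketch of the recovering-run cycle sentinel. Only recovering
// runs consult it, so the strict path pays nothing.
class CycleGuard {
  private readonly running = new Set<string>();

  /** Runs `body` as rule `rule` at token `pos`, unless that (rule, pos) pair is
   *  already in progress: a same-position re-entry fails (PEG cycle semantics). */
  enter<T>(rule: number, pos: number, body: () => T | null): T | null {
    const key = `${rule}:${pos}`;
    if (this.running.has(key)) return null;
    this.running.add(key);
    try {
      return body();
    } finally {
      this.running.delete(key);
    }
  }
}
```

Failing the inner re-entry turns an unbounded recursion into an ordinary alternative failure, which the outer frame can then handle.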
Equivalence (edit == fresh) exposed that the bar protocol's input was not
adoption-invariant; three structural fixes:
- frameMax: a frame-local advance watermark (reset to the rule's start at
entry, folded into the parent on exit) replaces the global maxPos in
rowExt/memo watermarks, making recorded probe reaches EXACT instead of
contaminated by earlier-sibling probes. Bars (= strict-fail maxPos) now
reconstruct identically under adoption; the hot advance pays one extra
compare only at frontier breaches (frameMax <= maxPos nests the updates).
This also closes the recorded "exact per-frame extents" backlog item and
lands the bar on the true farthest probe (no more phantom synthesis from
inflated memo-jump watermarks).
- Recovery runs are adoption-free (edit-side attempt loop AND the
lex-recovered first run): a row recorded under a recovering frame carries
that run's bar-dependent reach, so replaying it makes the next bar a
function of the OLD bar history instead of (text, bars). Attempt 0 (empty
bars, behaviorally strict) re-derives the true strict frontier; every
attempt is byte-equal to the fresh side's. The barIn adoption-refusal
window from the first synthesis attempt is dead under this rule and
removed; adoptSeek's recovering rowRM bypass likewise.
- trySurgery refuses recovery-made trees (rowRM reaches the root
structurally): a strict splice into kept $error/$missing siblings was a
fake strict success that froze the OLD text's recovery shape, shifted.
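The frameMax protocol can be sketched as two watermarks. This is an illustration, not the engine's code. The Watermarks class is hypothetical, and the sketch assumes rules are only entered at positions the parse has already reached, which is what keeps frameMax <= maxPos.

```typescript
// Hypothetical sketch: the global frontier (bars are minted from it) and a
// frame-local watermark that rowExt/memo entries record.
class Watermarks {
  maxPos = 0;
  frameMax = 0;

  /** Token advance to position p. Because frameMax <= maxPos always holds,
   *  the global compare only runs when the frame's own frontier is breached. */
  advance(p: number): void {
    if (p > this.frameMax) {
      this.frameMax = p;
      if (p > this.maxPos) this.maxPos = p;
    }
  }

  /** Rule entry at token `start`: reset the frame watermark, return the parent's. */
  enter(start: number): number {
    const saved = this.frameMax;
    this.frameMax = start;
    return saved;
  }

  /** Rule exit: return this frame's exact reach and fold it into the parent. */
  exit(saved: number): number {
    const reach = this.frameMax;
    this.frameMax = Math.max(saved, reach);
    return reach;
  }
}
```

With a global maxPos read, the second sibling frame in the test would report the first sibling's probe (token 5) as its own reach.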
Gates: incremental-grammars 672/672 across 7 grammars; recovery.ts gains a
synthesis-quality section (exact diagnostics + $missing presence) and 4
session-found invalid shapes; incremental-verify gains the 5 protocol-pin
GLUE pairs; multi-doc 60/60 + contract 9/9; check suite 33/33; corpus
parity 401/401 sample, lexer parity 5695; perf-bench PASS (worst 803ms vs
802ms baseline; 9MB valid keystroke unregressed). verify-rejects: a tsc
Debug.assert crash on 'await using' shapes is counted as ORACLE-CRASH and
skipped (a crashed oracle has no verdict) instead of killing the gate.
Required RULE references failing inside the bar window now mint a zero-width
$missing row carrying the rule identity (RULE_MISS_BASE + rid in rowStart),
reported as "expected Expr": 'const a = ;' / 'a + ;' / '-;' / 'x ? y : ;' /
'a, ;' / 'f(1, ;' all produce a single tsc-grade diagnostic at the right
offset. Hooks: parseRuleEntry's fail exit (memoized like any result) plus the
three Pratt rhs sites that bypass rule entries (operator LED, prefix NUD,
chain-rhs LED).
Synthesis placement follows COMMITMENT semantics, replacing the flat
probing counter for optionals: an optional group or repetition element may
fail freely while uncommitted (probeBase = its start; 'the optional thing is
absent' / 'the list ends' need no diagnostic), but once it consumes a real
token past that base, missing pieces synthesize — 'const a = ;' commits at
'=' and mints the Expr; rep(seq(',', Expr)) cannot mint a phantom ',' to
keep a list alive, yet after a real ',' the element synthesizes. not() and
separator probes stay absolutely suppressed (pure lookahead). FIRST-token
call-site guards open under recovering (one global read on the strict
guard-fail path): at a bar the next token is exactly what cannot start the
rule, and the hook lives inside parseRuleEntry — 'a, ;' must reach it.
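The placement rules combine missAt, no lookahead, no free-fire, and commitment. A hypothetical predicate might read as follows. The function name and the convention probeBase = -1 outside any optional or repetition element are assumptions of this sketch.

```typescript
// Hypothetical predicate. probeBase is the start of the innermost enclosing
// optional/repetition element (-1 when there is none).
function maySynthesize(
  pos: number,
  probeBase: number,
  inPureLookahead: boolean, // not() and separator probes
  freeFire: boolean,
  bars: readonly number[],
): boolean {
  if (inPureLookahead || freeFire) return false;
  // Uncommitted: "the optional thing is absent" / "the list ends" needs no diagnostic.
  if (pos <= probeBase) return false;
  // missAt: a recovery bar must lie in [pos, pos + 2].
  return bars.some((bar) => bar >= pos && bar <= pos + 2);
}
```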
Two latent bugs fixed in passing, both found by the new shapes:
- The frameMax conversion in the previous commit was double-applied at the
12 token-advance sites by a patch-script composition hole (edit #3's
pattern matched text edit #2 had just inserted; the anchor counts were
asserted on the pre-edit source), leaving the nested inner test
unreachable — token consumes never raised the global maxPos, so bars were
minted from a watermark that only memo jumps could move. Equivalence
gates stayed green because both engines ran the same wrong protocol;
the synthesis quality work surfaced it as losing-arm wins. Advances now
pair frameMax/maxPos correctly.
- The memo-jump coordinate refresh read toff(start) unguarded; for a
zero-width row minted AT EOF, start == tokN reads past the token columns
(stale slots from a longer previous document under handle reuse) — the
recovery gate's in-bounds check caught an "expected Expr" at offset 8 in
a 5-char document. The refresh now uses the same EOF guard as offset().
recovery.ts synthesis pins 3 -> 9 (the six nonterminal shapes above, exact
diagnostics + $missing presence). All gates green: incremental-grammars
672/672, incremental-verify 136 steps, multi-doc 60 + 9/9, recovery
valid/invalid/typing/synthesis, suite 33/33, perf-bench PASS, 9MB fresh
438ms / valid keystroke warm ~0.6-5ms / breaking 649ms / while-broken
438ms / fixing 368ms (broken-state costs are the recorded follow-up).
Typing in a broken 9MB document drops from ~440ms to ~3-7ms per keystroke (avg 3.2ms over a 10-keystroke burst; incremental gate 9.9x vs fresh on its mixed valid/broken sessions). Recovery runs now ADOPT rows from the previous tree again, soundly this time, by making every recovery decision a pure function of the row's window:
- recoverArmed takes (from, reach): a hook arms iff THE FAILING ELEMENT is stuck at a bar - its own frame-local probe reach (staged frameMax around hooked-loop elements) sits on the bar. The old form read the GLOBAL maxPos, so a frontier parked on a far bar could arm an unrelated loop whose own probes never approached it - a decision no window can reproduce. The runParse nets pass (pos, maxPos): top-level semantics unchanged.
- barsWindowEq: a row adopts in a recovering run iff the bars inside its window [start, reach+2] are IDENTICAL (shifted) to the bars the build run saw there - with position-pure decisions, window text + window bars determine the frame's behavior completely, including losing-arm fires and synthesis. lastBars rides the document register set; strict trees carry [], free-fire trees null (free-fire is not bar-pure - never adopted while recovering). rowRM rows are adoptable under the predicate (the error region itself is what stays stable across far edits), and runExtend re-checks per member.
The blanket adoption-off in the bar iteration and the lex-recovered first run is removed; attempt 0 (no bars) adopts exactly where the build run was also bar-free.
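Here is a sketch of the barsWindowEq predicate. It assumes bars and the row window are token indices, and that the edit shifts the row by a constant token delta. Names other than barsWindowEq are hypothetical.

```typescript
/** Ascending bars within [lo, hi]. */
function barsIn(bars: readonly number[], lo: number, hi: number): number[] {
  return bars.filter((b) => b >= lo && b <= hi).sort((a, b) => a - b);
}

/** A row built over old tokens [start, reach] (reach = its exact probe extent)
 *  may be adopted at token offset `shift` in a recovering run iff the bars in
 *  its window [start, reach + 2] match, shifted, the bars its build run saw.
 *  buildBars === null marks a free-fire tree: never adopted while recovering. */
function barsWindowEq(
  buildBars: readonly number[] | null,
  runBars: readonly number[],
  start: number,
  reach: number,
  shift: number,
): boolean {
  if (buildBars === null) return false;
  const seen = barsIn(buildBars, start, reach + 2).map((b) => b + shift);
  const now = barsIn(runBars, start + shift, reach + 2 + shift);
  return seen.length === now.length && seen.every((b, i) => b === now[i]);
}
```

Bars outside the window do not matter. This is what lets an error region stay adoptable while edits happen far away.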
The changed fire pattern exposed a latent message-derivation bug present in committed code: collectErrRows decoded a $error row's first kid as a token leaf unconditionally, but the runParse leftover net builds a WRAPPER $error whose kids are nodes ([partial-root, tail-error]) - (~nodeId)>>>2 indexed a garbage column, docText read text from an unrelated offset, and the two text layers (contiguous string vs pieces) resolved the garbage differently, which is how the gate caught it (equal trees, different messages). Wrapper-shaped $error rows now fall through to the generic descent so the tail derives its message from its real first token. All equivalence gates green (incremental-grammars 672/672, incremental-verify 136 steps, multi-doc, recovery incl. synthesis pins 9/9), suite 33/33, perf-bench PASS, strict corpus parity intact. 9MB: fresh ~508ms, breaking keystroke ~409ms (the absorbed error region re-parses; recorded follow-up with fix-transition ~395ms), keystrokes while broken 3-13ms.
Recovery attempts within one sequence parse the same token stream under a monotonically growing bar list, so a memo entry from an earlier attempt is provably valid in a later one when its probe window [start, mx+2] contains no bars: no bars means no synthesis and no skip arming, and the opened dispatch guards only add non-consuming probes - the frame behaved strictly, a pure function of the window text. The one exception is the recRunning cycle refusal, which can fire without synthesis (open guards let a ref chain cycle at one position) and depends on which frames are on the stack. recRunning now maps each frame to an entry serial; a refusal leaning on a frame entered before the current one taints the current frame's memo entry (stamped -memoGenCur: reusable only in its own generation, and propagating the taint to whoever reuses it). This is the diagnosed hole that sank the first survival attempt. Survival is edit-side only: the fresh-parse attempt loop calls parseCore, which resets the arena cursor per attempt, so an earlier attempt's rows are clobbered there. A mid-parse '>'-splice disables survival for the rest of the sequence (pre-split positions can't be revalidated). Also removes recFires (dead since the rowExt write-back subsumed the recFires stamp). 9MB transitions: breaking 335ms -> 157ms, fixing 230ms -> 146ms (both now lexer-bound); while-broken typing 3.4ms unchanged. All equivalence gates green: incremental-grammars 672/672, incremental-verify 136, multi-doc 60, recovery pins 9/9, check 33/33, emit-parser corpus parity 401/401.
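The survival test reduces to two parts: a range query over the ascending bar list and the taint stamp. This is a hypothetical sketch. The MemoEntry shape and the sign encoding of generations (generations numbered from 1) are assumptions modeled on the -memoGenCur stamp.

```typescript
/** Does the ascending bar list contain a bar in [lo, hi]? (lower-bound search) */
function anyBarIn(sortedBars: readonly number[], lo: number, hi: number): boolean {
  let a = 0;
  let b = sortedBars.length;
  while (a < b) {
    const m = (a + b) >>> 1;
    if (sortedBars[m] < lo) a = m + 1;
    else b = m;
  }
  return a < sortedBars.length && sortedBars[a] <= hi;
}

// Hypothetical memo entry: `reach` is the frame's exact probe extent (mx);
// gen < 0 encodes a context-tainted entry stamped with -generation.
interface MemoEntry {
  start: number;
  reach: number;
  gen: number;
}

/** Can this entry be reused in generation `currentGen` (generations start at 1)? */
function survives(e: MemoEntry, sortedBars: readonly number[], currentGen: number): boolean {
  if (Math.abs(e.gen) === currentGen) return true; // same attempt: bars unchanged
  if (e.gen < 0) return false; // tainted: reusable only in its own generation
  return !anyBarIn(sortedBars, e.start, e.reach + 2); // bar-free [start, mx + 2]
}
```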
…liff
The window relex resynced only on exact stack-depth equality, so an edit
that changes paren balance shifts the entire suffix's absolute depth column
and the window regrows to EOF - a 9MB document paid ~130ms of relexing on
every break/fix transition for a one-token depth shift.
The resync now has two sufficient conditions, both proven from observable
state (template stacks empty on both sides; candidate token carries no
cross-token lexer flag a successor reads):
- FAST (O(1)): equal depth and neither lex dipped below it since the
divergence point (damage start) - every open entry is then common to
both lexes, the stacks are content-equal, and every future pop behaves
identically. Trajectory minimums are folded incrementally (old side
seeded from the damage-interior tokens, new side tracked per push).
- SHIFTED: the old suffix never pops an entry open at the candidate
(lazy suffix-min over the old depth records, pop-on-empty = -1): no open
entry's head-ness is ever read again, stack contents are irrelevant, and
the depths may differ by an arbitrary shift. The splice then re-bases the
adopted tkPd column by the shift, restoring true absolute depths ('('
head bits are local facts of their own neighbors and stay valid).
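The SHIFTED condition and the re-basing step can be sketched over a plain depth column. The sketch assumes depths[i] is the old lex's paren depth after token i and that a pop-on-empty is stored as -1. The class and function names are hypothetical, and the FAST condition is not modeled.

```typescript
// Hypothetical lazily built suffix minimum over the old lex's depth records.
class SuffixMin {
  private readonly depths: Int32Array;
  private mins: Int32Array | null = null;

  constructor(depths: Int32Array) {
    this.depths = depths;
  }

  /** Minimum depth over tokens i..end; built once, on first use. */
  minFrom(i: number): number {
    let m = this.mins;
    if (m === null) {
      const n = this.depths.length;
      m = new Int32Array(n + 1);
      m[n] = 0x7fffffff; // empty suffix: pops nothing
      for (let k = n - 1; k >= 0; k--) m[k] = Math.min(this.depths[k], m[k + 1]);
      this.mins = m;
    }
    return m[i];
  }
}

/** SHIFTED: no old-suffix token after the candidate drops below the
 *  candidate's depth, so no entry open at it is ever popped (read) again. */
function shiftedResyncOk(sm: SuffixMin, candidate: number, oldDepthAtCandidate: number): boolean {
  return sm.minFrom(candidate + 1) >= oldDepthAtCandidate;
}

/** Re-base the adopted suffix's depth column by shift = newDepth - oldDepth. */
function rebaseDepths(depths: Int32Array, from: number, shift: number): void {
  for (let k = from; k < depths.length; k++) depths[k] += shift;
}
```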
This also closes four latent unsoundness classes in the old equality path:
a resync candidate that is a postfix-ambiguous op, control keyword, '(' or
')' lets the adopted successor read state derived from tokens the window
re-lexed differently; and template-depth equality cannot prove the mutable
interp brace counters equal (resync inside templates now waits for depth
0). Each slides the resync at most a few tokens.
9MB transitions: breaking 157ms -> 5.8ms, fixing 146ms -> 2.9ms; valid
keystroke 1.8ms -> 1.1ms; while-broken typing 3.4ms -> ~2ms. Gates: lexer
parity 5695 diff=0, incremental-grammars 672/672, incremental-verify 136,
multi-doc 60, recovery pins 9/9, check 33/33, corpus parity 401/401,
perf-bench worst 472ms.
trySurgery refused any tree containing recovery rows (rowRM root). It now accepts them when the edit provably commutes with every recovery decision: decisions are position-pure functions of (window text, window bars), so a splice is sound when no bar window touches the damage or the re-parsed span's probe reach - kept rows replay identically at shifted positions, and a fresh recovering parse behaves strictly across the span, exactly like the strict re-parse the surgery runs (a fire inside the span would need a bar at/below the probe reach + 2; prefix attempts use prefixes of the same bar list, so one check against the final list covers every attempt). The spliced tree keeps its bar list with suffix bars shifted by the token delta; bars adjacent to the damage (unmappable) and free-fire trees (lastBars null, not window-pure) refuse. The multi-doc gate immediately caught a latent length bug this exposed: finishNode takes a node's char end from its LAST KID, which a trailing zero-width $missing row pushes past the last real token - but surgery re-derived ancestor lengths from the token columns, clipping that extension. A node whose token end lies strictly beyond the damage now keeps its end shape (rowLen += chrD: every end-determining coordinate sits in the shifted suffix); only nodes ending at/inside the damage use the token derivation (no zero-width row can end them - zero-width rows live at bars, and damage-adjacent bars were refused). Strict trees take either branch to the same value. 9MB while-broken typing now sits at valid-path parity (~1-1.7ms vs ~1ms valid; surgery additionally applies wherever its container shapes allow). Gates: multi-doc 60 + contract 9/9, incremental-grammars 672/672, incremental-verify 136, recovery pins 9/9, check 33/33, corpus parity 401/401.
Two grammar-derived enrichments of the $missing diagnostics, both resolved
at settle from the tree (zero parse-time cost, adoption/replay-safe):
- PAIR_OPEN: for each literal C, intersect - across every seq occurrence of
C with preceding literals in its sequencing scope (groups inlined;
quantifier/alt contents inherit a copy of the scope's accumulator, since
they physically follow its earlier literals; nothing leaks back) - the
sets of those preceding literals. A unique survivor is C's structural
opener: ')' keeps '(' through if/while/call alike, interior separators
intersect away, and ','/':'/'(' themselves die as ambiguous. The closer's
diagnostic then carries related info pointing at the matched opener leaf
found among its earlier siblings ("expected ')'" / "to match this '('"),
with keyword pairs like 'while'<-'do' falling out for free. shiftDiags
shifts the related anchor on its own coordinates (it can sit on the other
side of the damage from its diagnostic - the surgery path caught this).
- Viable-set messages: for a required literal C in a seq, the literals
PROVABLY still accepted when C's matcher fails - repetitions before C are
always re-enterable so their nullable-prefix-reachable literals stay
viable; nullable one-shot items are crossed but contribute nothing (they
may already have consumed). "expected ',' or ']'" therefore never names
an impossible continuation, unlike a static FIRST union (after `[1, 2` an
expression is not viable) - and unlike tsc, which under-reports the same
position as "')' expected". Registered per call site during emission and
threaded through the literal matchers into the $missing row (rowStart
bits 21+; the row is zero-kid, the slot is free), decoded at settle.
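Both enrichments are small set computations over the grammar. The sketch below assumes the sequencing-scope analysis has already produced two inputs: for PAIR_OPEN, the set of preceding literals at each occurrence; for the viable set, a flattened item list. All names are hypothetical, and ordering the viable set alphabetically is a choice of this sketch.

```typescript
// PAIR_OPEN: intersect, per literal, the preceding-literal sets of its occurrences.
interface Occurrence {
  lit: string;
  preceding: ReadonlySet<string>;
}

function pairOpeners(occurrences: readonly Occurrence[]): Map<string, string> {
  const acc = new Map<string, Set<string>>();
  for (const { lit, preceding } of occurrences) {
    if (preceding.size === 0) continue; // only occurrences WITH preceding literals vote
    const cur = acc.get(lit);
    if (cur === undefined) acc.set(lit, new Set(preceding));
    else cur.forEach((p) => { if (!preceding.has(p)) cur.delete(p); });
  }
  const openers = new Map<string, string>();
  acc.forEach((survivors, lit) => {
    if (survivors.size === 1) openers.set(lit, Array.from(survivors)[0]); // unique survivor
  });
  return openers;
}

// Viable set: walk the seq items before the required literal, right to left.
type Item =
  | { kind: "lit"; text: string } // required literal (one-shot, non-nullable)
  | { kind: "ref" } // required nonterminal (one-shot, non-nullable)
  | { kind: "opt" } // nullable one-shot item
  | { kind: "rep"; lits: readonly string[] }; // repetition + its nullable-prefix-reachable literals

function viableLiterals(before: readonly Item[], required: string): string[] {
  const viable = new Set<string>([required]);
  for (let i = before.length - 1; i >= 0; i--) {
    const item = before[i];
    if (item.kind === "rep") item.lits.forEach((l) => viable.add(l)); // always re-enterable
    else if (item.kind !== "opt") break; // a non-nullable one-shot item ends the walk
    // "opt" is crossed but contributes nothing: it may already have consumed
  }
  return Array.from(viable).sort();
}
```

For `[1, 2` the items before ']' are '[', an optional Expr and rep(',', Expr). The walk yields ',' and ']' and never Expr.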
cst.errors entries gain an optional related: {offset, end, message} field.
Pins re-pinned (11/11, exact); gates: incremental-grammars 672/672,
incremental-verify 136, multi-doc 60, check 33/33, corpus parity 401/401,
perf-bench unchanged.
test/head-to-head.ts runs one 9MB TypeScript document through identical single-character edit scripts (warm valid keystrokes, a paren-deleting breaking edit, while-broken typing, the fixing edit) on all three engines, with positions recomputed from the current text so every engine sees byte-identical edits and timers wrapping only the engine call. tsc runs setParentNodes=false; node-tree-sitter caps input strings at 32767 chars, so it reads through its 16KB chunk-callback path. Results (node v24, Apple silicon): Monogram beats tsc on every phase (fresh 177 vs 212ms, valid keystroke 0.37 vs 37ms, while-broken 0.21 vs 13.6ms, fixing 1.0 vs 14.1ms) and beats or matches tree-sitter on fresh (177 vs 458ms) and while-broken typing; tree-sitter wins the two transition edits (0.26 vs 13ms breaking), where the strict-first architecture pays one adoption-assisted strict pass to prove rejection before recovering. Numbers + the two byte-identity guarantees added to the README under 'How it measures up'.
test/recovery-conformance.ts: on every single-file conformance test tsc's PARSER rejects (parseDiagnostics non-empty - the live source of the .errors.txt syntax baselines, with semantic noise excluded by definition), compare Monogram's total-parse cst.errors bidirectionally at +/-8 chars:
- recall (tsc errors we also report): 530/951 = 55.73%
- precision (our errors tsc also reports): 580/702 = 82.62%
- first-error agreement: 203/355 = 57.18%
- files we accept but tsc rejects: 116
The sample divergences localize the gap classes: the accept side is dominated by tsc's context-parameter checks ([Await]/[Yield] parameter positions, reserved names in declaration slots) plus a few CFG-expressible shapes; the missed side is recovery-policy granularity (one absorbed region vs tsc's several pointed diagnostics).
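The +/-8 char matching is the same test run in both directions. Here is a minimal sketch with a hypothetical Diag shape. pct reproduces the percentages quoted above from their fractions.

```typescript
// Hypothetical diagnostic shape; offsets are char offsets within one file.
interface Diag {
  file: string;
  offset: number;
}

/** How many of `from` have a counterpart in `to` (same file, within +/-tol chars). */
function matched(from: readonly Diag[], to: readonly Diag[], tol = 8): number {
  return from.filter((d) => to.some((t) => t.file === d.file && Math.abs(t.offset - d.offset) <= tol)).length;
}

const pct = (hit: number, total: number): string => ((100 * hit) / total).toFixed(2);

// recall    = matched(tscErrors, ourErrors) / tscErrors.length
// precision = matched(ourErrors, tscErrors) / ourErrors.length
```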
Two syntactic over-accepts found by the diagnostics comparison against tsc:
- parseTemplateExpr (both engines) treated a template HEAD as committing to
nothing: on EOF or any non-middle/tail token after a substitution it
closed the $template node and returned success, so 'let s = `tpl ${x;'
parsed clean. A head now commits to the full chain - every substitution
must hold an expression and every span must continue (middle) or close
(tail); an unterminated template is a parse failure, not a shorter match.
Also rejects empty substitutions ('`${}`'), matching tsc.
- notReservedExpr gains 'case': the bare-identifier expression fallback
accepted the reserved word, so 'switch (x) { case 1 y(); }' parsed as
three statements through the switch body's Stmt arm (the flat
many(SwitchCase) shape made the missing ':' invisible).
A full accept/reject flip scan over the single-file conformance corpus
shows exactly ONE flip: TemplateExpression1.ts (an intentionally-invalid
error test tsc rejects) now correctly rejects - no valid file regressed.
Error-recovery conformance recall 55.7% -> 59.1%; check 33/33, engine
parity 401/401, all 7 generated outputs byte-identical.
TOTAL-PARSING.md: the formal spine in one place - the totality contract, strict-first two-pass structure, the bar discipline with its determinism theorem (bars are a pure function of the token stream, forcing every ingredient to be adoption-invariant), position-pure recovery actions with commitment semantics, the three structural theorems the generative gates forced (zero-width = synthesis-only; same-position cycles and their taint refinement; exact adoption-invariant watermarks), the window-replay theorem with its three corollaries (recovering adoption, cross-attempt memo survival, recovering surgery) and the one known open caveat (row-level taint), the two lexer-resync soundness conditions, tree-derived diagnostics, and the measured head-to-head numbers.
test/exhaustive-edits.ts (CI gate 34/34): over a small bracket-and-list grammar, EVERY document up to 4 chars over the grammar's alphabet x EVERY single-character edit (delete/replace/insert at every position) must parse byte-identically to fresh - tree and errors. Complete within its bound: ~330k steps (EXH_MAXLEN=5 runs the 3.2M-step deep version, also clean). The gate immediately earned its keep: it caught a one-case regression in the day-old surgery length update - a node whose BASE token sits at the damage start (leading trivia inserted at a node's very start) shifts base and end together, leaving the length alone, so rowLen += chrD was wrong exactly where the token derivation is right. keepEnd now also requires the base token to sit strictly before the damage.
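The enumeration itself is small. The sketch below shows the harness shape. It assumes the engine is reached through two caller-supplied functions that return comparable serializations; these are hypothetical, and the gate's real API is not shown. It also skips replacing a character with itself, since that edit is a no-op.

```typescript
interface Edit {
  at: number; // char position
  del: number; // chars removed (0 or 1)
  ins: string; // chars inserted ("" or one char)
}

/** Every document over `alphabet` of length 0..maxLen. */
function documents(alphabet: string, maxLen: number): string[] {
  const out = [""];
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = [];
    for (const d of layer) for (const ch of alphabet) next.push(d + ch);
    out.push(...next);
    layer = next;
  }
  return out;
}

/** Every single-character delete / replace / insert at every position. */
function singleCharEdits(doc: string, alphabet: string): Edit[] {
  const edits: Edit[] = [];
  for (let i = 0; i < doc.length; i++) edits.push({ at: i, del: 1, ins: "" });
  for (let i = 0; i < doc.length; i++)
    for (const ch of alphabet) if (ch !== doc[i]) edits.push({ at: i, del: 1, ins: ch });
  for (let i = 0; i <= doc.length; i++) for (const ch of alphabet) edits.push({ at: i, del: 0, ins: ch });
  return edits;
}

const applyEdit = (doc: string, e: Edit): string => doc.slice(0, e.at) + e.ins + doc.slice(e.at + e.del);

/** Gate driver: `incremental` and `fresh` are hypothetical stand-ins for the
 *  handle API, each returning a serialization of tree + errors. */
function exhaustive(
  alphabet: string,
  maxLen: number,
  incremental: (doc: string, e: Edit) => string,
  fresh: (text: string) => string,
): number {
  let steps = 0;
  for (const doc of documents(alphabet, maxLen)) {
    for (const e of singleCharEdits(doc, alphabet)) {
      steps++;
      if (incremental(doc, e) !== fresh(applyEdit(doc, e))) {
        throw new Error(`edit != fresh for ${JSON.stringify(doc)} ${JSON.stringify(e)}`);
      }
    }
  }
  return steps;
}
```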
Phase-timing the head-to-head's 13ms breaking edit: the strict-fail pass is 0.35ms and the recovery attempts 0.6ms - the cost is lexer-layer suffix bookkeeping on the bench's first-touch 4.5MB cursor jump (a one-time suffix-min allocation plus EOF-relative re-basing of the token columns across the jump). Repeated break/fix transitions at one cursor position settle to ~2ms. README and TOTAL-PARSING.md now say so instead of blaming the strict-first pass.
rowRM becomes bitwise: bit 1 keeps the structural error containment the
diagnostics walk descends; bit 2 marks a CONTEXT-TAINTED result - a frame
whose parse leaned on the cycle sentinel finding an ancestor (its outcome
is a function of the ancestor stack, not the text). The memo stamp alone
only protected the entry; the row adoptSeek can find was still reusable.
Tainted rows now also refuse recovering adoption and run extension,
closing the open caveat documented in TOTAL-PARSING.md. Strict adoption
already required rowRM === 0 and is unchanged.
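The two bits and their three consumers can be written as plain predicates. The constant and function names below are hypothetical. Only the bit meanings and the strict rule (rowRM === 0) come from the text above.

```typescript
// Hypothetical constants mirroring the two rowRM bits.
const RM_ERROR = 1; // structural error containment: the diagnostics walk descends here
const RM_TAINT = 2; // outcome leaned on the cycle sentinel finding an ancestor

const descendForDiagnostics = (rm: number): boolean => (rm & RM_ERROR) !== 0;

// Strict adoption is unchanged: only clean rows.
const strictAdoptable = (rm: number): boolean => rm === 0;

/** Recovering adoption and run extension: error rows may replay if their bar
 *  window matches, but context-tainted rows never do. */
const recoveringAdoptable = (rm: number, barsWindowMatches: boolean): boolean =>
  (rm & RM_TAINT) === 0 && barsWindowMatches;
```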
notReservedExpr gains 'class': a valid class expression always out-matches
the bare-identifier fallback under longest-match, so forbidding the
fallback only rejects broken classes - 'const k = class extends D ;' with
no body parsed as three statements. A zero-flip accept/reject scan over
the whole single-file conformance corpus proves no valid shape regressed;
'extends' stays OUT - it is load-bearing for tsc's tolerated heritage
shapes ('interface I extends { }', 'extends A extends B', 'extends
Foo?.Bar' are all parse-accepted by tsc through the fallback, measured).
Gates: 34/34, corpus parity 401/401, generated outputs byte-identical,
transitions unchanged (~6ms first-touch, ~2ms steady).
The shifted lexer resync's dominant case is a depth-0 candidate (statement boundary), where 'the old suffix never pops an entry open at the candidate' collapses to 'no pop-on-empty beyond the candidate'. The lexer now records the token indices of ')' pops that found an empty paren stack (an ascending doc-level list, almost always empty - a stray closer beyond balance), recomposed by the window splice, shifted by the '>'-split, and persisted on the document register set. The depth-0 check is then one end-of-list comparison instead of an O(suffix) minimum build; only depth > 0 candidates (e.g. the fixing direction of a broken document) still build the suffix minimum, lazily once per edit. Steady-state breaking transitions on 9MB drop ~2.1ms -> ~1.6-1.9ms; the profile now reads strict-fail 0.23ms + attempts 0.46ms + spread bookkeeping, with the raw 7-column suffix memmove measured at 0.07ms - no storage floor in the way. README/TOTAL-PARSING tables refreshed from a fresh head-to-head run, with the cursor-jump amortization stated as what it is (a far jump pays once, proportional to distance; local typing never rewrites the suffix). Gates: 34/34, lexer parity 5695 diff=0, incremental-grammars 672/672, corpus parity, perf-bench under ceiling.
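Here is a sketch of the depth-0 check, plus one way a window splice might recompose the ascending list. The function names and the splice parameters (window start, old window end, token delta) are assumptions for illustration.

```typescript
/** Depth-0 SHIFTED check: no pop-on-empty recorded strictly after the
 *  candidate. popOnEmpty is ascending, so only its last entry matters. */
function depth0ResyncOk(popOnEmpty: readonly number[], candidate: number): boolean {
  return popOnEmpty.length === 0 || popOnEmpty[popOnEmpty.length - 1] <= candidate;
}

/** Hypothetical window splice: keep entries before the window, take the
 *  window's freshly lexed entries, shift the old suffix by the token delta. */
function spliceIndices(
  list: readonly number[],
  winStart: number,
  oldWinEnd: number,
  windowEntries: readonly number[],
  tokenDelta: number,
): number[] {
  return list
    .filter((i) => i < winStart)
    .concat(windowEntries, list.filter((i) => i >= oldWinEnd).map((i) => i + tokenDelta));
}
```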
notReservedExpr grows by the statement keywords with no expression role:
break, continue, debugger, do, else, finally, for, if, return, switch,
try, while, with. Bare 'if' parsed as an identifier expression, which let
'namespace if {}' (the namespace arm correctly fails its notReserved name)
fall apart into three accepted identifier statements - the same fallback
family as 'case'/'class'. 'var' stays OUT: tsc parse-accepts 'for (var of
X)' through shapes that need it.
Blocking 'for' exposed a real grammar gap the fallback had been MASKING:
'for (a in b[c] = b[c] || [], d)' previously parsed as a CALL of the
identifier 'for' (the for-statement arm failed, the call parse won). The
for-in OBJECT is a full Expression - comma included - so both ForHead
in-arms gain many(',', Expr); for-of keeps a single AssignmentExpression
(tsc rejects 'for (x of a, b)', and so do we, where we previously
mis-accepted it through the call fallback).
Per-flip tsc verdict over the whole single-file conformance corpus:
7 flips, ALL toward tsc, 0 away. Error-recovery conformance recall
59.1% -> 61.2%, first-error agreement 57.5% -> 59.7%, we-accept files
115 -> 108. Gates 34/34, corpus parity 401/401, tree-sitter generate
clean on all 4 affected grammars, gate:treesitter 96.0%.
The 108 remaining accept-divergences split into the [Await]/[Yield] context class (31 files - needs exclude()-style identifier-text context threading in the engine) and 77 per-shape strictness items, each named with its fix recipe (fix + flip-scan FN=0 proof).
…reject
ClassMember modeled decorators as a STANDALONE sibling alternative, which tolerated an orphan '@dec' with no member and (together with the modifier-named-field fallback) any decorator/modifier interleaving. Decorators are now a prefix of the member shape ([many(DecoratorExpr), many(Modifier), ...]) in both grammars, with the static-block arm taking the same prefix ('@dec static {}' is parse-clean for tsc - the decorator there is a semantic error only). Cumulative flip-scan with per-flip tsc adjudication: 7 toward tsc, 0 away (the first attempt rejected the decorated static block - tsc accepts it - and the scan caught it). The 'public @dec method()' sub-case still parses through the modifier-named-field fallback; matching tsc's greedy modifier commitment there needs the fallback's bare-name arm split, recorded in the ROADMAP item. Gates 34/34, corpus parity 401/401, tree-sitter generate clean on all 4 affected grammars, gate:treesitter green.
tsc's measured rule: '@' directly after a property on the SAME LINE binds
to that property ('Decorators must precede the name and all keywords of
property declarations') - 'x @dec y()' and 'x = 1 @dec y()' parse-reject,
while 'x; @dec y()' and a newline before '@' accept. Encoded exactly: the
field tails' no-';' ending carries not([sameLine, Decorator]) in both
grammars (alt([';'], [not([sameLine, Decorator])])). This also closes the
'public @dec method()' shape: the bare 'public' field reading now refuses
the same-line decorator, and the modifier reading correctly fails.
not() now accepts an array as a seq, like everywhere else in the rule DSL
(the NotNode conversion previously threw on arrays).
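Sketch of both points under a hypothetical mini rule DSL (the node kinds, `toNode`, `match` and the `Tok` shape are illustrative, not Monogram's real API). A single conversion treats every array as a seq, so `not([sameLine, '@'])` works like any other combinator. `'@'` stands in for the full Decorator rule.

```typescript
interface Tok { text: string; newlineBefore: boolean }

type Node =
  | { kind: "lit"; text: string }
  | { kind: "seq"; items: Node[] }
  | { kind: "alt"; arms: Node[] }
  | { kind: "not"; body: Node }
  | { kind: "assert"; test: (toks: Tok[], i: number) => boolean };

type RuleArg = string | Node | RuleArg[];

// One conversion for every combinator: strings are literals, arrays are seqs.
function toNode(r: RuleArg): Node {
  if (typeof r === "string") return { kind: "lit", text: r };
  if (Array.isArray(r)) return { kind: "seq", items: r.map(toNode) };
  return r;
}

const not = (r: RuleArg): Node => ({ kind: "not", body: toNode(r) });
const alt = (...arms: RuleArg[]): Node => ({ kind: "alt", arms: arms.map(toNode) });

// Zero-width: the next token exists and has no line break before it.
const sameLine: Node = { kind: "assert", test: (t, i) => i < t.length && !t[i].newlineBefore };

// Returns the position after a match, or -1 on failure.
function match(n: Node, t: Tok[], i: number): number {
  switch (n.kind) {
    case "lit":
      return i < t.length && t[i].text === n.text ? i + 1 : -1;
    case "seq": {
      let p = i;
      for (const item of n.items) {
        p = match(item, t, p);
        if (p < 0) return -1;
      }
      return p;
    }
    case "alt":
      for (const arm of n.arms) {
        const p = match(arm, t, i);
        if (p >= 0) return p;
      }
      return -1;
    case "not":
      return match(n.body, t, i) < 0 ? i : -1; // negative lookahead, consumes nothing
    case "assert":
      return n.test(t, i) ? i : -1;
  }
}

// Field tail: an explicit ';', or anything except a decorator on the same line.
const fieldTail = alt([";"], [not([sameLine, "@"])]);
const tok = (text: string, newlineBefore = false): Tok => ({ text, newlineBefore });
```

`match(fieldTail, [tok("@"), tok("dec")], 0)` fails, because the same-line '@' binds to the field. With a newline before '@' the tail succeeds without consuming anything.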
Cumulative flip-scan with per-flip tsc adjudication: 12 toward tsc, 0
away. Gates 34/34, corpus parity 401/401, tree-sitter generate clean x4,
gate:treesitter green.
The windowed-relex resync aligned candidates on kind/text/offset/end but NOT on the token's flags - yet the gap BEFORE the candidate can sit inside the edit: inserting '42' into '}\n privat' leaves every token byte identical from the candidate on while removing its preceding newline. The old token was adopted with a stale newlineBefore, and anything reading the flag downstream (sameLine assertions, comment-aware folds) diverged from a fresh parse. Found by delta-debugging an edit/fresh divergence to a 690-char repro and diffing full streams including flags; the leaf tilings were identical, which is why tree comparisons alone never caught it. The window lex has already computed the candidate's true flags when the resync fires (it lexed the gap), so the fix is one equality in the resync condition: the pushed candidate's flags must match the old token's. A mismatch just keeps lexing - the next candidate's gap lies beyond the edit, so the flags converge and the regrow terminates. Gates: 34/34, lexer parity 5695 diff=0, incremental-grammars 672/672, corpus parity 401/401.
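The fixed resync condition, sketched with a hypothetical token record. The field names follow the text above, and `NEWLINE_BEFORE` is an assumed flag bit. The old check compared only the token's own bytes. The fix also requires equal flags, because the gap before the candidate can lie inside the edit.

```typescript
const NEWLINE_BEFORE = 1 << 0; // hypothetical flag bit

interface Token {
  kind: string;
  text: string;
  offset: number; // already shifted into new-document coordinates
  end: number;
  flags: number;
}

// Old condition: byte identity of the token itself.
function sameBytes(cand: Token, old: Token): boolean {
  return cand.kind === old.kind && cand.text === old.text &&
    cand.offset === old.offset && cand.end === old.end;
}

// Fixed condition: flags (e.g. newlineBefore) must match too. On a mismatch the
// window lexer just keeps lexing; the next candidate's gap lies past the edit.
function canResync(cand: Token, old: Token): boolean {
  return sameBytes(cand, old) && cand.flags === old.flags;
}
```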
Lands the full measured tsc class-member ruleset (probes 12/12, flip-scan
3-toward/0-away on top of the decorator-prefix + sameLine work already in):
- class-field ASI: a ';'-less field allows only a same-line '}' — 'x y',
'x = 1 y = 2', 'var x = 1;' parse-reject; newline / ';' / '}' accept.
Tail generalized to alt([';'], [not(sameLine)], [not(not('}'))]).
- modifier-vs-name: a modifier keyword followed by '('/'='/':'/';'/'?'/
'!'/'<'/'{'/'}' is the member NAME, not a modifier ('public() {}',
'static = 1', 'public public() {}').
- parse-tolerated member modifiers: declare (real), export/in/out
(semantic errors tsc's parser accepts) — 'export Foo;', 'in a = 0;'.
- accessors take optional type params ('get x<T>()' parses).
- static-block arm takes a modifier prefix ('async static {}').
The blocker was gen-cst-match: it drops parse-time not() guards and emits
GREEDY repeats, so [many(Modifier), 'static', Block] was destructurer-
ambiguous — the modifier-repeat swallowed the 'static' keyword leaf the
literal needed, and every static block failed to match. Fixed at the root:
a greedy loop / non-required optional now leaves at least minKids(suffix)
children for the required steps that follow it (threaded across nesting).
Proven a no-op on the parser's own trees — count + suffix-consumed = cc and
suffix-consumed >= minKids, so the cap cc-minKids never cuts below the
parser's actual count; it only blocks over-consumption a dropped guard used
to prevent. Verified: generated matchers byte-stable on all 7 grammars
before the recipe (cst-match-totality green), total after.
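A flattened sketch of the minKids cap, assuming a destructurer over a child list of leaf texts. `Step`, `destructure` and the modifier set are hypothetical, and the real recipe also threads the cap across nesting, which is not shown. The `capped` switch reproduces the old over-consumption for comparison.

```typescript
type Step =
  | { kind: "one"; accepts: (kid: string) => boolean }
  | { kind: "many"; accepts: (kid: string) => boolean };

// Fewest children the steps can consume: each 'one' needs one; a 'many' may take zero.
function minKids(steps: Step[]): number {
  return steps.filter((s) => s.kind === "one").length;
}

function destructure(steps: Step[], kids: string[], capped = true): string[][] | null {
  const out: string[][] = [];
  let i = 0;
  for (let s = 0; s < steps.length; s++) {
    const step = steps[s];
    if (step.kind === "one") {
      if (i >= kids.length || !step.accepts(kids[i])) return null;
      out.push([kids[i++]]);
    } else {
      // Greedy, but leave at least minKids(suffix) children for the required steps after it.
      const cap = capped ? kids.length - minKids(steps.slice(s + 1)) : kids.length;
      const run: string[] = [];
      while (i < cap && step.accepts(kids[i])) run.push(kids[i++]);
      out.push(run);
    }
  }
  return i === kids.length ? out : null;
}

// [many(Modifier), 'static', Block]: 'static' is itself a modifier keyword.
const MODIFIERS = new Set(["public", "private", "static", "async", "readonly"]);
const staticBlock: Step[] = [
  { kind: "many", accepts: (k) => MODIFIERS.has(k) },
  { kind: "one", accepts: (k) => k === "static" },
  { kind: "one", accepts: (k) => k === "Block" },
];
```

Without the cap the modifier run swallows `static` and the match fails. With the cap it stops one child short.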
The js/jsx tmLanguage shift (async/accessor between storage.modifier
buckets) is scope-gap-NEUTRAL (95.7% correct / 77.0% exact / +5.1pt gap,
byte-for-byte identical before/after); ts/tsx tmLanguage unchanged.
Error-recovery conformance: recall 61.2% -> 62.4%, first-error 59.7% ->
62.3%, precision 82.7% -> 83.4%, we-accept 103 -> 100. Gates 34/34, corpus
parity 401/401, tree-sitter generate clean x4, gate:treesitter 96.0%.
tsc parses an interface with REPEATED extends clauses
("interface I extends A extends B {}") — the parser accepts them, the
checker reports the duplicate. Mono's single opt('extends', sep(Type,','))
clause rejected the second extends, so the construct only "parsed" by
splitting into garbage statements. many('extends', sep(Type,',')) mirrors
tsc and produces the correct interface-with-heritage tree
(parserInterfaceDeclaration1-4, interfaceThatInheritsFromItself).
Accept-neutral on the corpus (the split path already accepted these),
gates 34/34, corpus parity 401/401, gate:treesitter 96.0%; also a
prerequisite for statement-level ASI (Task #24), which otherwise rejects
these as a mid-line split.
…l augmentation
tsc's parser accepts a leading modifier before any declaration (the checker
rejects invalid combinations); mono only had piecemeal opt('async') before
function and opt('abstract') before class, so "async class C {}" /
"abstract interface I {}" only "parsed" by splitting into garbage
statements. A modifier-prefix arm [alt('async','abstract'), Decl] tried
after the dedicated arms now produces the correct modifier+declaration
tree while leaving valid "async function" / "abstract class" flat.
Also adds the two declare forms mono was missing: ambient module shorthand
"declare module \"foo\";" (no body — the module arm requires braces) and
"declare global { ... }" (global-scope augmentation; global is a
contextual-keyword block, not a namespace name).
Accept-neutral on the corpus (the old split path already accepted these
invalid-but-parseable shapes), gates 34/34, corpus parity 401/401,
gate:treesitter 96.0%. Value is CST correctness for these constructs and
as prerequisites for statement-level ASI (Task #24) — though that lever
remains a large multi-area round (measured whack-a-mole: with these
companions in place, ASI still leaves ~19 distinct tsc-accepted shapes it
breaks across regex/divide, unique-symbol, import-type-args, protected,
comma-operator, etc., so it does not land incrementally).
… false-rejects
The modifier-prefix arm accepted only async/abstract before a declaration, so tsc-clean files leading with another modifier on a declaration (protected class, public interface, static interface, accessor class) were outright rejected, not even split-parsed, since protected/public/etc. are not expression starts. tsc's parser accepts any modifier before any declaration (the checker rejects the invalid combination). Widen the prefix to async/abstract/public/private/protected/readonly/static/override/accessor. Measured over the single-file conformance corpus: false-rejects (tsc-parser-clean files mono throws on) drop from 19 to ZERO; mono now parses every tsc-clean single-file conformance test. Additive and over-accept-neutral: we-accept stays 100, recall 62.4%, gates 34/34, corpus parity 401/401, gate:treesitter 96.0%.
…essors
Two more tsc-clean shapes mono outright rejected (false-rejects):
- "class C { static const H = 1; }" — tsc parses const as a (semantically
invalid) member modifier; add it to the class-member modifier set, where
the not()-followed-by-name-token guard still treats "const = 1" as a
member NAMED const.
- "var v = { get foo() }" — an object-literal accessor with no body parses
in tsc (error recovery); the accessor body becomes opt(Block).
Both additive and over-accept-neutral: compiler-corpus false-rejects drop
28 -> 24, conformance stays 0, we-accept stays 100, recall 62.4%, gates
34/34, corpus parity 401/401, gate:treesitter 96.0%.
tsc parses index signatures more leniently than mono did (the missing
annotations/commas are checker errors): a class index signature without a
value type ("class C { [x: string]; }") and a trailing comma inside the
bracketed params of a class or type-literal index signature ("type A = {
[key: string,]: string }"). Class index-sig value type becomes optional
with an opt(',') param tail; the type-literal index branch gains the same
opt(',').
Additive, over-accept-neutral: compiler-corpus false-rejects 24 -> 21,
conformance stays 0, we-accept 100, gates 34/34, parity 401/401,
gate:treesitter 96.0%.
…lenient
tsc's PARSER accepts await/yield as binding identifiers even inside an async/generator
body (`async function f(){ let await = 1 }`, `function* g(){ function yield(){} }`) —
the "reserved word" rule there is a checker diagnostic, not a parse error. Only at
EXPRESSION position does tsc reject, because `await` must be the operator and so needs
an operand (`await;`, `await =>`, `a = await` -> "Expression expected").
The earlier fork made `notReserved` (the binding guard) reservable too, which
false-rejected those lenient bindings. Drop that: only `notReservedExpr` (the
expression identifier-NUD guard) carries the [Await]/[Yield] reservation, and the
single-identifier arrow parameter now guards with `notReservedExpr` so `await => x`
rejects in an await context via the same operator-needs-operand path tsc uses (it
parses the arrow head as an expression first), while `let await`/`var yield`/named
`function yield(){}` parse everywhere.
Bidirectional over the single-file conformance corpus: false-rejects of tsc-accepted
files drop (the await/yield-binding FN, asyncOrYieldAsBindingIdentifier1, is gone);
over-accepts unchanged (they were always expression-position). recovery-conformance
recall 66.35%, first-error 69.58%, we-accept 73. Gates 34/34, parity 0/0/0, 96.0%.
… async
Class members and object-literal properties now route method params/bodies to their
[Await]/[Yield] family instead of leaking the enclosing context: plain methods,
constructors, accessors and field initializers reset (a method body has its OWN,
non-inherited context — the spec's implicit function boundary), generators yield,
async await, async-generators both. A computed key `[e]` stays OUTSIDE the family (it
is evaluated in the enclosing context), so `class C { [await](){} }` inside async still
rejects while the method bodies don't.
`async` is pulled out of the member modifier soup into dedicated arms (the class analog
of the Decl/arrow fix) so the body gets its await context — but tsc parses `async` as an
ORDER-FREE modifier (`async static m`, `override async m`, `async get x`, `async static
{}` all parse, the checker validates), so each async arm carries its own inner
many(Modifier) run and there are async-accessor / async-static-block arms. The `static`
modifier's `not('{')`-style guard keeps `async static {}` parsing the block, not eating
`static` as a modifier.
This closes the class-body context leak: `async function f(){ class C { m(){ await; } } }`
and `{ x = await }` field initializers now parse (method/initializer reset), matching
tsc's parser; over the single-file conformance corpus the await/yield false-rejects are
gone (FN drops to 2 pre-existing externalModules import-feature cases, unrelated). Async
methods reject `await;`/`await =>` like async functions do.
recovery-conformance unchanged at recall 66.35%, first-error 69.58%, we-accept 73 (the
method await cases were never in the single-file set). Gates 34/34, parity 0/0/0,
byte-identical generated outputs, tree-sitter generate clean x4, gate:treesitter 96.0%.
The random mutator only hits an async/generator toggle by luck, yet that edit is the whole reason the context is a build-time name-fork rather than a runtime flag: flipping `async`/`*` on an enclosing function changes its body's RULE IDENTITY (Block -> Block$A/$Y/$AY), and a runtime flag read by core() but absent from the reuse key would let a stale cross-family row survive. This adds a scripted edit class over hand-authored async/generator documents — drop/re-add `async`, drop a generator `*`, edit an async arrow's params, a yield operand, a class method's async/`*` — interleaved with a surgery-path in-body keystroke, asserting each stays edit≡fresh + self-consistent. 706/706 steps equal+consistent across all 7 grammars: the name-fork preserves the window-replay theorem verbatim under exactly the edits it exists to survive.
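The reason a name-fork survives these edits can be sketched in a few lines. `familyName` and `reuseKey` are hypothetical helpers; only the `Block -> Block$A/$Y/$AY` naming comes from the text. Because the family is part of the rule name and the name is part of the reuse key, toggling `async`/`*` yields a different key, so a stale cross-family row cannot match.

```typescript
// Family suffix: A = [Await], Y = [Yield], AY = both; a plain body keeps the base name.
function familyName(base: string, isAsync: boolean, isGenerator: boolean): string {
  const suffix = (isAsync ? "A" : "") + (isGenerator ? "Y" : "");
  return suffix === "" ? base : `${base}$${suffix}`;
}

// Hypothetical reuse key: rule identity + position. A runtime flag outside this key
// is exactly what would let a row built under one family be reused under another.
function reuseKey(rule: string, pos: number): string {
  return `${rule}@${pos}`;
}
```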
A `using` / `await using` declaration binds a plain BindingIdentifier, never a pattern. UsingBinding replaces the pattern-allowing Binding/ForBinding in the using arms, so `using [a] = null` falls through to the expression `using[a] = null` — which is exactly how tsc reads sync `using` in statement position (it is a contextual identifier there), so the tree now matches instead of minting a bogus using-declaration with a pattern. The `await using [a]` parse-error tsc reports is NOT cleared by this alone: it is statement-ASI-gated — mono still splits `await using` off `[a] = null` into two statements (the Task #24 gap), so the over-accept stands until the ASI round, which this identifier-only binding is a prerequisite for (the await-using arm must reject the pattern once ASI stops the split). Accept-neutral: recovery-conformance unchanged (we-accept 73, recall 66.35%, first-error 69.58%), 34/34, parity 0/0/0, tree-sitter clean, gate:treesitter 96.0%.
UsingBinding cleared no over-accept — `await using [a] = null` over-accepts via STATEMENT-SPLITTING (the ASI gap, #24): mono splits `await using` off `[a] = null` into two statements regardless of the binding shape (proven: `x using [a]` splits the same way). So an identifier-only using binding only shuffles trees tsc rejects anyway, and it introduced a tree-sitter GLR conflict (`using x: T <` vs a generic type) — 9c04bc0 committed the stale grammar.js because the `tree-sitter generate` failure was swallowed by the `|| echo FAIL` in the gate chain. The identifier-only using binding + an `await using [` ExpressionStatement commit guard are the correct fix, but they only clear the over-accept once ASI stops the split, so they belong with the ASI round (#24), not as a standalone companion that adds a GLR conflict for zero acceptance gain. Restores Binding in the using arms; 34/34, parity 0/0/0, tree-sitter generate clean x4, gate:treesitter 96.0%.
… new false-rejects
The TS statement terminator becomes asi() = alt([';'], [not(sameLine)], [not(not('}'))])
on every Stmt-level arm (var/let/const, return, throw, break, continue, debugger, using,
expression statement): a statement may end only at ';', a line-terminator before the next
token, or a closing '}'. A same-line non-';'/'}' token can no longer terminate it, so the
mid-line splits mono used to accept by exploiting the optional ';' (`var x = a[]` split
into `var x=a` + `[]`) now stay one statement and reject like tsc.
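The three asi() arms, sketched as a plain function over a hypothetical token list. `Tok.newlineBefore` marks a line terminator before the token. Treating end of input as a valid terminator is an assumption here, in line with ECMAScript ASI at end of input.

```typescript
interface Tok { text: string; newlineBefore: boolean }

// Returns the index after the terminator, or -1 when the statement cannot end here.
function asi(toks: Tok[], i: number): number {
  const next = toks[i];
  if (next === undefined) return i;    // assumption: end of input ends a statement
  if (next.text === ";") return i + 1; // [';'] consumes the semicolon
  if (next.newlineBefore) return i;    // [not(sameLine)]: a line break precedes the next token
  if (next.text === "}") return i;     // [not(not('}'))]: zero-width lookahead for '}'
  return -1;                           // a same-line, non-';'/'}' token cannot terminate
}
```

For `var x = a[]`, the `[` after `a` is on the same line and is neither `;` nor `}`, so the declaration cannot end before it.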
asi alone false-rejects every tsc-clean construct that legitimately continues a statement
without a ';'. A multi-agent workflow mapped the full set (41 single-file conformance
cases) to 11 companions — each a MISSING production asi merely EXPOSED (base only
"accepted" them via the same split it removes), so every fix lives in the arm asi
exposed, never in asi() itself:
- per-specifier `type` modifier on import/export specifiers, with tsc's multi-token
`{ type as as B }` / `{ type type as foo }` disambiguation
- `export type *` / `export type * as ns from` + ModuleExportName namespace alias
- `import type X = require()` (type-only import-equals; two arms so `import type = …`
keeps `type` as the binding name)
- interface heritage via the shared heritageClauses helper (implements / `extends Foo?.Bar`
/ empty `extends {` / self / repeated)
- leading modifier soup before var/let/const/using (mirrors the decorator-prefix arm)
- nested `new new Foo()()` (recursive NewTarget; + ['new_target'] tree-sitter conflict)
- `export as namespace X` + `export default interface`
- `import<T>(...)` instantiation expression
- regex flag tail = maximal-munch IdentifierPart run (tsc lexes flags leniently)
- non-null `!` is a restricted (no-line-break) postfix, like `++`/`--`
- `unique` as a general prefix type operator (`unique <Type>`)
The workflow's const/var->notReservedExpr companion was MEASURED net-negative (it
regresses `for (var of X)` + `[...x = a]`, both tsc-parse-clean) and dropped; its lone
target (importWithTypeArguments) is covered by the import<T> arm instead.
recovery-conformance: we-accept 73 -> 50 (-23 mid-line-split over-accepts), recall
66.35% -> 69.82%, first-error 69.58% -> 74.37% (precision dips 84% -> 67% as mono now
REPORTS errors on the 23 newly-rejected files at a coarser granularity than tsc — the
known recovery-granularity gap, not new false-rejects: bidirectional FN stays 2, both
pre-existing externalModules import-feature cases). recovery.ts VALID fixture swapped
parserRealSource7 (a tsc PARSE-ERROR file that only passed via the split bug) ->
parserRealSource12. Gates 34/34, parity 0/0/0, tree-sitter generate clean x4,
gate:treesitter 96.0%.
…t type-args
Two CFG/lexer-landable over-accepts from the 50-file triage (the workflow mapped 49 landable / 2 semantic-ceiling):
1. numeric-literal-lex (10 files): a decimal integer part is a single `0` or a `[1-9]`-led run. A `0` immediately followed by a digit (legacy octal `0123`, leading-zero `09`) is not a decimal literal: intPart='0' lets the trailing digit trip numericTailGuard, so the token fails and the total lexer rejects it (tsc's scanner behavior). fracTail/expTail/BigInt keep `digits` (leading zeros legal: `0.012`, `1e007`, `0n`); radix tokens are untouched. `0`, `0.5`, `0e1`, `1_000`, `0x1f` stay valid.
2. type-arg-sameLine (1 file): generic type-argument application `T<A>` is newline-sensitive; `T\n<A>` rejects, mirroring the existing `[$, sameLine, '[']` / `!` postfix type arms.
recovery-conformance: we-accept 50 -> 39, recall 69.82% -> 72.77%, first-error 74.37% -> 77.75%, precision ~stable. Bidirectional FN 0 (handle API). Gates 34/34, parity 0/0/0, tree-sitter generate clean x4, gate:treesitter 96.0%.
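The integer-part rule as a standalone check. `lexDecimal` is hypothetical: it folds intPart/fracTail/expTail into one regex and approximates numericTailGuard with a character test. BigInt suffixes, leading-dot fractions and radix literals are out of scope for this sketch.

```typescript
// intPart: a lone '0' or a [1-9]-led run (with '_' separators); frac/exp digits may lead with 0.
const DECIMAL = /^(?:0|[1-9](?:_?[0-9])*)(?:\.(?:[0-9](?:_?[0-9])*)?)?(?:[eE][+-]?[0-9](?:_?[0-9])*)?/;

// Returns the literal's length, or -1 when no decimal literal starts here.
function lexDecimal(src: string): number {
  const m = DECIMAL.exec(src);
  if (m === null) return -1;
  // numericTailGuard analogue: a digit or identifier character may not follow the token.
  const next = src.charAt(m[0].length);
  return /[0-9A-Za-z_$]/.test(next) ? -1 : m[0].length;
}
```

`0123` lexes `0`, and the trailing `1` trips the guard, so the token fails. `0.012` and `1e007` pass because only the integer part is restricted.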
…g separator
Four more CFG-landable over-accepts from the 50-file triage:
- `let [` at statement start commits to a LexicalDeclaration (added to the
expression-statement lookahead guard), so a bad `let [...]` head rejects instead of
parsing as `let`-indexed expression.
- `new <T>Foo()` rejects: a `<` may not directly follow `new` (the operand is a
MemberExpression) — `not('<')` on the `new` arms; post-callee `Foo<T>()` type-args stay.
- a labeled-statement / for-binding-property label is `notReserved` (a reserved word can
never be an Identifier-slot label).
- a class index-signature ends with the asi() member terminator (`; / newline / }`),
not a bare optional `;`, so a same-line adjacent member rejects.
(The type-literal member separator was tried in the same asi() shape but REVERTED: it
regresses `var x: { private y: string }` — tsc reads `private y` as two lenient members
with no separator, which requires TypeMember modifier support, a separate change.)
recovery-conformance we-accept 39 -> 36, FN 0 (handle API). Gates 34/34, parity 0/0/0,
tree-sitter generate clean x4.
…ntextual name)
A type-parameter NAME guards through `notReserved, Ident`. `in` LEXES as an Ident, so an
un-guarded Ident wrongly accepted it as the name — but `in` is a reserved word there:
tsc rejects `<in>` / `<in in>` / `<out in>` / `<in = any>` ("'in' is a reserved word that
cannot be used here") while accepting `<out>` / `<out out>` / `<in out>` (out is a
contextual keyword, a valid name) and every modifier use (`<in T>` / `<in out T>` /
`<const T>` — `in` stays a variance modifier). Guards all three TypeParam arms (the
modifier-soup arm's name too, since `many(mod)` greedily eats trailing `in`s).
test/refactor-guard.ts had codified the old over-accept: its SHOULD-PASS `tp name-in
default` = `interface I<in = any> {}` is a tsc PARSE ERROR — corrected to the valid
`out` analog `interface I<out = any> {}`.
recovery-conformance we-accept 36 -> 35, FN 0. incremental-grammars 706/706 (the tripwire
that rejected the super-primary attempt; this one keeps edit≡fresh). Gates 34/34,
refactor-guard 112/112, tree-sitter generate clean x4.
`npm run check` ran its 35 gates strictly serially (execFileSync in a for-loop), so the wall-clock was the SUM of every gate. Each gate is an independent subprocess that emits its own parser and reads its own corpus, sharing no mutable state and writing DISTINCT /tmp/emitted-*.mjs files — so they parallelize safely. A (cpus-2)-wide worker pool turns the wall-clock into ~max(sum/pool, slowest-gate): measured 19.4s (was minutes), now bound by the single slowest gate (exhaustive-edits ~18s). Results stream as each finishes; the final pass/fail summary prints in gate order and the exit code is unchanged.
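A minimal sketch of an order-preserving worker pool, assuming each gate is wrapped as a promise-returning task (in Node, for example around a promisified `child_process.execFile`) that resolves to its pass/fail record instead of rejecting. `runPool` is hypothetical. In Node the width would be something like `os.cpus().length - 2`.

```typescript
// Run tasks with at most `width` in flight; results come back in task (gate) order.
async function runPool<T>(tasks: Array<() => Promise<T>>, width: number): Promise<T[]> {
  const results = new Array<T>(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const idx = next++; // safe: the read and increment happen with no await in between
      results[idx] = await tasks[idx]();
    }
  }
  const n = Math.max(1, Math.min(width, tasks.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}
```

Wall-clock becomes roughly max(sum/width, slowest task). Printing from `results` after the pool drains yields the summary in gate order regardless of completion order.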
Three tsc parse-errors Monogram was accepting, each removed by matching tsc's
PARSER shape (not its checker):
- this-param: bare `this` / `this: T` only. `this?`, `this = 1`, `this: T = 1`,
and any decorated/modified `this` (`@dec this`, `public this`) are parse errors.
The dedicated arm now owns every `this`-param; the plain-name arm excludes it.
- class heritage head: reserved-guarded (notReservedExpr). `extends void/typeof/
delete/enum/case/throw {}` is "Expression expected", while `this`/`await`/`yield`/
identifiers stay valid bases.
- constructor: an identifier `constructor` member must be a call signature.
`constructor;` / `= 1` / `: T` (even modified) reject; TypeParams parse but `?`/`!`
do not; string / #private `constructor` and `get constructor()` stay valid.
we-accept (tsc-rejects that we handle clean over the conformance corpus): 35 -> 30,
no new over-accepts. 34/34 check gates, incremental == fresh 706/706, tree-sitter 96.0%.
`{ a: T b: U }` (two members, same line, no separator) was accepted; tsc rejects
it ("';' expected"). Object-type members are SEPARATED by `;` / `,` / a newline —
the type analog of statement ASI. The member loop's terminator becomes
`alt([';'], [','], [not(sameLine)], [not(not('}'))])`: explicit `;`/`,`, a newline
before the next member, or the closing `}` ahead (last member needs no trailing
separator). Same-line back-to-back members now reject.
we-accept 30 -> 29. 34/34 check gates, incremental == fresh 706/706,
tree-sitter generate STATE_COUNT 9783 (== baseline, no blowup), gate 96.0%.
Three tsc parse-errors removed, plus a new parser/tree-sitter divergence primitive
(tsRelax) so the parser-correct forms don't inflate or explode the tree-sitter GLR table.
- type predicate `x is T`: parse-legal ONLY as a function/method/accessor/arrow/fn-type
RETURN type (tsc rejects it in var/param/property annotations, casts, type args, union
members, …). Pulled out of the general Type into a return-position ReturnType; the
predicate subject is an identifier or `this`, the target an ordinary (non-predicate)
type so `x is y is z` rejects. (`asserts x` stays in the general Type — tsc's parser
accepts it everywhere.)
- duplicate `static`: an at-most-one-`static` modifier run (the only repeated modifier
that is a tsc PARSE error, vs `public public` / `readonly readonly` which parse).
- tuple elements: comma-SEPARATED — `[A B]` / `[A\n B]` reject ("',' expected"; unlike
object types a newline does not separate tuple members).
tsRelax(strict, relaxed): a transparent `group` carrying a tree-sitter-only `tsRelaxed`
rendering. The parser and every generator use the strict form; gen-treesitter renders the
relaxed one. The split at-most-one-static run explodes tree-sitter's GLR (25min); a
return-only predicate at ~18 slots ~2x'd its generate. With tsRelax the highlighter keeps
its status-quo shape (predicate in the general type, plain `repeat(Modifier)`) — STATE_COUNT
stays 9783 (== baseline) — while the parser enforces the strict rule. The transparent group
also keeps a normal return type a bare Type node, leaving AST lowering / cst-match intact.
we-accept (tsc-rejects we handle clean over the conformance corpus): 26 -> 22. 34/34 check
gates, incremental == fresh 706/706, tree-sitter 9783 states / gate 96.0%.
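Sketch of the tsRelax split with hypothetical grammar-node shapes and rule names (`NonStaticModifier`, `Modifier`). A transparent group carries an optional `tsRelaxed` rendering. Every consumer except the tree-sitter backend sees only the strict body. Both views are rendered in a tree-sitter-like notation purely for comparison.

```typescript
type G =
  | { kind: "lit"; text: string }
  | { kind: "ref"; name: string }
  | { kind: "seq"; items: G[] }
  | { kind: "repeat"; body: G }
  | { kind: "opt"; body: G }
  | { kind: "group"; body: G; tsRelaxed?: G };

const tsRelax = (strict: G, relaxed: G): G => ({ kind: "group", body: strict, tsRelaxed: relaxed });

type Target = "parser" | "treesitter";

function render(g: G, target: Target): string {
  switch (g.kind) {
    case "lit": return JSON.stringify(g.text);
    case "ref": return `$.${g.name}`;
    case "seq": return `seq(${g.items.map((x) => render(x, target)).join(", ")})`;
    case "repeat": return `repeat(${render(g.body, target)})`;
    case "opt": return `optional(${render(g.body, target)})`;
    case "group":
      // Transparent group: only gen-treesitter swaps in the relaxed rendering.
      return render(target === "treesitter" && g.tsRelaxed ? g.tsRelaxed : g.body, target);
  }
}

// At-most-one `static`: strict for the parser, a plain repeat for the highlighter.
const nonStatic: G = { kind: "repeat", body: { kind: "ref", name: "NonStaticModifier" } };
const modRun = tsRelax(
  { kind: "seq", items: [nonStatic, { kind: "opt", body: { kind: "lit", text: "static" } }, nonStatic] },
  { kind: "repeat", body: { kind: "ref", name: "Modifier" } },
);
```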
); land super
ROOT CAUSE of #47: incremental re-parse diverged from a fresh parse the moment any arm was added to the Expr NUD alternative. `incremental == fresh` is a correctness contract, and it broke on error recovery. `barsWindowEq` asserts that a recovering frame's behavior is determined by its window text + window bars (position purity), so an adopted recovery-made subtree replays identically. That is FALSE for the SYNTHESIS hooks: `missRule`/`missTok` fire only when `pos > probeBase`, so they also depend on the ambient COMMITMENT context (probeBase), which is non-local (inherited from the caller) and is in neither the memo key nor barsWindowEq.
Concretely, in a broken array literal `['[', many(opt($),','), opt($), ']']`, a fresh parse derives an inner call's argument `Expr@p` under the call's COMMITTED probeBase, so at a bar it fires the missing-nonterminal hook and SEEDS the per-position memo with a `$missing`; the array's trailing `opt($)` at the same position memo-jumps to that seed and inherits it. Incremental ADOPTS the whole call subtree, skipping that interior derivation, so the memo is entered only later under an UNcommitted probeBase where synthesis is suppressed; it settles on failure, and the trailing `opt` yields one fewer `$missing` than fresh.
FIX (src/emit-parser.ts, parseRuleEntry adoption gate): re-derive, rather than adopt, a recovery-made row (rowRM != 0) whose END coincides with a recovery bar (`missAt(start + rowTokLen)`). Synthesis fires only AT a bar (recoverArmed), so a bar at the row's end is exactly where a following sibling's list-element/optional synthesis reads the per-position memo that the skipped interior would have seeded under commitment. Re-derivation is byte-identical to fresh; non-bar-ending recovery-made rows still adopt (≈42% of recovery-made adoptions in the gate), so the "broken-state edits go incremental" feature (commit 2245f0b) is preserved, not reverted.
Valid inputs never reach this path (strict pass succeeds; recovering is false), so the byte-identical corpus and conformance gates are untouched. (The fully structural fix, making the synthesis hooks position-pure by dropping probeBase from what gets memoized, is a larger recovery-commitment-model redesign, tracked separately on #47.) This unblocks Expr-atom over-accept fixes that change the NUD alternative. First one landed: `super` as a CONSTRAINED primary, which must be immediately followed by a call `(args)`, member `.name`/`.#priv`, or element `[expr]`; bare `super`, `super<T>()`, `super?.x`, a `super`-tagged template, and `super = …` are tsc parse errors and now reject (14/14). we-accept 22 -> 19. 34/34 check gates, incremental == fresh 706/706 WITH the super arm present (previously failed) and without it, tree-sitter 9815 states (+32 for super) / gate 96.0%.
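The new adoption-gate condition in isolation, as a hypothetical predicate. The `Row` fields follow the names in the text, `missAt` is assumed to answer "is this position a recovery bar". The real gate has further conditions (e.g. recFires stamping) that are not modeled.

```typescript
interface Row {
  start: number;     // first token index of the row
  rowTokLen: number; // tokens spanned by the row
  rowRM: number;     // nonzero when the row was built while recovery fired
}

// Re-derive (return false) a recovery-made row whose END sits on a recovery bar:
// a following sibling's synthesis reads the memo the skipped interior would have seeded.
function mayAdopt(row: Row, missAt: (pos: number) => boolean): boolean {
  if (row.rowRM === 0) return true; // this check does not restrict strict-built rows
  return !missAt(row.start + row.rowTokLen);
}
```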
ECMAScript AssignmentTargetType, enforced at parse time by tsc: the operand of a prefix
`++`/`--`, a postfix `++`/`--`, and the target of `=`/compound-assignment must be a
LeftHandSideExpression (identifier / member / element / call / paren / `this` / non-null
`!`) — NOT a unary (`-`/`!`/`typeof`/`void`/`delete`/`await`), a prefix-update, or a
postfix-update. So `++-x`, `++!x`, `++typeof x`, `++void x`, `++delete x.y`, `++ ++x`,
`++await x`, `x++ ++`, `++x--`, `x++ = 1`, `-x = 1`, `++x = 1` are all parse errors.
New precedence markers (grammar DATA, language-agnostic — distinct from the narrower
`noUnaryLhs('**')`): `lhsTarget(...)` for the assignment level, `prefixTarget('++','--')`,
`postfixTarget('++','--')`. They set `requireTarget` on the PrecOperator; both engines
(emit-parser + gen-parser) gain a generic shape predicate that rejects when the operand's
HEAD child is an operator-tag leaf in prefixOps (prefix-unary OR prefix-update `++x`) or its
TAIL child is an operator-tag leaf in postfixOpValues (postfix-update `x++`). A parenthesized
cover / member / element / call / non-null tail produces no operator-tag leaf there, so
`(x++) = 1`, `x.y = 1`, `(-x)++`, `a = b = c`, `x!.y = 1` pass. Literal targets `++1`/`1++`/
`1 = 2` stay accepted (a CHECKER error in tsc, not a parse error). A recovery-synthesized
$missing operand has no children, so the predicate returns false — recovery is not falsely
rejected.
we-accept 19 -> 13. Parser-only change (no tree-sitter/highlighter impact). 34/34 check,
incremental == fresh 706/706, tsc-matrix probe 0 mismatches both engines.
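A sketch of the head/tail shape predicate over a hypothetical CST encoding, where operator-tag leaves carry `op: true` and the operator sets are illustrative. Punctuation such as `(` or `.` is a plain leaf, so parenthesized covers and member chains pass, and an empty `$missing` node passes too.

```typescript
type Cst = { leaf: string; op?: boolean } | { kids: Cst[] };

const PREFIX_OPS = new Set(["-", "+", "!", "~", "typeof", "void", "delete", "await", "++", "--"]);
const POSTFIX_OPS = new Set(["++", "--"]);

function isOpLeaf(n: Cst | undefined, set: Set<string>): boolean {
  return n !== undefined && "leaf" in n && n.op === true && set.has(n.leaf);
}

// True when the node's shape rules it out as an assignment/update target.
function notTarget(n: Cst): boolean {
  if (!("kids" in n)) return false; // a plain leaf (identifier, literal) passes
  const kids = n.kids;
  if (kids.length === 0) return false; // a recovery-synthesized $missing has no children
  return isOpLeaf(kids[0], PREFIX_OPS) || isOpLeaf(kids[kids.length - 1], POSTFIX_OPS);
}
```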
An ArrowFunction is the LOWEST-precedence ECMAScript AssignmentExpression, so it can be
neither the operand of a binary/logical/conditional operator nor an assignment target:
`() => {} || a`, `() => {} ? a : b`, `a || () => {}`, `a = () => {} || b` are tsc parse
errors (write `(() => {}) || a`).
New grammar-DATA marker `capExpr(below, …)` → a transparent `group` carrying `capBelow`:
a NUD parses only when the enclosing Pratt minBp is LOOSER than the named connector's
binding power (refused as a tighter operator's operand, e.g. `a || () => {}`), and once it
wins the led loop is skipped (`() => {} || a` leaves `|| a` unconsumed → reject). Applied as
`capExpr('?', …)` to the four arrow arms. The arrow body becomes `alt(Block, $)` (Block
FIRST) — the spec's ConciseBody `[lookahead ≠ {] AssignmentExpression | { FunctionBody }`:
`{` after `=>` is a function body, not an object literal absorbing a trailing `|| a`.
CAP PROPAGATION (both engines): an operator whose RHS is a capped arrow is itself capped —
`a = () => {}` admits no further led, so `a = () => {} || b` and `a = b = () => {} || c`
reject (the `||` would otherwise bubble to the outer loop onto the assignment). A
module-level `_prattCapped` flag, reset per pratt entry, set on a capped return, read by the
operator LED right after its RHS; `return lhs` keeps it set so enclosing operators refuse it.
await-yield-fork fix: the fork rebuilt `group` nodes keeping only `suppress`, which dropped
the new `capBelow` (a parser-read marker, like suppress) — now preserved. (tsRelaxed is
gen-treesitter-only and the post-fork grammar is the parser's, so it stays correctly dropped.)
`() => a || b`, `cond ? () => 1 : () => 2`, `f(() => 1)`, `x = () => 1`, `() => {}` (block
body), `a = b || c`, `x as T || y` all still parse (0 false-negatives). we-accept 13 -> 11.
34/34 check (engine parity emit ≡ interpreter), incremental == fresh 706/706, tree-sitter
9815 states / gate 96.0%. (Residual, not corpus: `cond ? a : () => {} || c` — an arrow in a
conditional BRANCH that is the LHS of `||`; needs an LHS-ends-in-capped-arrow check.)
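A toy Pratt parser showing the cap and its propagation. `"F"` stands for a capped arrow NUD, and the binding powers are hypothetical (`=` 2, the `?` cap 3, `||` 4). Unlike the module-level `_prattCapped` flag described above, this sketch returns the capped bit alongside the node. Conditionals and arrow bodies are not modeled.

```typescript
type Node = string | { op: string; lhs: Node; rhs: Node };

const LBP = new Map<string, number>([["=", 2], ["||", 4]]);
const RIGHT_ASSOC = new Set(["="]);
const CAP_BELOW = 3; // hypothetical bp of the '?' connector named by capExpr('?', ...)

interface Result { node: Node; capped: boolean }

function parse(tokens: string[]): Node | null {
  let pos = 0;

  function expr(minBp: number): Result | null {
    const tok = tokens[pos];
    if (tok === undefined || LBP.has(tok)) return null;
    if (tok === "F") {
      if (minBp >= CAP_BELOW) return null; // refused as the operand of a tighter operator
      pos++;
      return { node: "F", capped: true }; // led loop skipped entirely
    }
    pos++;
    let lhs: Node = tok;
    while (pos < tokens.length) {
      const op = tokens[pos];
      const lbp = LBP.get(op);
      if (lbp === undefined || lbp <= minBp) break;
      pos++;
      const rhs = expr(RIGHT_ASSOC.has(op) ? lbp - 1 : lbp);
      if (rhs === null) return null;
      lhs = { op, lhs, rhs: rhs.node };
      if (rhs.capped) return { node: lhs, capped: true }; // cap propagation: no further led
    }
    return { node: lhs, capped: false };
  }

  const r = expr(0);
  return r !== null && pos === tokens.length ? r.node : null; // leftover tokens reject
}
```

`["a", "=", "F", "||", "b"]` rejects: the `=` whose RHS is capped stops its loop, leaving `|| b` unconsumed.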
…rivate name
Two clean ECMAScript grammar rules tsc enforces at parse time:
- `new` is always followed by a target (NewExpression / MemberExpression `new`), so `new` is not a bare expression. A dedicated `new.target` arm (the one meta-property form) is added and `new` is excluded from the bare-identifier NUD; otherwise a failed `new T` arm (e.g. on the leading `<` of `new <T>Foo()`) let `new` slide in as an identifier, and the text reparsed as the comparison `(new < T) > Foo()`. `new Foo()`, `new a.b.C()`, `new Foo<T>()`, `new new Foo()()`, `new.target`, `new.target.name` stay valid.
- an optional chain `?.` may not contain a private identifier (`a?.#x` / `this?.#b` is a tsc parse error, "An optional chain cannot contain private identifiers"), so PrivateField is removed from the `?.` member alternative in the expression and decorator chains. A NON-optional `a.#x` (the `.` led) stays valid, as do `a?.b` / `a?.[i]` / `a?.()` / `a?.\`t\``.
we-accept 11 -> 9, 0 false-negatives (verified vs tsc on 13 `new` + 11 `?.` forms). 34/34 check, incremental == fresh 706/706, tree-sitter 9819 states / gate 96.0%.
`=`/compound-assignment require a LeftHandSideExpression target; a binary, relational, or `as`/`satisfies` expression is not one, so `a + b = c`, `a in b = c`, `a instanceof B = c`, `a as T = c`, `'prop' in v = 10` are spec grammar errors. (`++1`/`1 = 2` stay accepted — `1` IS grammatically the operand of `++`/`=`; "not a simple target" is a STATIC SEMANTIC the structural parser leaves to a checker. `(a + b) = c` stays accepted — a parenthesized expression IS a LeftHandSideExpression; the inner-not-simple is likewise static-semantic.) Completes the `_notTarget` predicate: beyond a prefix-op HEAD / postfix-op TAIL (unary / update operands), a node whose MIDDLE child is a BINARY CONNECTOR leaf is a binary expression. The connector set is grammar DATA — ladder infix operators plus the alternative-form binary LEDs (`in`/`instanceof`/`as`/`satisfies`/`?`) — so member `a.b` / element `a[b]` (a punct child) and a paren cover (a node child) still pass. we-accept 9 -> 8, 0 false-negatives (24-case probe, both engines == tsc). Parser-only. 34/34 check, incremental == fresh 706/706.
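The MIDDLE-child extension, sketched over the same kind of hypothetical CST encoding (operator-tag leaves carry `op: true`). The connector set here is an illustrative subset of the grammar-data set. Member and element punctuation are plain leaves, and a paren cover's middle child is a node, so all three pass.

```typescript
type Cst = { leaf: string; op?: boolean } | { kids: Cst[] };

// Illustrative subset: ladder infix operators plus alternative-form binary LEDs.
const BINARY_CONNECTORS = new Set(["+", "-", "*", "||", "&&", "==", "<", "in", "instanceof", "as", "satisfies", "?"]);

// True when the child after the left operand is a binary connector leaf.
function isBinaryShape(n: Cst): boolean {
  if (!("kids" in n)) return false;
  const kids = n.kids;
  if (kids.length < 3) return false;
  const mid = kids[1];
  return "leaf" in mid && mid.op === true && BINARY_CONNECTORS.has(mid.leaf);
}
```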
Monogram's parser emits a CST: grammar-sanctioned parse trees, pre-semantic. Its sole correctness criterion is therefore syntactic: accept every string the spec PRODUCTIONS derive, and reject only production-violations (where no parse tree exists). Static-Semantics "early errors" are a CST CONSUMER's job (CST->AST lowering / a validator), not the producer's. External parsers (tsc/V8/babel) are test data and recall oracles; they do not DEFINE what is accepted. They diverge in every direction (tsc accepts `(a+b)=c`/`public public` and rejects `o?.#x`; babel rejects `public public` and accepts `o?.#x`), so the productions are the only oracle-independent reference.
Two checks rejected PRODUCTION-DERIVABLE trees and are reverted:
- prefixTarget: the prefix `++`/`--` operand is grammatically a UnaryExpression (`UpdateExpression : ++ UnaryExpression`), so `++-x`, `++ ++x`, `++await x`, `++delete a.b` are production-derivable. "Operand is not a simple assignment target" is a static-semantic early error (the same class as `(a+b)=c`, which we already accept); it surfaces downstream when an AST `UpdateExpression` (operand: SimpleAssignmentTarget) fails to lower, not here.
- ?.#priv: `o?.#x` is valid current ECMAScript (V8 + babel accept; tsc's lone parse rejection is being removed in TS#60263), so PrivateField stays in the `?.` member alternative.
Kept, as genuine production-violations (no parse tree exists):
- lhsTarget / binary-LHS: the `=` LHS slot is a LeftHandSideExpression, so `a+b=c`/`x++=1` are not derivable; `(a+b)=c` IS, via the paren cover, and stays accepted.
- postfixTarget: the postfix operand slot is a LeftHandSideExpression, so `x++ ++` is not derivable.
- modRun: at most one `static` modifier is ECMAScript syntax (one `static` slot in ClassElement); tsc AND babel both reject `static static x`. modRun's comment, which mis-framed it as a tsc-only quirk, is corrected; its tsRelax is a legitimate tree-sitter GLR capability bridge, not misplaced semantics.
we-accept vs tsc rises 8 -> 14: all six new ones (++await x2, --ANY--, ++ANY++, ++delete, this?.#b) are the expected production-derivable early-errors — faithful CST accepts, NOT regressions. The metric is reframed: triage over-accepts by production-derivability, not by tsc identity. FN=0 valid-recall preserved (reverts only add accepts). 34/34 check, incremental == fresh 706/706, tree-sitter 9819 states / 96.0% (beats official 92.5%).
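Read as code, the reframed metric is a two-axis triage: production-derivability decides whether a case is a bug, and tsc only explains why it shows up as a divergence. A sketch, assuming each probe is hand-labeled for derivability; the `Probe` shape, the verdict strings, and `triage` are hypothetical.

```typescript
type Probe = { src: string; derivable: boolean; tscAccepts: boolean; weAccept: boolean };

type Verdict =
  | "ok"                    // we agree with the productions (and with tsc, or deliberately not)
  | "early-error accept"    // derivable, tsc rejects: a faithful CST accept, left to a consumer
  | "over-accept (bug)"     // not derivable, yet we accept
  | "false negative (bug)"; // derivable, yet we reject

function triage(p: Probe): Verdict {
  if (p.derivable && !p.weAccept) return "false negative (bug)";
  if (!p.derivable && p.weAccept) return "over-accept (bug)";
  if (p.derivable && !p.tscAccepts) return "early-error accept";
  return "ok";
}
```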
… a lexical error
RegularExpressionFirstChar excludes `*` (the spec's disambiguator: `/*` opens a block comment, never a regex). Monogram's regex token admitted `*` as its first body char, so an unterminated `/* … /` (no closing `*/`, but a stray `/`) re-lexed as a regex literal and the file parsed clean; tsc/V8/babel all reject it as an unterminated comment.
Fix models RegularExpressionFirstChar: the regex body's FIRST char additionally excludes `*` (a `*` anywhere after stays legal, e.g. `/a*/`), so `/*` falls to the block-comment opener and an unterminated comment is a genuine lexical error. Body stays one-or-more, so `//` is still a LineComment.
Lexer-only (no tree-sitter change). 15/15 probe vs tsc (rejects `/* x /`, `/*x/`, `/*/`, `/* c`; accepts `/a*/`, `/[*]*/`, `/\*x/`, `/[a-z]/`, `a /* c */ / b`). 34/34 check, incremental == fresh 706/706.
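A minimal sketch of the slash classification once the first body char excludes `*`. It assumes the scanner sits at a `/` where a regex may start (division context is out of scope) and simplifies regex syntax to escapes, character classes, and letter flags; `scanSlash` and its result tags are hypothetical.

```typescript
type SlashToken =
  | { kind: "lineComment"; end: number }
  | { kind: "blockComment"; end: number }
  | { kind: "regex"; end: number }
  | { kind: "error"; message: string; end: number };

function scanSlash(src: string, i: number): SlashToken {
  const next = src[i + 1];
  if (next === "/") {
    // Body is one-or-more, so `//` can only be a LineComment.
    const nl = src.indexOf("\n", i);
    return { kind: "lineComment", end: nl === -1 ? src.length : nl };
  }
  if (next === "*") {
    // `*` can never be the first regex char, so `/*` always opens a block comment.
    const close = src.indexOf("*/", i + 2);
    return close === -1
      ? { kind: "error", message: "unterminated comment", end: src.length }
      : { kind: "blockComment", end: close + 2 };
  }
  // Regex body: later chars may be `*`; a line terminator ends it unterminated.
  let j = i + 1;
  let inClass = false;
  for (; j < src.length; j++) {
    const c = src[j];
    if (c === "\n") break;
    if (c === "\\") { j++; continue; } // skip the escaped char
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) {
      j++;
      while (j < src.length && /[a-z]/i.test(src[j])) j++; // flags
      return { kind: "regex", end: j };
    }
  }
  return { kind: "error", message: "unterminated regular expression", end: j };
}
```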
In a for-of head the spec gates the `using` ForDeclaration arm with `[lookahead != using of]`, so `using of of` cannot read as a using-declaration binding named `of`; `using` as a plain identifier then fails too (the two trailing `of`s read as for-of keywords). tsc + babel both reject; Monogram over-generated. Guard the exact triple `not(['using', 'of', 'of'])` at the head of the declared for-head arm (both grammars). It is narrow on purpose: `for (using of; ;)` (C-style, binding named `of`), `for (await using of of [])` (the await-using arm), `for (let of of [])`, and `for (using of [])` (for-of whose iterated value is named `of`) all stay valid. Parser-only (the lookahead does not reach tree-sitter). 11/11 probe vs tsc. 34/34 check, incremental == fresh 706/706.
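The guard is a zero-width negative lookahead over one exact token triple. A sketch, assuming a token array; `notSeq` and `declaredForHeadAllowed` are hypothetical helpers that mirror the shape of `not([...])`, not its actual implementation.

```typescript
// Zero-width: succeeds (consuming nothing) iff the tokens at `i` do NOT spell `seq`.
function notSeq(toks: readonly string[], i: number, seq: readonly string[]): boolean {
  return !seq.every((s, k) => toks[i + k] === s);
}

// Gate at the head of the declared for-head arm: only the exact triple is refused.
function declaredForHeadAllowed(toks: readonly string[], i: number): boolean {
  return notSeq(toks, i, ["using", "of", "of"]);
}
```

Keeping the guard to the full triple is what leaves the neighbouring forms (`using of;`, `let of of`, `using of [...]`) on their usual paths.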
A NewExpression (a `new` with NO Arguments) is not a valid OptionalChain base — the spec
bases are MemberExpression / CallExpression / OptionalExpression, and a bare `new X` is a
NewExpression, a separate LeftHandSideExpression branch. So `new a?.b`, `new a?.b()`,
`new a<T>?.b`, `new class{}?.x`, `new new a()?.x` have no parse tree (tsc + V8 + babel all
reject with "Invalid optional chain from new expression"); Monogram over-generated, chaining
the `?.` LED onto the bare-`new` node.
Fix is grammar-level (no engine predicate): each `new` arm's no-Arguments exit now asserts
`not('?.')`, so a bare `new` followed by `?.` fails the arm (and `new` has no other NUD, so
the expression rejects). `new a()?.b` — Arguments consumed — chains via the outer `?.` LED
unchanged; a parenthesized `(new a)?.b` and `new (a?.b)()` (chain inside the callee) are
likewise unaffected.
20/20 probe vs tsc (rejects the 8 bare-`new` `?.` forms incl typed `new a<T>?.b` and
`new new a()?.x`; accepts `new a()?.b`, `new a().b?.c`, `(new a)?.b()`, `new (a?.b)()`,
`new class{}()?.x`, plain `new a`). 34/34 check, incremental == fresh 706/706, tree-sitter
9819 states / 96.0%.
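A sketch of the no-Arguments exit, assuming the `new` arm has already parsed its callee and knows the token index just past it; `finishNew` and its result shape are hypothetical. When no Arguments follow, the exit asserts the next token is not `?.`, so the arm fails rather than handing a bare `new` node to the `?.` LED. When Arguments are present, the check never runs.

```typescript
type NewExit = { ok: true; hasArgs: boolean; next: number } | { ok: false };

// `calleeEnd` is the token index just past the already-parsed callee of `new`.
function finishNew(toks: readonly string[], calleeEnd: number): NewExit {
  if (toks[calleeEnd] === "(") {
    // Arguments consumed: `new a()?.b` chains through the outer `?.` LED as before.
    let depth = 0;
    for (let j = calleeEnd; j < toks.length; j++) {
      if (toks[j] === "(") depth++;
      else if (toks[j] === ")" && --depth === 0) return { ok: true, hasArgs: true, next: j + 1 };
    }
    return { ok: false }; // unbalanced arguments
  }
  // No Arguments: a NewExpression, which may not start an optional chain.
  if (toks[calleeEnd] === "?.") return { ok: false };
  return { ok: true, hasArgs: false, next: calleeEnd };
}
```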
…o parse tree)
A qualified type name `A.B` has an IdentifierReference root (TS grammar: `TypeName :
IdentifierReference | NamespaceName . IdentifierReference`), so the keyword/literal types
`void` / `null` / `true` / `false` / `this` are not `.`-qualifiable — `var v: void.x` is
underivable (tsc rejects; @babel/parser is lenient and accepts, but the spec PRODUCTIONS, not
a tool, decide). Monogram over-generated: its Pratt `.`-type-LED applied to any left type
(a Pratt-left-identity over-generation). `undefined`/`number`/`string`/… are identifier-rooted
and stay qualifiable.
Root-cause fix — a reusable zero-width engine primitive `notLeftLeaf(...words)` that gates a
Pratt LED arm on the LEFT node's outermost (head) leaf TEXT: placed at the head of a LED
alternative (before the self `$`), the arm matches only when the left node's head leaf is NOT
in the set. It mirrors the AssignmentTargetType gate (`_notTarget`/`lhsTarget`), reading the
same head leaf but predicated on TEXT membership rather than operator-tag shape, and is
implemented byte-identically in both engines (the LED loop and the left-recursion continuation
loop). Applied to the two `.`-qualification type LEDs:
[notLeftLeaf('void','null','true','false','this'), $, '.', Ident]
[notLeftLeaf('void','null','true','false','this'), $, '.', '<', sep(Type, ','), '>']
The marker is zero-width, so it preserves the CST shape of every VALID type (void/null/this
stay Identifier-leaf nodes — an earlier leaf-rerouting attempt changed their leaf kind and
broke ts-ast-verify). gen-treesitter renders it `blank()` and drops it, so the derived GLR
grammar keeps the unconstrained `.` LED (a left-leaf predicate is not GLR-expressible; a stray
`void.x` is harmless for a highlighter) — grammar.js byte-identical, no tree-sitter generate.
21/21 probe vs tsc (REJECT `void.x`/`null.x`/`true.x`/`false.x`/`this.x`/`void.<number>`/
`this.foo`-as-type; ACCEPT `undefined.x`/`number.x`/`A.B.C`/`void[]`/`void|number`/`this`/
`this is T`/`undefined.<number>`). 34/34 check (incl emit≡gen byte-identical), incremental ==
fresh 706/706, tree-sitter 96.0%.
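A minimal sketch of the predicate half of `notLeftLeaf`, assuming a toy CST of leaves and child lists; `headLeaf` and the closure-returning shape are hypothetical. In the commit the marker sits at the head of the LED alternative and is evaluated in both the LED loop and the left-recursion continuation loop. This sketch models only the zero-width test on the left node.

```typescript
type Cst = { leaf: string } | { children: Cst[] };

// Outermost (head) leaf: follow first children down to a leaf.
function headLeaf(n: Cst): string | undefined {
  if ("leaf" in n) return n.leaf;
  const first = n.children[0];
  return first === undefined ? undefined : headLeaf(first);
}

// Zero-width LED guard: true (the arm may proceed, consuming nothing) iff the
// left node's head leaf text is NOT one of `words`.
function notLeftLeaf(...words: string[]): (left: Cst) => boolean {
  const banned = new Set(words);
  return (left) => {
    const head = headLeaf(left);
    return head === undefined || !banned.has(head);
  };
}

// The guard placed on the two `.`-qualification type LEDs.
const qualifiable = notLeftLeaf("void", "null", "true", "false", "this");
```

Because the guard only reads the left node, the CST of every valid type is untouched, which matches the zero-width design described above.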
…attern
A `using` / `await using` declaration binds a BindingIdentifier — the explicit-resource-
management grammar forbids a BindingPattern (`BindingList[~Pattern]`). So `await using [a] = x`
and `await using {a} = x` and `using {a} = x` are not derivable as declarations. `using [a] = b`
IS valid, but as `using[a] = b` — element-assignment on the IDENTIFIER `using` — not a using
declaration. Monogram over-generated, parsing `[a]`/`{a}` as a declaration binding pattern.
A `not(alt('[','{'))` after `using` routes a `[`/`{` start to the expression arm: `using [a]`
becomes the element-access `using[a]` (valid, kept), while `using {a}` and any `await using`
pattern fail there too and reject. The marker is zero-width, so the `Binding` CST of every
valid using declaration is unchanged.
This rejects `using {a} = b` / `await using {a} = x`, which tsc's PARSER accepts (leniently)
but V8 and @babel/parser both reject — a deliberate, spec-grounded divergence from tsc (the
`using {a}` object form has no derivation), consistent with the production-derivability metric.
`await using [a] = null` (the corpus case) is rejected by all three. Valid: `using a`,
`await using a`, `using a, b`, `using a: T`, `using;`, `using[a]`.
Parser-only (the lookahead does not reach tree-sitter). 10/10 probe. 34/34 check, incremental
== fresh 706/706. (Residual: a non-first binding pattern `using a, [b]` is still accepted — the
guard checks the first binding; rare, left for a follow-up.)
The "idea" section claimed the grammar must accept/reject exactly what tsc does. That is no
longer the rule (and "match tsc exactly" is the over-fit the design avoids): tsc is the
measurement oracle, not the definition of correct. Added a "Correctness: the productions, not
tsc" subsection that says what the parser actually models — the syntactic productions — and
that its CST is pre-semantic, so static-semantic early errors are a CST consumer's job, not the
parser's. A table shows the both-directional divergences from tsc's parser (all verified vs
V8 + Babel): `obj?.#field` accept, `void.x` / `using {a}` reject, `++ -x` accept. Linked from
the idea section and from the CST line in "What you get". CST-vs-AST basics were already
covered there; this adds only the load-bearing semantic distinction.
`parse`/`edit` on the handle API are total: every input produces a tree plus a `cst.errors` field (`[{ offset, end, message }]`, empty when clean), never a throw. Valid inputs take today's strict path exclusively (byte-identical trees, the full conformance/parity gate chain unchanged); only a strict reject re-parses with recovery. Broken-state keystrokes stay incremental (~2–6 ms on a 9 MB document; valid keystrokes ~1 ms). Closes #39.
Recovery
Strict-first, so the valid path is untouched. A failing element in a spine-shaped list (statements, members) absorbs tokens into an `$error` row up to its FIRST set / the enclosing follower / EOF; expression-internal hooks are excluded (they cascade). Bar discipline fires recovery only where parsing is stuck at a strict-proven fail point, and it is stateless, so it stays equivalence-safe and arm-blind. Diagnostics are derived from the final tree's `rowRM` spine, not collected, including `$missing`-token synthesis (`expected ')'`), viable-set messages, and paired-opener `related` spans. Full contract + theorems in `TOTAL-PARSING.md`.
Correctness: the productions, not `tsc`
The accept/reject surface is defined by the language's syntactic productions; `tsc` is the measurement oracle, not the definition (its parser diverges from the grammar, and from V8 / Babel, in both directions). The CST is pre-semantic, so static-semantic early errors are a CST consumer's job. Two over-strict checks were reverted (`++-x` and `obj?.#field` are derivable → accepted), and six over-generated accepts were fixed (`a + b = c`, unterminated `/*`, `for (using of of …)`, `new a?.b`, `void.x` via a new zero-width `notLeftLeaf` LED guard, `await using [a]`). The both-directional divergence table is in the README.
Gates
Bidirectional conformance; byte-identical emit≡gen engines; cst-match-totality; ts-ast-verify; incremental ≡ fresh (706/706); `test/exhaustive-edits.ts`: edit ≡ fresh for every document up to a small bound × every single-character edit (~330k steps, CI); tree-sitter 96.0% (beats official 92.7%). Suite 34/34. Diagnostic recall vs `tsc`'s parser 61.2%; the 108 divergent files are enumerated in the ROADMAP (31 = the `[Await]`/`[Yield]` context class, 77 = per-shape strictness).
Perf, the head-to-head vs `tsc`/tree-sitter, and the full numbers are in the README.
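The recovery described above can be illustrated with a toy. It assumes a grammar of only `let IDENT ;` statements, skips the strict-first pass entirely, and uses hypothetical names throughout (`parseList`, `diagnostics`, the row kinds); it is not Monogram's engine. It shows three things. First, a spine loop absorbs junk into `$error` rows up to the element FIRST set, the follower, or EOF. Second, diagnostics are derived from the finished rows in the stated `{ offset, end, message }` shape. Third, `stuckAtBar` directly transcribes the bar condition `pos <= bar <= maxPos <= bar+2` from the commit message; the toy loop does not use it.

```typescript
// Record shape stated for `cst.errors`; `end` is ASSUMED to be an exclusive offset.
interface ParseError { offset: number; end: number; message: string }

type Tok = { text: string; offset: number };
type Row =
  | { kind: "let"; name: string; from: number; to: number } // token-index span
  | { kind: "$error"; from: number; to: number };

const IDENT = /^[A-Za-z_$][\w$]*$/;
const FIRST_STATEMENT = new Set(["let"]);

// Toy lexer: words or single non-space characters, with their character offsets.
function lex(src: string): Tok[] {
  const out: Tok[] = [];
  const re = /[A-Za-z_$][\w$]*|\S/g;
  for (let m = re.exec(src); m !== null; m = re.exec(src)) out.push({ text: m[0], offset: m.index });
  return out;
}

// One list element, `let IDENT ;`; returns null (consuming nothing) on failure.
function parseLet(toks: Tok[], i: number): Row | null {
  const name = toks[i + 1];
  if (toks[i]?.text !== "let" || name === undefined || !IDENT.test(name.text) || toks[i + 2]?.text !== ";") return null;
  return { kind: "let", name: name.text, from: i, to: i + 3 };
}

// Spine loop with repetition recovery: a failing element absorbs at least one token
// into an `$error` row, stopping at the element FIRST set, the follower, or EOF.
function parseList(toks: Tok[], follower: string): Row[] {
  const rows: Row[] = [];
  let i = 0;
  while (i < toks.length && toks[i].text !== follower) {
    const row = parseLet(toks, i);
    if (row !== null) { rows.push(row); i = row.to; continue; }
    const from = i;
    do { i++; } while (i < toks.length && toks[i].text !== follower && !FIRST_STATEMENT.has(toks[i].text));
    rows.push({ kind: "$error", from, to: i });
  }
  return rows;
}

// Derived, not collected: errors come from the finished rows, so only rows that
// survive into the result are ever reported.
function diagnostics(src: string, toks: Tok[], rows: Row[]): ParseError[] {
  const errs: ParseError[] = [];
  for (const r of rows) {
    if (r.kind !== "$error") continue;
    const last = toks[r.to - 1];
    const offset = toks[r.from].offset;
    const end = last.offset + last.text.length;
    errs.push({ offset, end, message: `unexpected ${JSON.stringify(src.slice(offset, end))}` });
  }
  return errs;
}

// Bar condition as quoted: recovery may fire only while stuck AT a strict-proven fail point.
function stuckAtBar(pos: number, bar: number, maxPos: number): boolean {
  return pos <= bar && bar <= maxPos && maxPos <= bar + 2;
}
```

Because every failing step consumes at least one token, the toy loop always terminates and its rows tile the input. That is the toy's version of "never crash on input".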