Complexity cap: bound pathological HermitCrab parses (follow-up to rustify)#448
johnml1135 wants to merge 6 commits into
Adds ParseContext, a per-ParseWord work budget (MaxParseSteps + ParseTimeout, generous defaults shipped on) propagated through Word exactly like CurrentTrace. Every analysis/synthesis leaf rule Apply() checks it and returns Enumerable.Empty<Word>() on breach (soft-stop, never throws); orchestration-level loops (AnalysisStratumRule, AnalysisLanguageRule, Morpher.Synthesize/LexicalLookup) fast-unwind once exhausted. ParseWord gains a ParseDiagnostics overload reporting whether the budget was hit and why; RerunWithDiagnostics re-parses one word with per-rule counters to report the top offending rule. Confirmed against a synthetic "no overt exponent" pathological rule (HC0001-shaped: pure-copy Rhs with a high MaxApplicationCount) that previously ran unbounded past the cascades' own input==output loop guard. See complexity-cap.md for the full design (Layers 1-3).
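The soft-stop behavior described in this commit can be pictured with a small sketch. This is not the PR's code: `WorkBudget`, `TryStep`, `BudgetBreach`, and `LeafRule` are hypothetical names invented for illustration, and strings stand in for `Word`. Only the behavior follows the description: a step cap plus a timeout, an empty result instead of an exception on breach, and a record of why the budget was hit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical reason codes; the real ParseDiagnostics shape is not shown in this PR text.
public enum BudgetBreach { None, MaxSteps, Timeout }

// Hypothetical stand-in for a per-parse work budget (not the real ParseContext).
public sealed class WorkBudget
{
    private readonly long _maxSteps;
    private readonly TimeSpan _timeout;
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    public WorkBudget(long maxSteps, TimeSpan timeout)
    {
        _maxSteps = maxSteps;
        _timeout = timeout;
    }

    public long StepsUsed { get; private set; }
    public BudgetBreach Breach { get; private set; } = BudgetBreach.None;
    public bool IsExhausted => Breach != BudgetBreach.None;

    // Charges one unit of work; returns false once the budget is spent.
    public bool TryStep()
    {
        if (IsExhausted) return false;
        if (StepsUsed >= _maxSteps) { Breach = BudgetBreach.MaxSteps; return false; }
        if (_clock.Elapsed > _timeout) { Breach = BudgetBreach.Timeout; return false; }
        StepsUsed++;
        return true;
    }
}

public static class LeafRule
{
    // Hypothetical leaf rule: soft-stops with an empty sequence rather than throwing.
    public static IEnumerable<string> Apply(string input, WorkBudget budget)
    {
        if (!budget.TryStep())
            return Enumerable.Empty<string>();
        return new[] { input + "a", input + "b" };
    }
}
```

With a budget of three steps, the first three `Apply` calls each yield two candidates; the fourth yields nothing and leaves `Breach` set to `MaxSteps`, which an outer loop could check via `IsExhausted` to unwind early.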
Adds three additive, default-off caps that convert exponential blowups into bounded ones instead of merely time-boxing them:
- Morpher.MaxRuleApplicationsPerWord: a running total-unapplications counter on Word (Word.TotalUnapplicationCount), checked alongside the existing per-rule MaxApplicationCount in the three affix/compounding analysis rules. Closes the "rule A -> B -> A -> B" loophole that a per-rule cap alone cannot catch.
- Morpher.MaxAnalysisShapeGrowth: prunes analysis candidates whose shape has grown past the surface form by more than N segments, checked at AnalysisStratumRule's output loop (the choke point - candidates pruned there never reach lexical lookup) and per-iteration inside AnalysisRewriteRule's Deletion/SelfOpaquing reapplication loops.
- PermutationRuleCascade.MaxDepth (SIL.Machine core, opt-in via a new property, -1/unlimited by default so existing consumers are unaffected): caps nested rule-reapplication depth, derived from MaxRuleApplicationsPerWord rather than a new knob, synced each Apply() call since the cap can be set via object-initializer syntax after the rule cascade is already compiled.
Verified against RewriteRuleTests.DeletionRules' real deletion-rule grammar: capping MaxAnalysisShapeGrowth excludes the deep-reinsertion analysis while the shallow ones survive as a strict subset of the uncapped result.
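To illustrate why a per-rule cap alone leaves the "A -> B -> A -> B" loophole open, here is a minimal sketch under stated assumptions. `Candidate`, `CapSketch`, and their members are hypothetical names, not the PR's types; string length stands in for segment count; and treating a negative cap as "unlimited" is an assumption borrowed from the `MaxDepth` convention above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical analysis candidate: per-rule counts plus a running total.
public sealed class Candidate
{
    public Candidate(string shape, Dictionary<string, int> perRule, int total)
    {
        Shape = shape;
        PerRule = perRule;
        Total = total;
    }

    public string Shape { get; }
    public Dictionary<string, int> PerRule { get; }
    public int Total { get; }
}

public static class CapSketch
{
    // True if `rule` may be unapplied under both the per-rule cap and the total cap.
    public static bool MayUnapply(Candidate c, string rule, int perRuleCap, int totalCap)
    {
        c.PerRule.TryGetValue(rule, out int used); // used stays 0 if the rule is absent
        if (used >= perRuleCap) return false;
        if (totalCap >= 0 && c.Total >= totalCap) return false; // negative = unlimited (assumption)
        return true;
    }

    // Returns a new candidate with both counters advanced by one.
    public static Candidate Unapply(Candidate c, string rule, string newShape)
    {
        var counts = new Dictionary<string, int>(c.PerRule);
        counts.TryGetValue(rule, out int used);
        counts[rule] = used + 1;
        return new Candidate(newShape, counts, c.Total + 1);
    }

    // Shape-growth prune: keep only candidates within maxGrowth segments of the surface form.
    public static bool WithinGrowth(Candidate c, string surface, int maxGrowth) =>
        maxGrowth < 0 || c.Shape.Length - surface.Length <= maxGrowth;

    // Alternates the given rules until a cap refuses; returns how many unapplications ran.
    public static int RunAlternating(string[] rules, int perRuleCap, int totalCap)
    {
        var c = new Candidate("x", new Dictionary<string, int>(), 0);
        for (int i = 0; ; i++)
        {
            string rule = rules[i % rules.Length];
            if (!MayUnapply(c, rule, perRuleCap, totalCap)) return c.Total;
            c = Unapply(c, rule, c.Shape + "?");
        }
    }
}
```

In this sketch, alternating two rules with a per-rule cap of 2 and no total cap runs four unapplications, and the count grows with the number of rules that feed each other. Adding a total cap of 3 stops the same alternation after three, regardless of how many rules take part.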
…onesty pass
Adds GrammarAnalyzer, a static analyzer over a loaded Language that flags always/almost-always-wrong rule shapes with stable diagnostic codes (HC0001-HC0008: no-overt-exponent affix rules, unbounded multipleApplication, self-feeding epenthesis/deletion rules, unconstrained compounding, optional-iterative lexical patterns, cyclic feeding pairs). Wired into the hc CLI as a new `hc lint` command, plus a `hc parse --diagnose` flag that surfaces RerunWithDiagnostics' top offending rules for a single word - the empirical companion to the static lint. Both are documented in a new docs/hermitcrab-grammar-performance.md guide organized by HC code.
While shaping HC0004's self-feeding check, deduped the "does this rule's output unify with its own required environment" logic shared between AnalysisRewriteRule and GrammarAnalyzer into a single IsUnifiableWithEnvironment extension, and found/fixed a real gap: the lint only covered one of two engine paths that select self-opaquing behavior, silently missing the epenthesis case (unconditionally dangerous in Simultaneous mode). Also fixed a pre-existing HC0007 condition that required Optional *and* IsIterative on adjacent lexical pattern nodes, when the design doc's own canonical example (([Seg])([Seg])) is two plain-optional (non-iterative) groups - the check now matches the documented intent.
Ran the real Phase 0 calibration corpus (indonesian/sena) against the rustify engine and replaced the Phase 1 doc comment's fabricated "~13,600 steps" figure with real numbers: Indonesian's worst word takes 10,445 steps (flat ~10-rule combinatorial interaction, not one bad rule); Sena's worst sampled word takes 14.9M steps/105s from only a ~1% corpus sample, and a separate real word was previously being truncated by the old 10s default timeout at 99,584 steps. Raised DefaultMaxParseSteps to 50,000,000 and DefaultParseTimeout to 30s accordingly, and documented in complexity-cap.md (with two new "still open" items) that the Sena figures are a floor pending a full-corpus re-baseline, and that the timeout is a genuine truncation/latency tradeoff rather than a pure safety margin.
82/82 HermitCrab tests pass; both projects build clean; csharpier clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Bookkeeping only - the status header and phase table still said "Plan (not started)" after Phases 0-3 were implemented and committed.
Small addition to the ad hoc Phase 0 calibration harness, left uncommitted from the corpus investigation: keeps a running top-5 (by StepsUsed) instead of only the single max, so a full-corpus re-baseline (see complexity-cap.md Section 10 item 7) shows the shape of the tail, not just one data point.
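A running top-N of this kind can be kept with a few lines of bookkeeping. The sketch below is an assumption about the general technique, not the harness's actual code; `TopN`, `Offer`, and `Items` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical running top-N tracker keyed by step count (not the harness's real type).
public sealed class TopN
{
    private readonly int _n;
    private readonly List<(string Word, long Steps)> _items = new List<(string Word, long Steps)>();

    public TopN(int n)
    {
        _n = n;
    }

    // Items are kept sorted by Steps, largest first.
    public IReadOnlyList<(string Word, long Steps)> Items => _items;

    // Records one word's cost, then trims back to the N most expensive.
    public void Offer(string word, long steps)
    {
        _items.Add((word, steps));
        _items.Sort((a, b) => b.Steps.CompareTo(a.Steps)); // descending by step count
        if (_items.Count > _n)
            _items.RemoveAt(_items.Count - 1); // drop the cheapest entry
    }
}
```

Re-sorting a list of at most N+1 entries per word is negligible next to the cost of parsing the word itself, so a heap is not needed at N = 5.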
…mization findings
A separate investigation (sharded Release-mode full-corpus scan, see docs/hermitcrab-parse-algorithm-analysis.md on the sibling parse-optimization branch, not yet committed anywhere) got much further than this branch's own single-threaded Debug-mode recalibration attempt, which was aborted after ~1 hour at 283/7,121 Sena words to avoid burning many more hours on redundant/inferior data.
Updates items 7-8 with that scan's numbers (p90 ~2M steps, ~16% of words >1M steps, worst observed >=39.9M steps, 30s ParseTimeout trips on dozens of legitimate words) and adds item 9: cinacemerwa (37.5M steps, 0 valid parses) crashed the NUnit test host outright, apparently from memory pressure independent of the step/timeout budgets - the current Layer 1/2 budgets bound steps and wall-clock but not allocations.
Sena full-corpus calibration update
My own attempt to re-run the full 7,121-word Sena corpus (single-threaded, Debug build, via the …
A separate, much more efficient investigation (sharded 8-way, Release-mode instrumented harness; see …
Pushed as |
Summary
Follow-up to #446 (rustify). PR #446 made the core HermitCrab engine much faster, but grammar-induced blowups remain: certain grammar constructs (unbounded/multiple-application rules with no overt exponent, unconstrained deletion, unconstrained compounding) still cause the analysis phase to generate candidates combinatorially, sometimes taking minutes to hours for a single word. This PR implements the three-layer mitigation designed in `complexity-cap.md` (Phases 0–3; Phase 4 is FieldWorks-repo follow-up, out of scope here):
- `b3fd2b55`: `ParseContext` propagated on `Word` exactly like `CurrentTrace`. `Morpher.MaxParseSteps`/`ParseTimeout` ship on with generous defaults. Every rule `Apply()` site checks the budget; breach is a soft-stop (partial results + `ParseDiagnostics`, never an exception). `RerunWithDiagnostics` re-parses one word with per-rule counters to report the top offending rule(s).
- `e68f0984`: `MaxRuleApplicationsPerWord` (closes the "rule A → B → A → B" loophole that a per-rule cap alone can't catch), `MaxAnalysisShapeGrowth` (prunes analysis candidates whose hypothesized underlying form grows past the surface form), and a cascade depth cap on `PermutationRuleCascade`. All default off (no behavior change for existing consumers).
- `c8a39aeb`: `GrammarAnalyzer` with 8 stable diagnostic codes (HC0001–HC0008: no-overt-exponent affix rules, unbounded `multipleApplication`, self-feeding epenthesis/deletion, unconstrained compounding, optional-iterative lexical patterns, cyclic feeding pairs). Wired into the `hc` CLI as `hc lint` and a new `hc parse --diagnose` flag. Documented in `docs/hermitcrab-grammar-performance.md`.
Plus two small follow-up commits: doc bookkeeping (`13567446`) and a top-5-words-by-step-count diagnostic in the calibration test harness (`343515b1`).
Calibration caveats (see `complexity-cap.md` §4.1 and §10 items 7–8)
- Calibration against `indonesian-hc.xml`/`sena-hc.xml` showed legitimate per-word cost varies by ~1000x between grammars, which broke the original "large multiple of one grammar's ceiling" plan. Shipped defaults (`DefaultMaxParseSteps` = 50,000,000, `DefaultParseTimeout` = 30s) are instead set with headroom above the largest legitimate word observed so far across both grammars.
- `DefaultParseTimeout` = 30s will still truncate some legitimate Sena words (one observed at 105s). This is flagged as a genuine product tradeoff needing field input, not something resolved unilaterally in this PR; feedback welcome.
Test plan