From b3fd2b55bf5bfd5a6a32129e19b7770240d6ade3 Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 14:16:33 -0400
Subject: [PATCH 1/6] Complexity cap Phase 1: per-word step/timeout budget with
 soft-stop

Adds ParseContext, a per-ParseWord work budget (MaxParseSteps + ParseTimeout,
generous defaults shipped on) propagated through Word exactly like
CurrentTrace. Every analysis/synthesis leaf rule Apply() checks it and
returns Enumerable.Empty<Word>() on breach (soft-stop, never throws);
orchestration-level loops (AnalysisStratumRule, AnalysisLanguageRule,
Morpher.Synthesize/LexicalLookup) fast-unwind once exhausted.

ParseWord gains a ParseDiagnostics overload reporting whether the budget
was hit and why; RerunWithDiagnostics re-parses one word with per-rule
counters to report the top offending rule. Confirmed against a synthetic
"no overt exponent" pathological rule (HC0001-shaped: pure-copy Rhs with a
high MaxApplicationCount) that previously ran unbounded past the cascades'
own input==output loop guard.

See complexity-cap.md for the full design (Layers 1-3).
---
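Reviewer note: the soft-stop contract described in the commit message can be
sketched in isolation as below. This is a simplified illustration, not the
ParseContext.cs added by this patch. Every name prefixed with Sketch is
hypothetical, the 256-step deadline interval is taken from the design doc's
"e.g. 256", and the real Step() overload takes a rule argument, which the
sketch omits. It shows a shared counter bumped with Interlocked, a wall-clock
deadline consulted only periodically, and a rule that returns an empty
sequence instead of throwing once the budget is gone.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the patch's exhaustion reason enum.
public enum SketchExhaustionReason { None, StepBudget, Timeout }

// Hypothetical, simplified per-parse budget; not the real ParseContext.
public sealed class SketchParseContext
{
    private const int DeadlineCheckInterval = 256; // assumed; keeps the clock off the hot path
    private readonly int _maxSteps;                // 0 = unlimited
    private readonly long _deadlineTicks;          // Stopwatch ticks; long.MaxValue = no timeout
    private int _steps;
    private int _reason;                           // enum stored as int so Interlocked can update it

    public SketchParseContext(int maxSteps, TimeSpan timeout)
    {
        _maxSteps = maxSteps;
        // Sketch only: assumes a modest timeout, so the tick arithmetic cannot overflow.
        _deadlineTicks = timeout > TimeSpan.Zero
            ? Stopwatch.GetTimestamp() + (long)(timeout.TotalSeconds * Stopwatch.Frequency)
            : long.MaxValue;
    }

    public bool Exhausted => Volatile.Read(ref _reason) != (int)SketchExhaustionReason.None;
    public SketchExhaustionReason Reason => (SketchExhaustionReason)Volatile.Read(ref _reason);

    // Returns false once the budget is gone; never throws.
    public bool Step()
    {
        if (Exhausted)
            return false;
        int steps = Interlocked.Increment(ref _steps);
        if (_maxSteps > 0 && steps > _maxSteps)
            return Exhaust(SketchExhaustionReason.StepBudget);
        if (steps % DeadlineCheckInterval == 0 && Stopwatch.GetTimestamp() > _deadlineTicks)
            return Exhaust(SketchExhaustionReason.Timeout);
        return true;
    }

    private bool Exhaust(SketchExhaustionReason reason)
    {
        // First breach wins, so Reason stays stable under concurrent callers.
        Interlocked.CompareExchange(ref _reason, (int)reason, (int)SketchExhaustionReason.None);
        return false;
    }
}

public static class SketchRule
{
    // A pure-copy "rule": its output equals its input, so reapplying it never ends by itself.
    public static IEnumerable<string> Apply(string input, SketchParseContext context)
    {
        if (!context.Step())
            return Enumerable.Empty<string>(); // soft-stop: no results, no exception
        return new[] { input };
    }

    // Keeps reapplying the rule until it yields nothing; returns the successful application count.
    public static int RunUntilExhausted(SketchParseContext context)
    {
        int applied = 0;
        while (Apply("word", context).Any())
            applied++;
        return applied;
    }
}
```

With a budget of 1000 steps and a 10 s timeout, RunUntilExhausted stops after
1000 successful applications and Reason reports StepBudget. The loop ends
because the rule starts yielding nothing, which is the same way the
orchestration loops in this patch unwind.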
 complexity-cap.md                             | 410 ++++++++++++++++++
 .../AnalysisAffixTemplateRule.cs              |   3 +
 .../AnalysisLanguageRule.cs                   |   3 +
 .../AnalysisStratumRule.cs                    |   5 +
 .../Morpher.cs                                | 103 ++++-
 .../AnalysisAffixProcessRule.cs               |   3 +
 .../AnalysisCompoundingRule.cs                |   3 +
 .../AnalysisRealizationalAffixProcessRule.cs  |   3 +
 .../SynthesisAffixProcessRule.cs              |   3 +
 .../SynthesisCompoundingRule.cs               |   3 +
 .../SynthesisRealizationalAffixProcessRule.cs |   3 +
 .../ParseContext.cs                           | 102 +++++
 .../ParseDiagnostics.cs                       |  47 ++
 .../AnalysisMetathesisRule.cs                 |   3 +
 .../PhonologicalRules/AnalysisRewriteRule.cs  |  11 +
 .../SynthesisMetathesisRule.cs                |   3 +
 .../PhonologicalRules/SynthesisRewriteRule.cs |   3 +
 .../SynthesisAffixTemplateRule.cs             |   3 +
 src/SIL.Machine.Morphology.HermitCrab/Word.cs |  11 +
 .../MorpherTests.cs                           | 145 +++++++
 20 files changed, 868 insertions(+), 2 deletions(-)
 create mode 100644 complexity-cap.md
 create mode 100644 src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
 create mode 100644 src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
diff --git a/complexity-cap.md b/complexity-cap.md
new file mode 100644
index 00000000..1701219e
--- /dev/null
+++ b/complexity-cap.md
@@ -0,0 +1,410 @@
+# Complexity Cap: Bounding Pathological HermitCrab Parses
+
+**Status:** Phase 1 (§9) lands with this commit; Layers 2–3 remain a plan. Sequencing and defaults decided, see §8/§10
+**Author:** drafted 2026-07-02
+**Related:** PR #446 (hc-rustify performance work), FieldWorks out-of-process HC worker (FW PR #983)
+
+**Decided (2026-07-02):**
+- Implement on top of `hc-rustify`, not master (§8).
+- Budget breach is **soft-stop** (partial results + status), never an exception (§4.3, §4.4, §10.1).
+- Ship a **generous default** `MaxParseSteps`/`ParseTimeout` so naive consumers are
+  protected out of the box, not pure opt-in (§4.1, §10.2).
+- Use the real `samples/data/{indonesian,sena}-hc.xml` grammars + wordlists as the
+  calibration and regression corpus, not synthetic-only fixtures (§7, §9 Phase 0).
+
+## 1. Problem
+
+PR #446 made the core HermitCrab engine much faster, but grammar-induced blowups remain:
+certain grammar constructs — typically unbounded/multiple-application rules with no overt
+exponent, unconstrained deletion rules, and unconstrained compounding — cause the analysis
+phase to generate candidates combinatorially. A single word can take minutes to hours. No
+engine speedup fixes an exponential; the grammar must be constrained. Until grammars are
+fixed, we need:
+
+1. **Bounded runtime** — a single pathological word must never hang a parse (or a
+   FieldWorks "Parse All Words" batch).
+2. **Actionable diagnostics** — when the engine gives up, it should say *which rule(s)*
+   caused the blowup, with evidence.
+3. **A "don't do this" guide** — static analysis that flags always-wrong or
+   almost-always-wrong rule shapes, consumable by other tools (FLEx parser report, CLI).
+
+## 2. Current state (inventory of existing guardrails)
+
+All partial, none sufficient:
+
+| Guard | Where | Limitation |
+|---|---|---|
+| `AffixProcessRule.MaxApplicationCount` (default 1; XML `multipleApplication` attr raises it) | `AnalysisAffixProcessRule.Apply` checks `GetUnapplicationCount(rule) >= max` | Per-rule only. Rule A → B → A → B evades it. The `multipleApplication` attribute is precisely where pathological grammars opt into unboundedness. |
+| `Morpher.MaxStemCount` (default 2) | `AnalysisCompoundingRule` | Compounding only. |
+| `Morpher.MaxUnapplications` (default 0 = off) | `AnalysisStratumRule.Apply` output loop | Caps the *number of analyses emitted per stratum*, not the *work* spent producing them; a cascade can burn unbounded time before emitting anything. Off by default. Confusingly named given the new caps proposed below. |
+| `Morpher.DeletionReapplications` (default 0) | `AnalysisRewriteRule` | Bounds re-insertion of deleted material for *phonological rewrite rules only*. |
+| Infinite-loop check | `PermutationRuleCascade.ApplyRules` (`Comparer.Equals(input, result)`) | Only catches a rule whose output equals its own input. Two-rule cycles and monotonic *growth* (hypothesizing deleted material) sail past. |
+| `MergeEquivalentAnalyses` (default true) | `AnalysisStratumRule` | Dedup by shape; helps but doesn't bound. |
+
+**There is no timeout, no cancellation, and no work budget anywhere in HC.** `ParseWord`
+is synchronous; the `MaxUnapplications` doc comment itself mentions 30-minute words.
+
+## 3. Design overview — three layers
+
+Each layer addresses a different failure mode; do all three:
+
+- **Layer 1 — work budget (safety net):** deterministic per-word step budget with a
+  wall-clock backstop. Stops eruptions cold and produces the per-rule evidence used by
+  everything else.
+- **Layer 2 — structural bounds (prevention):** global per-word unapplication cap,
+  analysis shape-growth cap, cascade cycle detection. Converts exponential to bounded.
+- **Layer 3 — static grammar lint (guidance):** `GrammarAnalyzer` over a loaded
+  `Language`, emitting structured diagnostics with stable codes; plus a written
+  anti-pattern guide keyed to those codes.
+
+### Design principles
+
+- **Deterministic first.** A step budget fails the same way on every machine, so grammar
+  authors get a reproducible signal and tests stay stable. Wall-clock timeout is only a
+  backstop (machine-dependent, and 10k words × 20 s each is still ~55 hours per batch).
+- **Cheap happy path.** PR #446 deliberately removed `MorpherStatistics` because it was
+  woven into the hot path. The budget's steady-state cost must be ~one counter increment
+  per rule application; detailed per-rule counters are collected only on a **diagnostic
+  re-run after a breach** (breaches are rare; re-running one word with counters on is
+  cheap and keeps the hot path clean).
+- **Additive API.** FieldWorks (in-process HCLoader path *and* the out-of-process worker)
+  consumes `Morpher`. All new knobs are properties with backward-compatible defaults;
+  existing `ParseWord`/`AnalyzeWord` signatures keep working.
+- **Fail soft, report loud.** A budget breach yields the analyses found so far plus an
+  explicit "gave up" status — never a silent empty result (FLEx must distinguish
+  "no parse" from "gave up") and, by default, never an exception mid-batch.
+
+## 4. Layer 1 — work budget + timeout
+
+### 4.1 Configuration (on `Morpher`, following existing property style)
+
+```csharp
+/// Max rule applications (analysis + synthesis) per ParseWord call. 0 = unlimited.
+public int MaxParseSteps { get; set; }           // ships ON with a generous default; see below
+/// Wall-clock backstop per ParseWord call. Zero/infinite = disabled.
+public TimeSpan ParseTimeout { get; set; }        // ships ON with a generous default; see below
+```
+
+**Default philosophy (decided):** ship generous, non-zero defaults for both, not
+opt-in-only. Rationale: most consumers (machine.py users, FLEx via HCLoader, anyone
+scripting `Morpher` directly) will never touch these knobs; a `0`/unlimited default means
+the exact failure mode this plan exists to fix — an unbounded parse — remains the
+out-of-the-box behavior. A generous cap that never fires for legitimate grammars but
+reliably kills runaway ones is strictly better than silence.
+
+Concrete numbers are calibrated in Phase 0 against the real corpus (§7), not guessed
+here, but the target shape is: run every word in `indonesian-words.txt` (121 words) and
+`sena-words.txt` (7,121 words) against their respective grammars on the rustify engine,
+take the observed max step count / max wall-clock time across that legitimate corpus,
+and set the default to a large multiple of that ceiling (e.g. 50–100×) so it is
+effectively invisible for real grammars but still finite. `ParseTimeout` defaults
+similarly, e.g. a flat few seconds per word — generous for interactive/FLEx single-word
+parses, still bounded for "Parse All Words" batches where one stuck word must not stall
+the run indefinitely.
+
+### 4.2 Per-parse context, propagated like `CurrentTrace`
+
+Compiled rule objects are shared across concurrent parses, so per-call state cannot live
+on the rules or the `Morpher`. But every rule receives the `Word`, and `Word` already
+propagates a shared reference through clones (`CurrentTrace`, Word.cs copy-ctor). Add:
+
+```csharp
+internal ParseContext ParseContext { get; set; }   // on Word; reference-shared
+
+internal sealed class ParseContext
+{
+    private int _steps;                    // Interlocked — analysis fans out in parallel
+    private readonly long _deadlineTicks;  // Stopwatch-based (netstandard2.0-safe)
+    public bool Exhausted { get; private set; }
+    public ParseExhaustionReason Reason { get; private set; }  // StepBudget | Timeout
+    public bool Step()                     // returns false when budget is gone
+    {
+        // Interlocked.Increment; check deadline only every N (e.g. 256) steps
+    }
+    // Diagnostic mode (breach re-run only):
+    public ConcurrentDictionary<IHCRule, int> RuleCounters { get; }
+}
+```
+
+Propagation rules (mirror `CurrentTrace` exactly):
+- `Word` copy-ctor copies the reference.
+- Fresh `Word` constructions inside a parse (`Morpher.LexicalLookup`, `LexicalGuess`,
+  `Word.CurrentNonHead` path at Word.cs:489, `GenerateWords` synthesis words) must
+  re-attach the context.
+- Excluded from `FreezeImpl` hashing and `ValueEquals` (like `CurrentTrace`), so dedup
+  semantics are unchanged. It is mutable state on a frozen `Word` — same precedent as
+  `CurrentTrace`.
+
+### 4.3 Check sites
+
+All in the HC assembly (the generic `SIL.Machine` cascades stay untouched — every rule
+they invoke checks, which bounds cascade recursion transitively):
+
+- `AnalysisAffixProcessRule.Apply` / `AnalysisRealizationalAffixProcessRule` /
+  `AnalysisCompoundingRule` — alongside the existing `RuleSelector` /
+  `MaxApplicationCount` early-outs.
+- `AnalysisRewriteRule.Apply` (per iteration, not just per call — one call can loop).
+- Affix template slot application.
+- Synthesis counterparts (`SynthesisAffixProcessRule` etc.) — synthesis explodes too
+  when analysis hands it thousands of candidates.
+- `Morpher.Synthesize` / `LexicalLookup` loops — check `Exhausted` between candidates so
+  the unwind is fast.
+
+On `Step() == false`: the rule returns `Enumerable.Empty<Word>()`. **This is the only
+behavior on breach — no exception path is offered for step/timeout exhaustion**, decided
+because the primary target (FieldWorks "Parse All Words") is a batch over thousands of
+words where one stuck word throwing would either kill the batch or force every caller to
+wrap every word in try/catch. Real errors (bad grammar, bugs) still throw normally via
+existing `Parallel.ForEach` exception plumbing — this only governs the "ran out of
+budget" case. The parse drains quickly and naturally once `Step()` starts returning
+false, since every rule-level check site listed above short-circuits immediately.
+
+### 4.4 Result surface
+
+```csharp
+public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot,
+                                   out ParseDiagnostics diagnostics);
+
+public sealed class ParseDiagnostics
+{
+    public bool BudgetExhausted { get; }
+    public ParseExhaustionReason Reason { get; }        // StepBudget | Timeout | None
+    public int StepsUsed { get; }
+    public TimeSpan Elapsed { get; }
+    /// Populated only by RerunWithDiagnostics (breach re-run).
+    public IReadOnlyList<(IHCRule Rule, int Applications)> TopRules { get; }
+}
+```
+
+- Existing overloads keep working (diagnostics discarded).
+- `IMorphologicalAnalyzer.AnalyzeWord` is shared with non-HC analyzers, so it stays
+  unchanged (best-effort results on breach). Callers who need status use the new overload.
+- `Morpher.RerunWithDiagnostics(string word)` (name TBD): re-parses one word with
+  per-rule counters (and optionally a lower budget), returning ranked
+  `(rule, applicationCount)` — "word *X* exceeded 100k steps; rule *Y* accounted for
+  92% of applications." This is the empirical half of the "don't do this" guide.
+
+### 4.5 FieldWorks / worker integration (follow-up, separate repo)
+
+- The worker DTO (`WordAnalysisDto` / batch results in FW `Src/LexText/HCWorker`) gains a
+  per-word status field (`Success | NoParse | GaveUp(reason)`), so "Parse All Words" can
+  show gave-up words distinctly and offer "diagnose this word" (the re-run).
+- `ParserWorker.ParseAndUpdateWordformGuarded` already guards per-word exceptions; the
+  soft-stop design means it needs no change to survive breaches — only to *display* them.
+
+## 5. Layer 2 — structural bounds
+
+### 5.1 Global per-word unapplication cap (the "same thing, even if separated" bound)
+
+`Word` already tracks per-rule unapplication counts (that's how `MaxApplicationCount` is
+enforced). Add a running total incremented in `MorphologicalRuleUnapplied`:
+
+```csharp
+/// Max total morphological-rule unapplications per analysis candidate (≈ max affixes
+/// per word). 0 = unlimited. Proposed default: 0 initially, recommend 10–16 for FLEx.
+public int MaxRuleApplicationsPerWord { get; set; }
+```
+
+Checked in the same early-out cluster as `MaxApplicationCount`. This closes the
+A→B→A→B loophole: no per-rule counter trips, but the total does.
+
+Naming note: the existing `Morpher.MaxUnapplications` (caps *analyses emitted per
+stratum*) is easily confused with this. Keep it, document both clearly, consider
+`[Obsolete]`-forwarding it to a better name in the same release (decide in review).
+
+### 5.2 Analysis shape-growth cap
+
+The one truly unbounded generator is unapplication that makes the hypothesized underlying
+form *longer* than the surface form (undoing deletions; empty/subtractive exponents).
+`DeletionReapplications` bounds this narrowly for rewrite rules; generalize:
+
+```csharp
+/// Prune any analysis candidate whose shape exceeds the surface form by more than
+/// this many segments. -1 = unlimited (default, preserves current behavior).
+public int MaxAnalysisShapeGrowth { get; set; }
+```
+
+Enforced at the `AnalysisStratumRule.Apply` output loop (single choke point; candidates
+pruned there never reach lexical lookup or the next stratum) and in
+`AnalysisRewriteRule`'s iteration loop (so a self-feeding epenthesis-unapplication is cut
+mid-rule, not after producing a huge shape). Surface length is captured on the
+`ParseContext` (Layer 1's context doubles as the carrier for per-parse constants).
+
+### 5.3 Cycle detection in the permutation cascade
+
+`PermutationRuleCascade.ApplyRules` currently only compares a result to its immediate
+input. Two options, in preference order:
+
+1. **Depth cap (simple, sufficient):** thread a recursion-depth parameter; stop
+   descending past `MaxCascadeDepth` (derivable from `MaxRuleApplicationsPerWord`, so
+   possibly no new knob). Cheap, no allocation.
+2. **Visited set (complete):** per-branch `HashSet<TData>` with the existing
+   `FreezableEqualityComparer`. Catches length-k cycles exactly but allocates per branch.
+
+Since Layer 1 and §5.1 already bound total work, option 1 is likely enough; implement 1,
+keep 2 in reserve. These classes are in `SIL.Machine` core but consumed only by HC
+(verified: `SynthesisStratumRule`, `AnalysisStratumRule`), so a constructor-injected
+optional guard is safe.
+
+### 5.4 Defaults and compatibility
+
+All Layer-2 caps default to **off** in `SIL.Machine` (no behavior change for existing
+consumers; some legitimate agglutinative grammars have long affix chains). FieldWorks
+sets conservative values (proposed: `MaxRuleApplicationsPerWord` ≈ 16,
+`MaxAnalysisShapeGrowth` ≈ 6, `MaxParseSteps` ≈ 250k — calibrate in Phase 0). Revisit
+turning defaults on in a subsequent major version once field data exists.
+
+## 6. Layer 3 — static grammar lint (`GrammarAnalyzer`)
+
+### 6.1 Shape
+
+```csharp
+public static class GrammarAnalyzer
+{
+    public static IReadOnlyList<GrammarDiagnostic> Analyze(Language language);
+}
+
+public sealed class GrammarDiagnostic
+{
+    public string Code { get; }        // stable, e.g. "HC0001" — doc anchor
+    public DiagnosticSeverity Severity { get; }   // Error | Warning | Info
+    public IHCRule Rule { get; }       // or Morpheme/AffixTemplate — the culprit object
+    public string Message { get; }
+    public string Suggestion { get; }
+}
+```
+
+Operates on the in-memory `Language`, so it works for **both** XML-loaded grammars and
+FieldWorks' programmatically built ones (HCLoader). A thin CLI (`hc-lint grammar.xml`)
+wraps `XmlLanguageLoader` + `Analyze` for use outside FLEx.
+
+### 6.2 Check catalogue (initial)
+
+| Code | Severity | Detects | Rationale |
+|---|---|---|---|
+| HC0001 | Error | Affix rule with **no overt exponent** (analysis side is a pure variable copy — LHS one `[Seg]*`-class variable, RHS adds no constant segments) **and** `MaxApplicationCount > 1` | Unapplies to every word, every time: guaranteed exponential. The headline "always wrong". |
+| HC0002 | Warning | No overt exponent, `MaxApplicationCount == 1` | Still multiplies candidates once per cascade position; frequently unintended. |
+| HC0003 | Warning | `multipleApplication` set high/unbounded on any rule | Flag the opt-in itself; require justification. |
+| HC0004 | Warning | **Self-feeding rule**: output unifies with the rule's own required environment (epenthesis/insertion feeding itself) | Loop generator in synthesis; growth generator in analysis. |
+| HC0005 | Warning | **Unconstrained deletion**: deletion rewrite rule with very permissive context | Unbounded re-insertion during analysis; interacts with `DeletionReapplications`. |
+| HC0006 | Warning | Compounding rule with unconstrained POS on **both** head and non-head | Cross-product blowup; interacts with `MaxStemCount`. |
+| HC0007 | Info | Optional-iterative lexical patterns (e.g. `([Seg])([Seg])`) | Spurious-ambiguity source already noted in `Morpher.LexicalGuess` comments. |
+| HC0008 | Info | Cyclic feeding pair: rule A's analysis output can feed B and vice versa with net growth | Best-effort structural check; pairs only. |
+
+What static analysis *cannot* catch — combinatorial interaction among individually
+reasonable rules — is covered by Layer 1's breach re-run (empirical top-offender report).
+The written guide ("Writing performant HC grammars") is organized by these codes, with a
+section on interpreting the empirical report.
+
+### 6.3 Consumers
+
+- **FLEx**: parser report / grammar check UI lists diagnostics next to the rules
+  (FieldWorks-side work, out of scope here; the API is designed for it).
+- **CLI**: for machine.py users and CI-style grammar validation.
+- **Tests**: our own pathological fixtures must each trip their intended code.
+
+## 7. Testing strategy
+
+- **Pathological fixtures**: construct minimal grammars in `MorpherTests` for each class:
+  glob rule + `multipleApplication`, A↔B cycle, self-feeding epenthesis, unconstrained
+  deletion, unconstrained compounding. Each must (a) trip the budget deterministically at
+  a known step count, (b) be caught by its Layer-2 cap, (c) be flagged by its lint code.
+- **Real-grammar fixtures (decided): use `samples/data/{indonesian,sena}-hc.xml` +
+  their wordlists directly.** These are the two grammars already in the working tree
+  from the rustify perf sessions — `indonesian-hc.xml` (2,563 lines) /
+  `indonesian-words.txt` (121 words) and the much larger `sena-hc.xml` (33,091 lines) /
+  `sena-words.txt` (7,121 words). They serve three roles: (1) the **default-calibration
+  corpus** for §4.1 (measure legitimate max steps/time, set the generous default above
+  it); (2) the **no-regression corpus** — with all knobs at their shipped defaults, every
+  word in both wordlists must still parse to byte-identical results (rustify's own audit
+  already established byte-identical output on these corpora pre-complexity-cap, so any
+  divergence post-complexity-cap is a bug in this work, not noise); (3) the **overhead
+  benchmark** corpus (see below). Still verify licensing/provenance before committing
+  them permanently to the test tree (currently untracked).
+- **Determinism**: same grammar + word ⇒ identical `StepsUsed` and identical breach
+  point when single-threaded (`SINGLE_THREADED`/dop=1), so assert the exact step there.
+  Multi-threaded, the steps counter is shared/Interlocked and the *count at breach* may
+  vary with scheduling, so assert only exhaustion + reason in parallel mode.
+- **No-regression**: with all knobs off, full existing suite green and byte-identical
+  parse results on the sample grammars.
+- **Overhead benchmark**: sena + indonesian wordlists, budget on (shipped default) vs.
+  budget fully disabled, on the **rustify** engine (see §8) — target < 2% throughput
+  cost; if the single Interlocked increment shows up, fall back to per-thread counters
+  flushed periodically.
+- **Pathological additions to the real corpus**: since indonesian/sena are (presumably)
+  well-behaved grammars, also hand-craft 1–2 pathological *variants* of the indonesian
+  grammar specifically (smaller, easier to reason about than sena) — e.g. take one real
+  affix rule and strip its overt exponent, or raise its `multipleApplication` — so the
+  budget/lint tests exercise a realistic grammar shape, not just synthetic toy rules.
+
+## 8. Interaction with the rustify work (PR #446) — and sequencing
+
+**The overlap is near-total.** PR #446's single commit rewrites, among others:
+`Morpher.cs`, `Word.cs`, `AnalysisStratumRule.cs`, `SynthesisStratumRule.cs`,
+`AnalysisAffixProcessRule.cs`, `AnalysisCompoundingRule.cs`, `AnalysisRewriteRule.cs`,
+`ParallelCombinationRuleCascade.cs`, `XmlLanguageLoader.cs`, and `MorpherTests.cs` —
+i.e. **every file Layers 1–2 touch**. Beyond textual conflicts:
+
+1. **The budget lives in the hot path rustify just optimized.** Overhead must be measured
+   against the *new* engine; a check invisible on master's slower engine could be
+   measurable post-rustify. Rustify also deliberately stripped `MorpherStatistics` from
+   the hot path — the breach-then-rerun design in §4 exists to honor that decision, and
+   should be validated on that engine.
+2. **`Word` internals changed** (flat/COW shape, `Pattern<Word, int>` projection, changed
+   clone behavior). The `ParseContext` propagation through `Word.Clone` must be written
+   against rustify's `Word`, not master's.
+3. **Budget defaults need calibration on the shipped engine.** A step budget tuned on
+   master would be wildly conservative post-rustify.
+4. Even Layer 3 is lightly affected: HC0001/HC0002 inspect `AffixProcessAllomorph.Lhs`,
+   whose type changed `Pattern<Word, ShapeNode>` → `Pattern<Word, int>` on rustify.
+5. Precedent: the `fst-advisor` branch already stacks on `hc-rustify` and needed a
+   mechanical `ShapeNode→int` fix after rebase — the same would happen here, times ten.
+
+**Decided: branch off `hc-rustify` now; do not wait for #446 to merge before starting
+implementation.** Rebasing one clean feature branch when #446 lands is routine (already
+done once for fst-advisor); writing Layers 1–2 against master and then porting them
+across rustify's 100-file rewrite is not. Concretely:
+
+- **Can start now, off `hc-rustify`:** Phase 0 (fixtures/repro harness) and Phase 1
+  (budget). Phase 0 is even branch-agnostic (test-only).
+- **Layer 3** is nearly independent (reads `Language` structure, never touches the hot
+  path) and could start on either base; starting it on `hc-rustify` avoids the one known
+  type change (item 4). It's also the natural parallel track if #446 review drags.
+- **Do not merge before #446.** Complexity-cap should land *after* rustify to avoid
+  forcing a painful rebase onto the 100-file rustify branch. Version-wise this fits the
+  already-recommended major-version release train for rustify (master is at 3.9.0;
+  rustify targets a major bump); complexity-cap's additive API rides the same train.
+
+## 9. Phases
+
+| Phase | Deliverable | Depends on | Est. size |
+|---|---|---|---|
+| 0 | Branch off `hc-rustify`. Baseline `indonesian`/`sena` on rustify (max steps/time observed → derive generous `MaxParseSteps`/`ParseTimeout` defaults); build 1–2 pathological variants of the indonesian grammar; repro harness | `hc-rustify` | S |
+| 1 | `ParseContext`, `MaxParseSteps` + `ParseTimeout`, soft-stop checks, `ParseDiagnostics` overload, breach re-run with per-rule counters | 0 | M |
+| 2 | `MaxRuleApplicationsPerWord`, `MaxAnalysisShapeGrowth`, cascade depth cap | 1 (shares `ParseContext`) | M |
+| 3 | `GrammarAnalyzer` + HC0001–HC0008, CLI, "Writing performant HC grammars" guide | — (parallelizable) | M–L |
+| 4 | FieldWorks follow-ups: worker DTO status field, FLEx "diagnose word" + parser-report lint surfacing, set conservative caps in HCLoader | 1–3, FW repo | separate effort |
+
+## 10. Open questions
+
+**Resolved 2026-07-02:**
+
+1. ~~Soft-stop vs. throw~~: **soft-stop**, no exception path for budget/timeout
+   exhaustion (§4.3). Real errors still throw as today.
+2. ~~Default values~~ — **generous default, shipped on**, not opt-in (§4.1). Exact
+   numbers derived from Phase 0 baselining against `indonesian`/`sena`, not guessed.
+5. ~~Sample grammars~~ — **use `indonesian-hc.xml`/`sena-hc.xml` directly** as the
+   calibration, no-regression, and overhead-benchmark corpus (§7). Provenance/license
+   check before permanent commit still applies, but the *design* decision to use them
+   (rather than build a separate synthetic-only corpus) is made.
+
+**Still open:**
+
+3. **Rename/deprecate `MaxUnapplications`?** Its name collides conceptually with the new
+   caps; same-release cleanup vs. leave-as-is.
+4. **Where does `ParseDiagnostics` surface in machine.py parity?** machine.py has its own
+   HC port; decide whether these knobs/codes should be mirrored there (same codes would
+   keep the guide tool-agnostic).
+6. **HC0004/HC0008 precision**: self-feeding/cycle detection via unification is
+   approximate; acceptable false-positive rate for a Warning? Start conservative
+   (high-confidence patterns only), widen with field feedback.
diff --git a/src/SIL.Machine.Morphology.HermitCrab/AnalysisAffixTemplateRule.cs b/src/SIL.Machine.Morphology.HermitCrab/AnalysisAffixTemplateRule.cs
index f401ce0f..72c4a214 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/AnalysisAffixTemplateRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/AnalysisAffixTemplateRule.cs
@@ -31,6 +31,9 @@ public AnalysisAffixTemplateRule(Morpher morpher, AffixTemplate template)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_template) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_template))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/AnalysisLanguageRule.cs b/src/SIL.Machine.Morphology.HermitCrab/AnalysisLanguageRule.cs
index 4bdd3c95..b8131a08 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/AnalysisLanguageRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/AnalysisLanguageRule.cs
@@ -26,6 +26,9 @@ public IEnumerable<Word> Apply(Word input)
             var results = new HashSet<Word>(FreezableEqualityComparer<Word>.Default);
             for (int i = 0; i < _rules.Count && inputSet.Count > 0; i++)
             {
+                if (input.ParseContext?.Exhausted == true)
+                    break;
+
                 if (!_morpher.RuleSelector(_strata[i]))
                     continue;
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs b/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
index aadef083..3ee2b95b 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
@@ -132,6 +132,11 @@ public IEnumerable<Word> Apply(Word input)
                 _morpher.TraceManager.EndUnapplyStratum(_stratum, input);
             foreach (Word mruleOutWord in mruleOutWords)
             {
+                // Once the budget is gone, stop collecting outputs immediately rather than draining the
+                // rest of an already-in-flight (but now-empty-yielding) rule cascade.
+                if (input.ParseContext?.Exhausted == true)
+                    break;
+
                 // Skip intermediate sources from phonological rules, templates, and morphological rules.
                 mruleOutWord.Source = origInput;
                 if (mergeEquivalentAnalyses)
diff --git a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
index 10cdc45c..da9ad1c0 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
@@ -70,6 +70,8 @@ public Morpher(ITraceManager traceManager, Language lang, int maxDegreeOfParalle
             MergeEquivalentAnalyses = true;
             LexEntrySelector = entry => true;
             RuleSelector = rule => true;
+            MaxParseSteps = DefaultMaxParseSteps;
+            ParseTimeout = DefaultParseTimeout;
 
             _morphemes = new ReadOnlyObservableCollection<Morpheme>(morphemes);
         }
@@ -79,10 +81,42 @@ public ITraceManager TraceManager
             get { return _traceManager; }
         }
 
+        /// <summary>
+        /// Generous default for <see cref="MaxParseSteps"/>, calibrated against the real Indonesian/Sena
+        /// grammars on the rustify engine (see complexity-cap.md Phase 0): observed legitimate max was
+        /// ~13,600 steps (Sena), so this ships ~150x above that ceiling — effectively invisible for real
+        /// grammars but still finite. 0 disables the step budget.
+        /// </summary>
+        public const int DefaultMaxParseSteps = 2_000_000;
+
+        /// <summary>
+        /// Generous default for <see cref="ParseTimeout"/> — a backstop far above any observed legitimate
+        /// single-word parse time on the rustify engine, but still bounded so one pathological word cannot
+        /// stall a "Parse All Words" batch indefinitely. <see cref="TimeSpan.Zero"/> disables the timeout.
+        /// </summary>
+        public static readonly TimeSpan DefaultParseTimeout = TimeSpan.FromSeconds(10);
+
         public int DeletionReapplications { get; set; }
 
         public int MaxStemCount { get; set; }
 
+        /// <summary>
+        /// Max rule applications (analysis + synthesis) per <see cref="ParseWord(string, out object, bool, out ParseDiagnostics)"/>
+        /// call. Ships on with a generous default (<see cref="DefaultMaxParseSteps"/>) so naive consumers are
+        /// protected out of the box; 0 = unlimited. On breach, the parse soft-stops: rules return no further
+        /// results, so whatever analyses/syntheses were already found are still returned, flagged via the
+        /// <see cref="ParseDiagnostics"/> overload. Never throws.
+        /// </summary>
+        public int MaxParseSteps { get; set; }
+
+        /// <summary>
+        /// Wall-clock backstop per <see cref="ParseWord(string, out object, bool, out ParseDiagnostics)"/> call,
+        /// checked periodically alongside the step budget (not on every step, to keep the happy path cheap).
+        /// Ships on with a generous default (<see cref="DefaultParseTimeout"/>); <see cref="TimeSpan.Zero"/> or
+        /// a negative value disables it. Same soft-stop behavior as <see cref="MaxParseSteps"/>.
+        /// </summary>
+        public TimeSpan ParseTimeout { get; set; }
+
         /// <summary>
         /// MaxUnapplications limits the number of unapplications to make it possible
         /// to make it possible to debug words that take 30 minutes to parse
@@ -128,11 +162,47 @@ public IEnumerable<Word> ParseWord(string word, out object trace)
         /// If there are no analyses and guessRoot is true, then guess the root.
         /// </summary>
         public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot)
+        {
+            return ParseWord(word, out trace, guessRoot, out _);
+        }
+
+        /// <summary>
+        /// Parse the specified surface form, possibly tracing the parse. If there are no analyses and
+        /// guessRoot is true, then guess the root. <paramref name="diagnostics"/> reports whether
+        /// <see cref="MaxParseSteps"/>/<see cref="ParseTimeout"/> cut the parse short (soft-stop: the
+        /// returned sequence is whatever was found so far, never an exception).
+        /// </summary>
+        public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot, out ParseDiagnostics diagnostics)
+        {
+            return ParseWordCore(word, out trace, guessRoot, collectRuleCounters: false, out diagnostics);
+        }
+
+        /// <summary>
+        /// Re-parses one word with per-rule application counters enabled and reports the top offenders —
+        /// "word X exceeded N steps; rule Y accounted for most of the applications". Intended for use only
+        /// after a breach is observed via the <see cref="ParseDiagnostics"/> overload: counters add overhead,
+        /// so they are never on during the normal happy path (see complexity-cap.md §3 "cheap happy path").
+        /// </summary>
+        public ParseDiagnostics RerunWithDiagnostics(string word, out IEnumerable<Word> results)
+        {
+            results = ParseWordCore(word, out _, false, collectRuleCounters: true, out ParseDiagnostics diagnostics);
+            return diagnostics;
+        }
+
+        private IEnumerable<Word> ParseWordCore(
+            string word,
+            out object trace,
+            bool guessRoot,
+            bool collectRuleCounters,
+            out ParseDiagnostics diagnostics
+        )
         {
             // convert the word to its phonetic shape
             Shape shape = _lang.SurfaceStratum.CharacterDefinitionTable.Segment(word);
 
             var input = new Word(_lang.SurfaceStratum, shape);
+            var parseContext = new ParseContext(MaxParseSteps, ParseTimeout, shape.Count, collectRuleCounters);
+            input.ParseContext = parseContext;
             input.Freeze();
             if (_traceManager.IsTracing)
                 _traceManager.AnalyzeWord(_lang, input);
@@ -177,11 +247,30 @@ public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot
 
                 matches.Sort((x, y) => y.Morphs.Count().CompareTo(x.Morphs.Count()));
 
+                diagnostics = CreateParseDiagnostics(parseContext);
                 return matches;
             }
+            diagnostics = CreateParseDiagnostics(parseContext);
             return syntheses;
         }
 
+        private static ParseDiagnostics CreateParseDiagnostics(ParseContext parseContext)
+        {
+            if (!parseContext.Exhausted)
+                return ParseDiagnostics.None;
+
+            IReadOnlyList<(IHCRule Rule, int Applications)> topRules = null;
+            if (parseContext.DiagnosticsEnabled)
+            {
+                topRules = parseContext
+                    .RuleCounters.Select(kvp => (Rule: kvp.Key, Applications: kvp.Value))
+                    .OrderByDescending(t => t.Applications)
+                    .ToList();
+            }
+
+            return new ParseDiagnostics(true, parseContext.Reason, parseContext.StepsUsed, parseContext.Elapsed, topRules);
+        }
+
         /// <summary>
         /// Generates surface forms from the specified word synthesis information.
         /// </summary>
@@ -208,6 +297,7 @@ out object trace
             trace = rootTrace;
 
             var words = new ConcurrentBag<string>();
+            var parseContext = new ParseContext(MaxParseSteps, ParseTimeout, rootEntry.PrimaryAllomorph.Segments.Shape.Count);
 
             Exception exception = null;
             Parallel.ForEach(
@@ -220,12 +310,15 @@ out object trace
                 {
                     try
                     {
-                        var synthesisWord = new Word(synthesisInfo.Allomorph, realizationalFS);
+                        var synthesisWord = new Word(synthesisInfo.Allomorph, realizationalFS)
+                        {
+                            ParseContext = parseContext,
+                        };
                         foreach (Tuple<IMorphologicalRule, RootAllomorph> rule in synthesisInfo.RulePermutation)
                         {
                             synthesisWord.MorphologicalRuleUnapplied(rule.Item1);
                             if (rule.Item2 != null)
-                                synthesisWord.NonHeadUnapplied(new Word(rule.Item2, new FeatureStruct()));
+                                synthesisWord.NonHeadUnapplied(new Word(rule.Item2, new FeatureStruct()) { ParseContext = parseContext });
                         }
 
                         synthesisWord.CurrentTrace = rootTrace;
@@ -307,6 +400,8 @@ private IEnumerable<Word> Synthesize(string word, IList<Word> analyses)
                 var matches = new HashSet<Word>(FreezableEqualityComparer<Word>.Default);
                 foreach (Word analysisWord in analyses)
                 {
+                    if (analysisWord.ParseContext?.Exhausted == true)
+                        break;
                     foreach (Word validWord in SynthesizeAnalysis(word, analysisWord))
                         matches.Add(validWord);
                 }
@@ -342,6 +437,8 @@ private IEnumerable<Word> SynthesizeAnalysis(string word, Word analysisWord)
         {
             foreach (Word synthesisWord in LexicalLookup(analysisWord))
             {
+                if (synthesisWord.ParseContext?.Exhausted == true)
+                    yield break;
                 foreach (Word alternative in synthesisWord.ExpandAlternatives())
                 {
                     foreach (Word validWord in _synthesisRule.Apply(alternative).Where(IsWordValid))
@@ -371,6 +468,8 @@ LexEntry entry in SearchRootAllomorphs(input.Stratum, input.Shape)
                     .Distinct()
             )
             {
+                if (input.ParseContext?.Exhausted == true)
+                    yield break;
                 foreach (RootAllomorph allomorph in entry.Allomorphs)
                 {
                     Word newWord = input.Clone();
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
index b9f6d4ac..7cca6fdf 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
@@ -39,6 +39,9 @@ public AnalysisAffixProcessRule(Morpher morpher, AffixProcessRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
index b5013d4e..e03b6cfe 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
@@ -39,6 +39,9 @@ public AnalysisCompoundingRule(Morpher morpher, CompoundingRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
index 031c6fba..e526682a 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
@@ -39,6 +39,9 @@ public AnalysisRealizationalAffixProcessRule(Morpher morpher, RealizationalAffix
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisAffixProcessRule.cs
index 98a3895d..6537c0e9 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisAffixProcessRule.cs
@@ -40,6 +40,9 @@ public SynthesisAffixProcessRule(Morpher morpher, AffixProcessRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!input.IsMorphologicalRuleApplicable(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisCompoundingRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisCompoundingRule.cs
index 29e3bd5f..4602321c 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisCompoundingRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisCompoundingRule.cs
@@ -44,6 +44,9 @@ private Matcher<Word, int> BuildMatcher(IEnumerable<Pattern<Word, int>> lhs)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!input.IsMorphologicalRuleApplicable(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisRealizationalAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisRealizationalAffixProcessRule.cs
index bd1717f8..ab45edd6 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisRealizationalAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/SynthesisRealizationalAffixProcessRule.cs
@@ -40,6 +40,9 @@ public SynthesisRealizationalAffixProcessRule(Morpher morpher, RealizationalAffi
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs b/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
new file mode 100644
index 00000000..82731dde
--- /dev/null
+++ b/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
@@ -0,0 +1,102 @@
+using System;
+using System.Collections.Concurrent;
+using System.Collections.Generic;
+using System.Diagnostics;
+using System.Threading;
+
+namespace SIL.Machine.Morphology.HermitCrab
+{
+    public enum ParseExhaustionReason
+    {
+        None,
+        StepBudget,
+        Timeout,
+    }
+
+    /// <summary>
+    /// Per-<see cref="Morpher.ParseWord(string, out object, bool, out ParseDiagnostics)"/> work budget,
+    /// referenced from every <see cref="Word"/> produced during that parse (propagated through
+    /// <see cref="Word"/>'s copy constructor exactly like <see cref="Word.CurrentTrace"/>). Compiled rule
+    /// objects are shared across concurrent parses, so this state cannot live on the rules or the
+    /// <see cref="Morpher"/> itself; it lives here instead and is threaded through the data.
+    /// </summary>
+    internal sealed class ParseContext
+    {
+        // Wall-clock is checked only every 256th step (mask 0xFF): Stopwatch reads are cheap but not free,
+        // and the budget's steady-state cost on the happy path must stay close to a single Interlocked increment.
+        private const int DeadlineCheckMask = 0xFF;
+
+        private readonly int _maxSteps;
+        private readonly long _timeoutTicks;
+        private readonly long _startTimestamp;
+        private readonly ConcurrentDictionary<IHCRule, int> _ruleCounters;
+        private int _steps;
+        private int _exhausted;
+        private ParseExhaustionReason _reason;
+
+        public ParseContext(int maxSteps, TimeSpan timeout, int surfaceLength, bool collectRuleCounters = false)
+        {
+            _maxSteps = maxSteps;
+            _timeoutTicks = timeout > TimeSpan.Zero ? (long)(timeout.TotalSeconds * Stopwatch.Frequency) : -1;
+            _startTimestamp = Stopwatch.GetTimestamp();
+            SurfaceLength = surfaceLength;
+            if (collectRuleCounters)
+                _ruleCounters = new ConcurrentDictionary<IHCRule, int>();
+        }
+
+        /// <summary>Length (in segments) of the surface shape being parsed; carrier for Layer 2's shape-growth cap.</summary>
+        public int SurfaceLength { get; }
+
+        public bool Exhausted => Volatile.Read(ref _exhausted) != 0;
+
+        public ParseExhaustionReason Reason => _reason;
+
+        public int StepsUsed => Volatile.Read(ref _steps);
+
+        public TimeSpan Elapsed =>
+            TimeSpan.FromSeconds((double)(Stopwatch.GetTimestamp() - _startTimestamp) / Stopwatch.Frequency);
+
+        public bool DiagnosticsEnabled => _ruleCounters != null;
+
+        public IReadOnlyDictionary<IHCRule, int> RuleCounters => _ruleCounters;
+
+        /// <summary>
+        /// Records one rule-application attempt. Returns false once the budget is gone; callers must
+        /// treat that as "no result" and unwind immediately (return <c>Enumerable.Empty&lt;Word&gt;()</c>),
+        /// never throw.
+        /// </summary>
+        public bool Step(IHCRule rule = null)
+        {
+            if (Exhausted)
+                return false;
+
+            if (rule != null && _ruleCounters != null)
+                _ruleCounters.AddOrUpdate(rule, 1, (_, count) => count + 1);
+
+            if (_maxSteps <= 0 && _timeoutTicks < 0)
+                return true;
+
+            int steps = Interlocked.Increment(ref _steps);
+            if (_maxSteps > 0 && steps >= _maxSteps)
+            {
+                MarkExhausted(ParseExhaustionReason.StepBudget);
+                return false;
+            }
+            if (_timeoutTicks >= 0 && (steps & DeadlineCheckMask) == 0)
+            {
+                if (Stopwatch.GetTimestamp() - _startTimestamp >= _timeoutTicks)
+                {
+                    MarkExhausted(ParseExhaustionReason.Timeout);
+                    return false;
+                }
+            }
+            return true;
+        }
+
+        private void MarkExhausted(ParseExhaustionReason reason)
+        {
+            if (Interlocked.CompareExchange(ref _exhausted, 1, 0) == 0)
+                _reason = reason;
+        }
+    }
+}
diff --git a/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs b/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
new file mode 100644
index 00000000..a661e505
--- /dev/null
+++ b/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
@@ -0,0 +1,47 @@
+using System;
+using System.Collections.Generic;
+
+namespace SIL.Machine.Morphology.HermitCrab
+{
+    /// <summary>
+    /// Reports whether <see cref="Morpher.MaxParseSteps"/>/<see cref="Morpher.ParseTimeout"/> cut a parse
+    /// short. A breach is a soft-stop: the parse still returns whatever analyses/syntheses it had found,
+    /// this just tells the caller the result may be incomplete rather than "no parse".
+    /// </summary>
+    public sealed class ParseDiagnostics
+    {
+        public static readonly ParseDiagnostics None = new ParseDiagnostics(
+            false,
+            ParseExhaustionReason.None,
+            0,
+            TimeSpan.Zero,
+            null
+        );
+
+        internal ParseDiagnostics(
+            bool budgetExhausted,
+            ParseExhaustionReason reason,
+            int stepsUsed,
+            TimeSpan elapsed,
+            IReadOnlyList<(IHCRule Rule, int Applications)> topRules
+        )
+        {
+            BudgetExhausted = budgetExhausted;
+            Reason = reason;
+            StepsUsed = stepsUsed;
+            Elapsed = elapsed;
+            TopRules = topRules ?? Array.Empty<(IHCRule Rule, int Applications)>();
+        }
+
+        public bool BudgetExhausted { get; }
+
+        public ParseExhaustionReason Reason { get; }
+
+        public int StepsUsed { get; }
+
+        public TimeSpan Elapsed { get; }
+
+        /// <summary>Populated only by <see cref="Morpher.RerunWithDiagnostics"/>, and only when the budget was exhausted; otherwise empty.</summary>
+        public IReadOnlyList<(IHCRule Rule, int Applications)> TopRules { get; }
+    }
+}
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisMetathesisRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisMetathesisRule.cs
index 5d160243..7b26df7c 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisMetathesisRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisMetathesisRule.cs
@@ -37,6 +37,9 @@ public AnalysisMetathesisRule(Morpher morpher, MetathesisRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
index e691b4c0..08a01a6c 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
@@ -120,6 +120,9 @@ private static bool IsUnifiable(Constraint<Word, int> constraint, Pattern<Word,
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
@@ -151,6 +154,10 @@ public IEnumerable<Word> Apply(Word input)
                                 j++;
                                 if (j > _morpher.DeletionReapplications)
                                     break;
+                                // Bounded by DeletionReapplications above, but that's a user-set knob with
+                                // no ceiling of its own — still gate each reapplication on the shared budget.
+                                if (input.ParseContext?.Step(_rule) == false)
+                                    break;
                                 data = sr.Item2.Apply(data).SingleOrDefault();
                             }
                         }
@@ -162,6 +169,10 @@ public IEnumerable<Word> Apply(Word input)
                             while (data != null)
                             {
                                 srApplied = true;
+                                // Unlike Deletion, this loop has no reapplication ceiling of its own (a
+                                // self-feeding rule can hypothesize forever) — the budget is the only bound.
+                                if (input.ParseContext?.Step(_rule) == false)
+                                    break;
                                 data = sr.Item2.Apply(data).SingleOrDefault();
                             }
                         }
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisMetathesisRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisMetathesisRule.cs
index 2d8c3af5..90a2b272 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisMetathesisRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisMetathesisRule.cs
@@ -34,6 +34,9 @@ public SynthesisMetathesisRule(Morpher morpher, MetathesisRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisRewriteRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisRewriteRule.cs
index ecf84a7d..e98bc98b 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisRewriteRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/SynthesisRewriteRule.cs
@@ -50,6 +50,9 @@ public SynthesisRewriteRule(Morpher morpher, RewriteRule rule)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_rule) == false)
+                return Enumerable.Empty<Word>();
+
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
diff --git a/src/SIL.Machine.Morphology.HermitCrab/SynthesisAffixTemplateRule.cs b/src/SIL.Machine.Morphology.HermitCrab/SynthesisAffixTemplateRule.cs
index 21248d00..7251ab11 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/SynthesisAffixTemplateRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/SynthesisAffixTemplateRule.cs
@@ -27,6 +27,9 @@ public SynthesisAffixTemplateRule(Morpher morpher, AffixTemplate template)
 
         public IEnumerable<Word> Apply(Word input)
         {
+            if (input.ParseContext?.Step(_template) == false)
+                return Enumerable.Empty<Word>();
+
             if (_morpher.TraceManager.IsTracing)
                 _morpher.TraceManager.BeginApplyTemplate(_template, input);
             var output = new HashSet<Word>(FreezableEqualityComparer<Word>.Default);
diff --git a/src/SIL.Machine.Morphology.HermitCrab/Word.cs b/src/SIL.Machine.Morphology.HermitCrab/Word.cs
index 96748875..95e8b320 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/Word.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/Word.cs
@@ -98,6 +98,7 @@ protected Word(Word word)
             _isLastAppliedRuleFinal = word._isLastAppliedRuleFinal;
             _isPartial = word._isPartial;
             CurrentTrace = word.CurrentTrace;
+            ParseContext = word.ParseContext;
             _disjunctiveAllomorphIndices =
                 word._disjunctiveAllomorphIndices == null || word._disjunctiveAllomorphIndices.Count == 0
                     ? null
@@ -226,6 +227,15 @@ public IEnumerable<Morpheme> MorphemesInApplicationOrder
 
         public object CurrentTrace { get; set; }
 
+        /// <summary>
+        /// Work budget for the parse this word is part of. Null for words never routed through
+        /// <see cref="Morpher.ParseWord(string, out object, bool, out ParseDiagnostics)"/> (e.g. words built
+        /// directly by rule-level unit tests), in which case budget checks are no-ops (unlimited).
+        /// Reference-shared like <see cref="CurrentTrace"/> — deliberately excluded from <see cref="FreezeImpl"/>
+        /// and <see cref="ValueEquals"/> so dedup semantics are unchanged.
+        /// </summary>
+        internal ParseContext ParseContext { get; set; }
+
         public bool IsPartial
         {
             get { return _isPartial; }
@@ -514,6 +524,7 @@ internal bool CheckBlocking(out Word word)
                     word = new Word(entry.PrimaryAllomorph, RealizationalFeatureStruct.Clone())
                     {
                         CurrentTrace = CurrentTrace,
+                        ParseContext = ParseContext,
                     };
                     word.Freeze();
                     return true;
diff --git a/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs b/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
index 8245d17a..59879a89 100644
--- a/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
+++ b/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
@@ -543,6 +543,151 @@ public void AnalyzeWord_ConcurrentRepeatedParsing_IsDeterministic()
         }
     }
 
+    [Test]
+    public void ParseWord_DefaultBudget_DoesNotTripOnOrdinaryGrammar()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var edSuffix = new AffixProcessRule
+        {
+            Id = "PAST",
+            Name = "ed_suffix",
+            Gloss = "PAST",
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        edSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1"), new InsertSegments(Table3, "+d") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(edSuffix);
+
+        var morpher = new Morpher(TraceManager, Language);
+        Assert.That(morpher.MaxParseSteps, Is.EqualTo(Morpher.DefaultMaxParseSteps));
+        Assert.That(morpher.ParseTimeout, Is.EqualTo(Morpher.DefaultParseTimeout));
+
+        IEnumerable<Word> results = morpher.ParseWord("sagd", out _, false, out ParseDiagnostics diagnostics).ToList();
+
+        Assert.That(results, Is.Not.Empty);
+        Assert.That(diagnostics.BudgetExhausted, Is.False);
+        Assert.That(diagnostics.Reason, Is.EqualTo(ParseExhaustionReason.None));
+    }
+
+    [Test]
+    public void ParseWord_StepBudgetExhausted_SoftStopsWithDiagnostics()
+    {
+        // A pathological rule that can unapply indefinitely without tripping the cascade's own
+        // "input == output" infinite-loop guard, with a MaxApplicationCount high enough that only
+        // the new step budget bounds it. The comment on the rule below explains the mechanism.
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        // No overt exponent: Rhs is a pure copy of the input, so every unapplication produces a
+        // Word with the identical Shape but one more entry in the morphological-rule-application
+        // list. The cascades' infinite-loop guard compares Words by ValueEquals (which includes
+        // that list), so it never trips here — only the new step budget bounds this.
+        var noExponentSuffix = new AffixProcessRule
+        {
+            Id = "REPEAT",
+            Name = "no_exponent_suffix",
+            Gloss = "REPEAT",
+            MaxApplicationCount = 1_000_000,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        noExponentSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(noExponentSuffix);
+        SetRuleOrder(MorphologicalRuleOrder.Unordered);
+
+        var morpher = new Morpher(TraceManager, Language) { MaxParseSteps = 500, ParseTimeout = TimeSpan.Zero };
+
+        List<Word> results = morpher.ParseWord("sag", out _, false, out ParseDiagnostics diagnostics).ToList();
+
+        Assert.That(diagnostics.BudgetExhausted, Is.True);
+        Assert.That(diagnostics.Reason, Is.EqualTo(ParseExhaustionReason.StepBudget));
+        Assert.That(diagnostics.StepsUsed, Is.GreaterThanOrEqualTo(500));
+        // Soft-stop: never throws, and ParseWord itself must remain usable afterwards.
+        Assert.That(() => morpher.ParseWord("sagd", out _, false), Throws.Nothing);
+    }
+
+    [Test]
+    public void ParseWord_StepBudget_IsDeterministicSingleThreaded()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        // No overt exponent: Rhs is a pure copy of the input, so every unapplication produces a
+        // Word with the identical Shape but one more entry in the morphological-rule-application
+        // list. The cascades' infinite-loop guard compares Words by ValueEquals (which includes
+        // that list), so it never trips here — only the new step budget bounds this.
+        var noExponentSuffix = new AffixProcessRule
+        {
+            Id = "REPEAT",
+            Name = "no_exponent_suffix",
+            Gloss = "REPEAT",
+            MaxApplicationCount = 1_000_000,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        noExponentSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(noExponentSuffix);
+        SetRuleOrder(MorphologicalRuleOrder.Unordered);
+
+        var morpher = new Morpher(TraceManager, Language, maxDegreeOfParallelism: 1)
+        {
+            MaxParseSteps = 500,
+            ParseTimeout = TimeSpan.Zero,
+        };
+
+        morpher.ParseWord("sag", out _, false, out ParseDiagnostics first).ToList();
+        morpher.ParseWord("sag", out _, false, out ParseDiagnostics second).ToList();
+
+        Assert.That(first.StepsUsed, Is.EqualTo(second.StepsUsed));
+    }
+
+    [Test]
+    public void RerunWithDiagnostics_ReportsTopOffendingRule()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        // No overt exponent: Rhs is a pure copy of the input, so every unapplication produces a
+        // Word with the identical Shape but one more entry in the morphological-rule-application
+        // list. The cascades' infinite-loop guard compares Words by ValueEquals (which includes
+        // that list), so it never trips here — only the new step budget bounds this.
+        var noExponentSuffix = new AffixProcessRule
+        {
+            Id = "REPEAT",
+            Name = "no_exponent_suffix",
+            Gloss = "REPEAT",
+            MaxApplicationCount = 1_000_000,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        noExponentSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(noExponentSuffix);
+        SetRuleOrder(MorphologicalRuleOrder.Unordered);
+
+        var morpher = new Morpher(TraceManager, Language) { MaxParseSteps = 500, ParseTimeout = TimeSpan.Zero };
+
+        ParseDiagnostics diagnostics = morpher.RerunWithDiagnostics("sag", out IEnumerable<Word> results);
+        results.ToList();
+
+        Assert.That(diagnostics.BudgetExhausted, Is.True);
+        Assert.That(diagnostics.TopRules, Is.Not.Empty);
+        Assert.That(diagnostics.TopRules[0].Rule, Is.EqualTo(noExponentSuffix));
+    }
+
     private static string AnalysisSignature(Morpher morpher, string word)
     {
         return string.Join(

From e68f09844fb43de8759a408947c10b28e9e87817 Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 14:34:54 -0400
Subject: [PATCH 2/6] Complexity cap Phase 2: structural bounds (Layer 2)

Adds three additive, default-off caps that convert exponential blowups
into bounded ones instead of merely time-boxing them:

- Morpher.MaxRuleApplicationsPerWord: a running total-unapplications
  counter on Word (Word.TotalUnapplicationCount), checked alongside the
  existing per-rule MaxApplicationCount in the three affix/compounding
  analysis rules. Closes the "rule A -> B -> A -> B" loophole that a
  per-rule cap alone cannot catch.
- Morpher.MaxAnalysisShapeGrowth: prunes analysis candidates whose shape
  has grown past the surface form by more than N segments, checked at
  AnalysisStratumRule's output loop (the choke point - candidates pruned
  there never reach lexical lookup) and per-iteration inside
  AnalysisRewriteRule's Deletion/SelfOpaquing reapplication loops.
- PermutationRuleCascade.MaxDepth (SIL.Machine core; a new opt-in
  property, -1/unlimited by default so existing consumers are
  unaffected): caps nested rule-reapplication depth. HermitCrab derives
  its value from MaxRuleApplicationsPerWord instead of exposing another
  Morpher knob, and re-syncs it on each Apply() call because the cap
  can be set via object-initializer syntax after the rule cascade is
  already compiled.

Verified against RewriteRuleTests.DeletionRules' real deletion-rule
grammar: capping MaxAnalysisShapeGrowth excludes the deep-reinsertion
analysis while the shallow ones survive as a strict subset of the
uncapped result.
---
 .../AnalysisStratumRule.cs                    |  28 ++++-
 .../Morpher.cs                                |  43 +++++++-
 .../AnalysisAffixProcessRule.cs               |   4 +
 .../AnalysisCompoundingRule.cs                |   4 +
 .../AnalysisRealizationalAffixProcessRule.cs  |   8 ++
 .../PhonologicalRules/AnalysisRewriteRule.cs  |  14 ++-
 src/SIL.Machine.Morphology.HermitCrab/Word.cs |   9 ++
 .../Rules/PermutationRuleCascade.cs           |  17 ++-
 .../MorpherTests.cs                           | 104 ++++++++++++++++++
 9 files changed, 219 insertions(+), 12 deletions(-)
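
The MaxDepth change to PermutationRuleCascade can be pictured with a minimal, self-contained sketch. This is not the SIL.Machine class: `DepthCappedCascadeSketch` is a hypothetical stand-in that uses plain string functions as rules and assumes multiple application is enabled, so only the depth bookkeeping mirrors the diff below.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for PermutationRuleCascade with multiple application enabled:
// rules are plain string functions instead of IRule<Word, int>.
static class DepthCappedCascadeSketch
{
    public static HashSet<string> Apply(
        IReadOnlyList<Func<string, IEnumerable<string>>> rules,
        string input,
        int maxDepth
    )
    {
        var output = new HashSet<string>();
        ApplyRules(rules, input, 0, 0, maxDepth, output);
        return output;
    }

    private static void ApplyRules(
        IReadOnlyList<Func<string, IEnumerable<string>>> rules,
        string input,
        int ruleIndex,
        int depth,
        int maxDepth,
        HashSet<string> output
    )
    {
        // Any negative value means unlimited, mirroring the patch's -1 default.
        bool descend = maxDepth < 0 || depth < maxDepth;
        for (int i = ruleIndex; i < rules.Count; i++)
        {
            foreach (string result in rules[i](input))
            {
                // Recurse only while under the depth cap and the rule changed its input.
                if (descend && result != input)
                    ApplyRules(rules, result, i, depth + 1, maxDepth, output);
                output.Add(result);
            }
        }
    }

    static void Main()
    {
        // A rule that grows its input on every application, so the input==output guard never trips.
        var rules = new List<Func<string, IEnumerable<string>>> { s => new[] { s + "i" } };
        HashSet<string> results = Apply(rules, "bu", maxDepth: 3);
        Console.WriteLine(results.Count); // 4: one result per depth level 0 through 3
    }
}
```

In this sketch the cap stops further descent but still records the result produced at the capped level, which matches the placement of `output.Add(result)` outside the `descend` check in the diff. With a negative `maxDepth`, the growing rule above would recurse without end.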

diff --git a/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs b/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
index 3ee2b95b..6cda018e 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/AnalysisStratumRule.cs
@@ -11,6 +11,7 @@ namespace SIL.Machine.Morphology.HermitCrab
     internal class AnalysisStratumRule : IRule<Word, int>
     {
         private readonly IRule<Word, int> _mrulesRule;
+        private readonly PermutationRuleCascade<Word, int> _permutationCascade;
         private readonly IRule<Word, int> _prulesRule;
         private readonly IRule<Word, int> _templatesRule;
         private readonly Stratum _stratum;
@@ -39,11 +40,12 @@ public AnalysisStratumRule(Morpher morpher, Stratum stratum)
                     // because morphological rules should be considered optional
                     // during unapplication (they are obligatory during application,
                     // but we don't know they have been applied during unapplication).
-                    _mrulesRule = new PermutationRuleCascade<Word, int>(
+                    _permutationCascade = new PermutationRuleCascade<Word, int>(
                         mrules,
                         true,
                         FreezableEqualityComparer<Word>.Default
                     );
+                    _mrulesRule = _permutationCascade;
                     break;
                 case MorphologicalRuleOrder.Unordered:
                     // Single-threaded when the caller caps within-word parallelism (e.g. it
@@ -106,8 +108,24 @@ private IRule<Word, int> CompilePhonologicalRule(IPhonologicalRule prule, Morphe
             }
         }
 
+        private bool ExceedsShapeGrowth(Word word)
+        {
+            return _morpher.MaxAnalysisShapeGrowth >= 0
+                && word.ParseContext != null
+                && word.Shape.Count > word.ParseContext.SurfaceLength + _morpher.MaxAnalysisShapeGrowth;
+        }
+
         public IEnumerable<Word> Apply(Word input)
         {
+            // Re-synced on every call rather than baked in at compile time: MaxRuleApplicationsPerWord
+            // is a mutable Morpher property that callers set via object-initializer syntax after
+            // construction (the same pattern MaxParseSteps/ParseTimeout use), which runs after this
+            // rule was already compiled. No new knob per complexity-cap.md §5.3 — derived from the
+            // existing per-word unapplication cap (0/unlimited maps to no depth limit).
+            if (_permutationCascade != null)
+                _permutationCascade.MaxDepth =
+                    _morpher.MaxRuleApplicationsPerWord > 0 ? _morpher.MaxRuleApplicationsPerWord : -1;
+
             if (_morpher.TraceManager.IsTracing)
                 _morpher.TraceManager.BeginUnapplyStratum(_stratum, input);
 
@@ -137,6 +155,14 @@ public IEnumerable<Word> Apply(Word input)
                 if (input.ParseContext?.Exhausted == true)
                     break;
 
+                // Prune candidates whose hypothesized underlying shape has grown too far past the
+                // surface form — the truly unbounded generator (undone deletions, empty exponents).
+                // Pruned here so they never reach lexical lookup or the next stratum.
+                if (ExceedsShapeGrowth(mruleOutWord))
+                {
+                    continue;
+                }
+
                 // Skip intermediate sources from phonological rules, templates, and morphological rules.
                 mruleOutWord.Source = origInput;
                 if (mergeEquivalentAnalyses)
diff --git a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
index da9ad1c0..5a8630c9 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
@@ -117,6 +117,24 @@ public ITraceManager TraceManager
         /// </summary>
         public TimeSpan ParseTimeout { get; set; }
 
+        /// <summary>
+        /// Max total morphological-rule unapplications per analysis candidate (≈ max affixes per
+        /// word), checked across <em>all</em> rules combined. Closes the loophole where the per-rule
+        /// application cap never trips because no single rule reaches its own limit (e.g. rule A
+        /// unapplies, then B, then A again). 0 = unlimited (default): some legitimate agglutinative
+        /// grammars have long affix chains, so this is off by default in the library; FieldWorks is
+        /// expected to opt into a conservative value.
+        /// </summary>
+        public int MaxRuleApplicationsPerWord { get; set; }
+
+        /// <summary>
+        /// Prunes any analysis candidate whose shape exceeds the surface form's length by more than
+        /// this many segments — the one truly unbounded generator, where unapplication hypothesizes
+        /// deleted/epenthesized material and keeps making the underlying form longer. -1 = unlimited
+        /// (default, preserves existing behavior).
+        /// </summary>
+        public int MaxAnalysisShapeGrowth { get; set; } = -1;
+
         /// <summary>
         /// MaxUnapplications limits the number of unapplications to make it possible
         /// to make it possible to debug words that take 30 minutes to parse
@@ -172,7 +190,12 @@ public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot
         /// <see cref="MaxParseSteps"/>/<see cref="ParseTimeout"/> cut the parse short (soft-stop: the
         /// returned sequence is whatever was found so far, never an exception).
         /// </summary>
-        public IEnumerable<Word> ParseWord(string word, out object trace, bool guessRoot, out ParseDiagnostics diagnostics)
+        public IEnumerable<Word> ParseWord(
+            string word,
+            out object trace,
+            bool guessRoot,
+            out ParseDiagnostics diagnostics
+        )
         {
             return ParseWordCore(word, out trace, guessRoot, collectRuleCounters: false, out diagnostics);
         }
@@ -268,7 +291,13 @@ private static ParseDiagnostics CreateParseDiagnostics(ParseContext parseContext
                     .ToList();
             }
 
-            return new ParseDiagnostics(true, parseContext.Reason, parseContext.StepsUsed, parseContext.Elapsed, topRules);
+            return new ParseDiagnostics(
+                true,
+                parseContext.Reason,
+                parseContext.StepsUsed,
+                parseContext.Elapsed,
+                topRules
+            );
         }
 
         /// <summary>
@@ -297,7 +326,11 @@ out object trace
             trace = rootTrace;
 
             var words = new ConcurrentBag<string>();
-            var parseContext = new ParseContext(MaxParseSteps, ParseTimeout, rootEntry.PrimaryAllomorph.Segments.Shape.Count);
+            var parseContext = new ParseContext(
+                MaxParseSteps,
+                ParseTimeout,
+                rootEntry.PrimaryAllomorph.Segments.Shape.Count
+            );
 
             Exception exception = null;
             Parallel.ForEach(
@@ -318,7 +351,9 @@ out object trace
                         {
                             synthesisWord.MorphologicalRuleUnapplied(rule.Item1);
                             if (rule.Item2 != null)
-                                synthesisWord.NonHeadUnapplied(new Word(rule.Item2, new FeatureStruct()) { ParseContext = parseContext });
+                                synthesisWord.NonHeadUnapplied(
+                                    new Word(rule.Item2, new FeatureStruct()) { ParseContext = parseContext }
+                                );
                         }
 
                         synthesisWord.CurrentTrace = rootTrace;
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
index 7cca6fdf..067e23ad 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisAffixProcessRule.cs
@@ -47,6 +47,10 @@ public IEnumerable<Word> Apply(Word input)
 
             if (
                 input.GetUnapplicationCount(_rule) >= _rule.MaxApplicationCount
+                || (
+                    _morpher.MaxRuleApplicationsPerWord > 0
+                    && input.TotalUnapplicationCount >= _morpher.MaxRuleApplicationsPerWord
+                )
                 || !_rule.OutSyntacticFeatureStruct.IsUnifiable(input.SyntacticFeatureStruct)
             )
             {
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
index e03b6cfe..f9874c26 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisCompoundingRule.cs
@@ -48,6 +48,10 @@ public IEnumerable<Word> Apply(Word input)
             if (
                 input.NonHeadCount + 1 >= _morpher.MaxStemCount
                 || input.GetUnapplicationCount(_rule) >= _rule.MaxApplicationCount
+                || (
+                    _morpher.MaxRuleApplicationsPerWord > 0
+                    && input.TotalUnapplicationCount >= _morpher.MaxRuleApplicationsPerWord
+                )
                 || !_rule.OutSyntacticFeatureStruct.IsUnifiable(input.SyntacticFeatureStruct)
             )
             {
diff --git a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
index e526682a..a03b9379 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/MorphologicalRules/AnalysisRealizationalAffixProcessRule.cs
@@ -45,6 +45,14 @@ public IEnumerable<Word> Apply(Word input)
             if (!_morpher.RuleSelector(_rule))
                 return Enumerable.Empty<Word>();
 
+            if (
+                _morpher.MaxRuleApplicationsPerWord > 0
+                && input.TotalUnapplicationCount >= _morpher.MaxRuleApplicationsPerWord
+            )
+            {
+                return Enumerable.Empty<Word>();
+            }
+
             FeatureStruct realFS;
             if (!_rule.RealizationalFeatureStruct.Unify(input.RealizationalFeatureStruct, out realFS))
                 return Enumerable.Empty<Word>();
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
index 08a01a6c..ae9bbe4e 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
@@ -118,6 +118,13 @@ private static bool IsUnifiable(Constraint<Word, int> constraint, Pattern<Word,
             return true;
         }
 
+        private bool ExceedsShapeGrowth(Word data)
+        {
+            return _morpher.MaxAnalysisShapeGrowth >= 0
+                && data.ParseContext != null
+                && data.Shape.Count > data.ParseContext.SurfaceLength + _morpher.MaxAnalysisShapeGrowth;
+        }
+
         public IEnumerable<Word> Apply(Word input)
         {
             if (input.ParseContext?.Step(_rule) == false)
@@ -156,7 +163,7 @@ public IEnumerable<Word> Apply(Word input)
                                     break;
                                 // Bounded by DeletionReapplications above, but that's a user-set knob with
                                 // no ceiling of its own — still gate each reapplication on the shared budget.
-                                if (input.ParseContext?.Step(_rule) == false)
+                                if (input.ParseContext?.Step(_rule) == false || ExceedsShapeGrowth(data))
                                     break;
                                 data = sr.Item2.Apply(data).SingleOrDefault();
                             }
@@ -170,8 +177,9 @@ public IEnumerable<Word> Apply(Word input)
                             {
                                 srApplied = true;
                                 // Unlike Deletion, this loop has no reapplication ceiling of its own (a
-                                // self-feeding rule can hypothesize forever) — the budget is the only bound.
-                                if (input.ParseContext?.Step(_rule) == false)
+                                // self-feeding rule can hypothesize forever) — the budget and shape-growth
+                                // cap are the only bounds.
+                                if (input.ParseContext?.Step(_rule) == false || ExceedsShapeGrowth(data))
                                     break;
                                 data = sr.Item2.Apply(data).SingleOrDefault();
                             }
diff --git a/src/SIL.Machine.Morphology.HermitCrab/Word.cs b/src/SIL.Machine.Morphology.HermitCrab/Word.cs
index 95e8b320..09321541 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/Word.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/Word.cs
@@ -35,6 +35,7 @@ public class Word : Freezable<Word>, IAnnotatedData<int>, ICloneable<Word>
         private bool _isPartial;
         private Dictionary<string, HashSet<int>> _disjunctiveAllomorphIndices; // lazily allocated (see above)
         private int _mruleAppCount = 0;
+        private int _totalUnapplicationCount = 0;
         private readonly IList<Word> _alternatives = new List<Word>();
 
         public Word(RootAllomorph rootAllomorph, FeatureStruct realizationalFS)
@@ -107,6 +108,7 @@ protected Word(Word word)
                         kvp => new HashSet<int>(kvp.Value)
                     );
             _mruleAppCount = word._mruleAppCount;
+            _totalUnapplicationCount = word._totalUnapplicationCount;
         }
 
         public IEnumerable<Annotation<ShapeNode>> Morphs
@@ -253,6 +255,12 @@ public IEnumerable<IMorphologicalRule> MorphologicalRules
 
         internal int MorphologicalRuleApplicationCount => _mruleAppCount;
 
+        /// <summary>
+        /// Total morphological-rule unapplications on this analysis candidate, across all rules
+        /// combined. Carrier for <see cref="Morpher.MaxRuleApplicationsPerWord"/>.
+        /// </summary>
+        internal int TotalUnapplicationCount => _totalUnapplicationCount;
+
         internal bool IsAllMorphologicalRulesApplied
         {
             get { return _mruleAppIndex == -1; }
@@ -341,6 +349,7 @@ internal void RemoveMorph(Annotation<ShapeNode> morphAnn)
         internal void MorphologicalRuleUnapplied(IMorphologicalRule mrule)
         {
             CheckFrozen();
+            _totalUnapplicationCount++;
             if (mrule != null)
                 (_mrulesUnapplied = _mrulesUnapplied ?? new Dictionary<IMorphologicalRule, int>()).UpdateValue(
                     mrule,
diff --git a/src/SIL.Machine/Rules/PermutationRuleCascade.cs b/src/SIL.Machine/Rules/PermutationRuleCascade.cs
index b16671f4..3b82e449 100644
--- a/src/SIL.Machine/Rules/PermutationRuleCascade.cs
+++ b/src/SIL.Machine/Rules/PermutationRuleCascade.cs
@@ -22,22 +22,31 @@ IEqualityComparer<TData> comparer
         )
             : base(rules, multiApp, comparer) { }
 
+        /// <summary>
+        /// Caps how many nested rule (re-)applications a single branch may descend through, on top of
+        /// the base class's input==output infinite-loop guard (which a rule whose output never exactly
+        /// repeats its input — e.g. one that keeps growing the shape — sails past). -1 = unlimited, the
+        /// default, so existing consumers see no behavior change.
+        /// </summary>
+        public int MaxDepth { get; set; } = -1;
+
         public override IEnumerable<TData> Apply(TData input)
         {
             var output = new HashSet<TData>(Comparer);
-            ApplyRules(input, 0, output);
+            ApplyRules(input, 0, 0, output);
             return output;
         }
 
-        private void ApplyRules(TData input, int ruleIndex, HashSet<TData> output)
+        private void ApplyRules(TData input, int ruleIndex, int depth, HashSet<TData> output)
         {
+            bool descend = MaxDepth < 0 || depth < MaxDepth;
             for (int i = ruleIndex; i < Rules.Count; i++)
             {
                 foreach (TData result in ApplyRule(Rules[i], i, input))
                 {
                     // avoid infinite loop
-                    if (!MultipleApplication || !Comparer.Equals(input, result))
-                        ApplyRules(result, MultipleApplication ? i : i + 1, output);
+                    if (descend && (!MultipleApplication || !Comparer.Equals(input, result)))
+                        ApplyRules(result, MultipleApplication ? i : i + 1, depth + 1, output);
                     output.Add(result);
                 }
             }
diff --git a/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs b/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
index 59879a89..1a3f49b6 100644
--- a/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
+++ b/tests/SIL.Machine.Morphology.HermitCrab.Tests/MorpherTests.cs
@@ -688,6 +688,110 @@ public void RerunWithDiagnostics_ReportsTopOffendingRule()
         Assert.That(diagnostics.TopRules[0].Rule, Is.EqualTo(noExponentSuffix));
     }
 
+    [Test]
+    public void ParseWord_MaxRuleApplicationsPerWord_BoundsTotalAcrossRules()
+    {
+        // Same no-overt-exponent shape as the step-budget test, but bounded via the total-
+        // unapplications cap instead of the step budget: closes the "even if separated" loophole
+        // that per-rule MaxApplicationCount alone cannot (see complexity-cap.md §5.1).
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var noExponentSuffix = new AffixProcessRule
+        {
+            Id = "REPEAT",
+            Name = "no_exponent_suffix",
+            Gloss = "REPEAT",
+            MaxApplicationCount = 1_000_000,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        noExponentSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(noExponentSuffix);
+        SetRuleOrder(MorphologicalRuleOrder.Unordered);
+
+        // No step/timeout budget here — MaxRuleApplicationsPerWord alone must terminate the parse.
+        var morpher = new Morpher(TraceManager, Language)
+        {
+            MaxParseSteps = 0,
+            ParseTimeout = TimeSpan.Zero,
+            MaxRuleApplicationsPerWord = 10,
+        };
+
+        List<Word> results = morpher.ParseWord("sag", out _, false, out ParseDiagnostics diagnostics).ToList();
+
+        Assert.That(
+            diagnostics.BudgetExhausted,
+            Is.False,
+            "MaxRuleApplicationsPerWord is not a ParseDiagnostics-reported budget"
+        );
+        Assert.That(results.All(w => w.TotalUnapplicationCount <= 10), Is.True);
+    }
+
+    [Test]
+    public void ParseWord_MaxAnalysisShapeGrowth_PrunesDeepReinsertion()
+    {
+        // Reuses the DeletionRules rule4 shape (RewriteRuleTests.DeletionRules): deleting a high
+        // front unrounded vowel ("i") after a high vowel, so analysis can hypothesize progressively
+        // more deleted "i"s to the left, growing the shape well past the surface form.
+        var highFrontUnrndVowel = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("cons-")
+            .Symbol("voc+")
+            .Symbol("high+")
+            .Symbol("low-")
+            .Symbol("back-")
+            .Symbol("round-")
+            .Value;
+        var highVowel = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("cons-")
+            .Symbol("voc+")
+            .Symbol("high+")
+            .Value;
+
+        var rule4 = new RewriteRule
+        {
+            Name = "rule4",
+            Lhs = Pattern<Word, int>.New().Annotation(highFrontUnrndVowel).Value,
+        };
+        Allophonic.PhonologicalRules.Add(rule4);
+        rule4.Subrules.Add(
+            new RewriteSubrule { LeftEnvironment = Pattern<Word, int>.New().Annotation(highVowel).Value }
+        );
+
+        // Unbounded (default): matches the existing DeletionRules precedent exactly (RewriteRuleTests.
+        // DeletionRules) — deep reinsertion morph "27" ("buiibuii", 8 segments vs. surface "bubu"'s 4)
+        // is reachable.
+        var uncapped = new Morpher(TraceManager, Language) { DeletionReapplications = 1 };
+        List<Word> uncappedResults = uncapped.ParseWord("bubu", out _, false).ToList();
+        AssertMorphsEqual(uncappedResults, "24", "25", "26", "27", "19");
+
+        // Capped tightly enough that the deepest reinsertion chain cannot complete: the capped
+        // result set must be a subset of the uncapped one (the cap can only prune candidates, never
+        // add any). The test deliberately does not hard-code which morphs survive pruning: the
+        // interaction between the DeletionReapplications loop and Simultaneous-mode multi-site
+        // matching is intricate enough that pinning exact morph identities would over-fit to
+        // incidental engine internals rather than the behavior this cap actually promises. Only the
+        // maximally grown morph ("27") is checked for absence below.
+        var capped = new Morpher(TraceManager, Language) { DeletionReapplications = 1, MaxAnalysisShapeGrowth = 0 };
+        List<Word> cappedResults = capped.ParseWord("bubu", out _, false).ToList();
+        Assert.That(
+            cappedResults.Select(w => string.Join("+", w.AllomorphsInMorphOrder.Select(a => a.Morpheme.Id))),
+            Is.SubsetOf(
+                uncappedResults.Select(w => string.Join("+", w.AllomorphsInMorphOrder.Select(a => a.Morpheme.Id)))
+            )
+        );
+        // The maximally-grown morph ("27", which needs the underlying shape to grow by 4 segments)
+        // must not survive a cap of 0 (no growth allowed at all).
+        Assert.That(cappedResults.Any(w => w.AllomorphsInMorphOrder.Any(a => a.Morpheme.Id == "27")), Is.False);
+    }
+
     private static string AnalysisSignature(Morpher morpher, string word)
     {
         return string.Join(

From c8a39aeb16dcc783b90ef8e80a48ce74f0a57a0b Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 16:16:35 -0400
Subject: [PATCH 3/6] Complexity cap Phase 3: static grammar lint (Layer 3) +
 calibration honesty pass

Adds GrammarAnalyzer, a static analyzer over a loaded Language that flags
always/almost-always-wrong rule shapes with stable diagnostic codes
(HC0001-HC0008: no-overt-exponent affix rules, unbounded
multipleApplication, self-feeding epenthesis/deletion rules,
unconstrained compounding, optional-iterative lexical patterns, cyclic
feeding pairs). Wired into the hc CLI as a new `hc lint` command, plus a
`hc parse --diagnose` flag that surfaces RerunWithDiagnostics' top
offending rules for a single word - the empirical companion to the
static lint. Both are documented in a new
docs/hermitcrab-grammar-performance.md guide organized by HC code.

While shaping HC0004's self-feeding check, deduped the "does this rule's
output unify with its own required environment" logic shared between
AnalysisRewriteRule and GrammarAnalyzer into a single
IsUnifiableWithEnvironment extension, and found/fixed a real gap: the
lint only covered one of two engine paths that select self-opaquing
behavior, silently missing the epenthesis case (unconditionally
dangerous in Simultaneous mode). Also fixed a pre-existing HC0007
condition that required Optional *and* IsIterative on adjacent lexical
pattern nodes, when the design doc's own canonical example
(([Seg])([Seg])) is two plain-optional (non-iterative) groups - the
check now matches the documented intent.

Ran the real Phase 0 calibration corpus (indonesian/sena) against the
rustify engine and replaced the Phase 1 doc comment's fabricated
"~13,600 steps" figure with measured numbers:

- Indonesian's worst word takes 10,445 steps (flat ~10-rule
  combinatorial interaction, not one bad rule).
- Sena's worst sampled word takes 14.9M steps/105s, from only a ~1%
  corpus sample.
- A separate real Sena word was previously being truncated by the old
  10s default timeout at 99,584 steps.

Raised DefaultMaxParseSteps to 50,000,000 and DefaultParseTimeout to 30s
accordingly, and documented in complexity-cap.md (with two new "still
open" items) that the Sena figures are a floor pending a full-corpus
re-baseline, and that the timeout is a genuine truncation/latency
tradeoff rather than a pure safety margin.

82/82 HermitCrab tests pass; both projects build clean; csharpier clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
---
 complexity-cap.md                             |  62 +++-
 docs/hermitcrab-grammar-performance.md        | 115 ++++++
 .../LintCommand.cs                            |  68 ++++
 .../ParseCommand.cs                           |  57 ++-
 .../Program.cs                                |   1 +
 .../GrammarAnalyzer.cs                        | 349 ++++++++++++++++++
 .../HermitCrabExtensions.cs                   |  28 ++
 .../Morpher.cs                                |  35 +-
 .../ParseContext.cs                           |   6 +-
 .../ParseDiagnostics.cs                       |   8 -
 .../PhonologicalRules/AnalysisRewriteRule.cs  |  20 +-
 .../ComplexityCapCorpusTests.cs               | 253 +++++++++++++
 .../GrammarAnalyzerTests.cs                   | 337 +++++++++++++++++
 13 files changed, 1287 insertions(+), 52 deletions(-)
 create mode 100644 docs/hermitcrab-grammar-performance.md
 create mode 100644 src/SIL.Machine.Morphology.HermitCrab.Tool/LintCommand.cs
 create mode 100644 src/SIL.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs
 create mode 100644 tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
 create mode 100644 tests/SIL.Machine.Morphology.HermitCrab.Tests/GrammarAnalyzerTests.cs
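
The relationship between the two raised defaults can be checked with back-of-the-envelope arithmetic. The sketch below uses only figures quoted in the commit message (14,905,517 steps in 105.3 seconds, a 50,000,000-step budget, a 30-second timeout); `BudgetArithmeticSketch` and its helpers are hypothetical names, and the calculation assumes throughput stays constant across words, which the 99,584-steps-in-10-seconds data point suggests is not always the case.

```csharp
using System;

// Hypothetical helper: relates the step budget to the wall-clock timeout
// under an assumed constant steps-per-second rate.
static class BudgetArithmeticSketch
{
    public static double StepsPerSecond(long steps, double seconds) => steps / seconds;

    public static double SecondsToExhaust(long maxSteps, double stepsPerSecond) =>
        maxSteps / stepsPerSecond;

    static void Main()
    {
        const long worstSenaSteps = 14_905_517; // worst sampled Sena word
        const double worstSenaSeconds = 105.3;
        const long defaultMaxParseSteps = 50_000_000;
        const double defaultTimeoutSeconds = 30;

        double rate = StepsPerSecond(worstSenaSteps, worstSenaSeconds);
        double secondsToStepCap = SecondsToExhaust(defaultMaxParseSteps, rate);

        Console.WriteLine($"Observed rate: {rate:F0} steps/sec");
        Console.WriteLine($"Time to exhaust the step budget at that rate: {secondsToStepCap:F0} s");
        Console.WriteLine(
            secondsToStepCap > defaultTimeoutSeconds
                ? "At this rate the timeout trips before the step budget."
                : "At this rate the step budget trips before the timeout."
        );
    }
}
```

Under that constant-rate assumption, a slow but legitimate word reaches the 30-second timeout well before 50,000,000 steps, which is consistent with the description of the timeout as a truncation/latency tradeoff.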

diff --git a/complexity-cap.md b/complexity-cap.md
index 1701219e..75fe21ae 100644
--- a/complexity-cap.md
+++ b/complexity-cap.md
@@ -92,15 +92,49 @@ the exact failure mode this plan exists to fix — an unbounded parse — remain
 out-of-the-box behavior. A generous cap that never fires for legitimate grammars but
 reliably kills runaway ones is strictly better than silence.
 
-Concrete numbers are calibrated in Phase 0 against the real corpus (§7), not guessed
-here, but the target shape is: run every word in `indonesian-words.txt` (121 words) and
-`sena-words.txt` (7,121 words) against their respective grammars on the rustify engine,
-take the observed max step count / max wall-clock time across that legitimate corpus,
-and set the default to a large multiple of that ceiling (e.g. 50–100×) so it is
-effectively invisible for real grammars but still finite. `ParseTimeout` defaults
-similarly, e.g. a flat few seconds per word — generous for interactive/FLEx single-word
-parses, still bounded for "Parse All Words" batches where one stuck word must not stall
-the run indefinitely.
+**Calibration results (2026-07-02, partial — see caveat below):** running the real
+corpus against the rustify engine shows legitimate cost varies by roughly **1000x**
+between the two grammars, which broke the original "large multiple of Indonesian's
+ceiling" plan:
+
+- `indonesian-words.txt` (121 words, 2,563-line grammar): worst observed word
+  (`mengamat-amati`, a reduplicated compound) took **10,445 steps**. `hc lint` reports
+  this grammar as clean (2 `HC0006` warnings, 0 errors), and `RerunWithDiagnostics`
+  shows the cost is a **flat distribution across ~10 rules** (~6.5% each) — legitimate
+  combinatorial interaction from compounding + reduplication, not one bad rule.
+- `sena-words.txt` (7,121 words, 33,091-line grammar): also lint-clean, but far more
+  expensive per word. The worst word sampled so far, `atawirambo`, took **14,905,517
+  steps / 105.3 seconds** — a successful, legitimate parse with the same flat
+  multi-rule-interaction shape (Sena's agglutinative verb morphology stacks many
+  candidate subject/tense/object affix slots). A separate word, `ndinakupangani`, hit
+  the (now-superseded) 10-second default timeout at only 99,584 steps, i.e. a real word
+  was previously getting truncated by the shipped default.
+- Because a full Sena run takes hours (many individual words take 10s–100+ seconds),
+  **only ~1% of the Sena corpus (72/7,121 words) has actually been sampled.** The
+  14,905,517-step figure is the worst seen so far, not a proven ceiling — this should be
+  re-baselined against the full corpus before the shipped defaults are treated as final.
+
+Given this, the "50–100x a single grammar's ceiling" heuristic below doesn't transfer
+across grammars of very different size/complexity — a multiplier calibrated on
+Indonesian would be irrelevant to Sena's scale, and a multiplier large enough for Sena
+would be absurd for Indonesian. Shipped defaults (`Morpher.DefaultMaxParseSteps` =
+50,000,000, `Morpher.DefaultParseTimeout` = 30s) instead put the step budget about 3.4x
+above the largest *legitimate* step count observed so far (14,905,517 on Sena), expecting
+`ParseTimeout`, not the step count, to be what trips for slow-but-legitimate words: at the
+~140k steps/sec observed on `atawirambo`, 30s is only ~4.2M steps (`ndinakupangani` ran
+nearer 10k steps/sec, so the rate varies by word). The step budget mainly exists to catch
+algorithmically cheap infinite loops, which can blow past millions of steps in a fraction
+of a second regardless of a grammar's normal cost profile. See the doc comments on those two
+constants in `Morpher.cs` for the full reasoning. Note the timeout is a genuine, openly
+acknowledged tradeoff, not just a safety margin: at 30s it will still occasionally
+truncate an expensive-but-legitimate Sena word; raising it protects those words at the
+cost of a slower worst-case "Parse All Words" batch.
+
+Original target shape (superseded by the above, kept for history): run every word in
+`indonesian-words.txt` and `sena-words.txt` against their respective grammars, take the
+observed max step count / max wall-clock time across that legitimate corpus, and set the
+default to a large multiple of that ceiling (e.g. 50–100×) so it is effectively invisible
+for real grammars but still finite.
 
 ### 4.2 Per-parse context, propagated like `CurrentTrace`
 
@@ -408,3 +442,13 @@ across rustify's 100-file rewrite is not. Concretely:
 6. **HC0004/HC0008 precision**: self-feeding/cycle detection via unification is
    approximate; acceptable false-positive rate for a Warning? Start conservative
    (high-confidence patterns only), widen with field feedback.
+7. **Sena calibration is based on a ~1% sample (72/7,121 words)**, not a full corpus run
+   (see §4.1) — the worst-observed-word figures used to set `DefaultMaxParseSteps`/
+   `DefaultParseTimeout` are a floor, not a proven ceiling. Re-baseline against the full
+   corpus (accept the multi-hour run, or parallelize it) before treating these as final,
+   and specifically check whether any word exceeds the current 50,000,000-step default.
+8. **`DefaultParseTimeout` = 30s will still truncate some legitimate Sena words** (one
+   observed at 105s). Whether 30s is the right number — vs. a larger default, vs. no
+   default timeout with only a step budget, vs. a per-consumer-tunable-only knob with no
+   shipped default at all — is a real product decision that needs field input, not
+   something this investigation can resolve alone.
diff --git a/docs/hermitcrab-grammar-performance.md b/docs/hermitcrab-grammar-performance.md
new file mode 100644
index 00000000..8553bd47
--- /dev/null
+++ b/docs/hermitcrab-grammar-performance.md
@@ -0,0 +1,115 @@
+# Writing performant HermitCrab grammars
+
+HermitCrab's engine speedups (see the `hc-rustify` work) and its complexity-cap safety net
+(`complexity-cap.md`) both help pathological grammars fail *safely* — bounded runtime, a status
+flag, and per-rule evidence when a parse gives up. Neither one makes a pathological grammar fast.
+The real fix is always at the grammar level. This guide catalogues the rule shapes that reliably
+cause combinatorial blowups, keyed by the stable diagnostic codes `GrammarAnalyzer.Analyze`
+(`hc lint`) emits, plus the interaction patterns that only show up empirically.
+
+## Static checks (`GrammarAnalyzer` / `hc lint`)
+
+### HC0001 — Error: no overt exponent + `MaxApplicationCount > 1`
+
+An affix rule whose every allomorph's output is a pure copy of the input (no inserted segments)
+*and* whose `MaxApplicationCount` has been raised above 1 (the XML `multipleApplication`
+attribute) will unapply to every word, every time, with nothing to ever make it stop. Analysis
+keeps "peeling off" a rule that changed nothing, over and over, up to the configured cap.
+
+**Fix:** give the rule a real, overt exponent (an inserted segment or boundary), or drop
+`MaxApplicationCount` back to the default of 1.
+
+### HC0002 — Warning: no overt exponent, single application
+
+Same "adds nothing" shape as HC0001, but capped at one application. Still doubles the candidate
+count at every cascade position it's considered at, for no linguistic payoff. Often this is an
+unintentional gap in a grammar rather than a deliberate zero-exponent rule (e.g. a rule that's
+purely feature-changing).
+
+**Fix:** add an overt exponent if one is missing, or confirm the zero-exponent shape is
+intentional (e.g. modeling a floating feature) and leave it. HC0002 is a Warning, not a hard
+error.
+
+### HC0003 — Warning: `MaxApplicationCount` raised
+
+Flags the opt-in itself, on any affix rule, independent of whether it has an overt exponent. This
+is exactly the knob a pathological grammar reaches for. It's not wrong to raise it — some
+agglutinative languages need real recursive affixation — but every raised value should be
+justified by an actual attested word shape, not left at "big enough."
+
+**Fix:** set it to the smallest value that covers real words in the language, not a round number
+picked for headroom.
+
+### HC0004 — Warning: self-feeding rewrite rule
+
+A `Simultaneous`-mode phonological rule whose output can satisfy its own environment again, or any
+`Simultaneous`-mode epenthesis rule. Before complexity-cap's Layer 1, this shape
+(`ReapplyType.SelfOpaquing` in `AnalysisRewriteRule`) had **no reapplication bound at all**, i.e. an
+unconditional infinite loop on first hit. Layer 1's step budget now catches it, but it's still wasted work.
+
+**Fix:** add an environment constraint that excludes the rule's own output (so a second
+application can't match), or switch to `Iterative` mode if repeated application really is the
+intent — iterative mode terminates naturally once the pattern stops matching.
+
+### HC0005 — Warning: unconstrained deletion
+
+A deletion phonological rule (its synthesis output is shorter than its input) with no left or
+right environment constraint at all. During analysis, HermitCrab must hypothesize that the deleted
+segment could have been anywhere satisfying the (empty) environment — i.e. everywhere — and
+`Morpher.DeletionReapplications` governs how many times it's willing to keep re-guessing.
+
+**Fix:** add a left and/or right environment constraint so reinsertion is only considered where
+deletion could plausibly have applied.
+
+### HC0006 — Warning: unconstrained compounding
+
+A compounding rule that constrains the part of speech of neither the head nor the non-head. Every
+stem in the lexicon becomes a candidate on *both* sides — a cross-product that interacts with
+`Morpher.MaxStemCount` and grows fast with lexicon size.
+
+**Fix:** constrain `HeadRequiredSyntacticFeatureStruct` and/or `NonHeadRequiredSyntacticFeatureStruct`
+to the parts of speech that can actually compound in the language.
+
+### HC0007 — Info: adjacent optional/iterative lexical patterns
+
+A lexical guess pattern (e.g. `([Seg])([Seg])`) with two or more optional/iterative segments back
+to back. `Morpher.LexicalGuess`'s own comments already note this produces spurious ambiguity:
+multiple paths through the pattern match the same literal string, multiplying candidates without
+adding coverage.
+
+**Fix:** prefer a single Kleene-star class (`[Seg]*`) over back-to-back optional groups when the
+intent is "zero or more of these."
+
+### HC0008 — Info: cyclic feeding pair (best-effort)
+
+Two affix rules that each add no overt exponent, where each rule's output syntactic category is
+compatible with the other's input requirement. Structurally, this is the shape of an
+`A → B → A → B → ...` cycle that never terminates via a shape change — the specific loophole that
+`Morpher.MaxRuleApplicationsPerWord` exists to close, since neither rule's own
+`MaxApplicationCount` will ever trip on its own.
+
+This check is intentionally conservative (high-confidence pairs only, per an open question in
+complexity-cap.md §10): it misses cycles where a rule has an overt exponent but still loops via
+some other mechanism, and it won't catch cycles longer than two rules.
+
+**Fix:** verify the two rules can't actually chain into each other indefinitely; if they
+legitimately can (rare), set a `MaxRuleApplicationsPerWord` cap.
+
+## What static analysis can't catch
+
+Individually reasonable rules can still combine into exponential blowups — this is inherent to
+static analysis over a rule *set*, not a specific bug in `GrammarAnalyzer`. When a word breaches
+`Morpher.MaxParseSteps`/`ParseTimeout`, use `Morpher.RerunWithDiagnostics` to re-parse that one word
+with per-rule counters enabled and get an empirical top-offender report: *"word X exceeded N
+steps; rule Y accounted for most of the applications."* That rule is where to start — check it
+against the codes above even if the static pass didn't flag it standalone, since the empirical
+report is often revealing an *interaction*, not a single bad rule.
+
+## Layered defense, not a substitute for grammar fixes
+
+None of `MaxParseSteps`, `ParseTimeout`, `MaxRuleApplicationsPerWord`, or `MaxAnalysisShapeGrowth`
+make a pathological grammar parse faster or more correctly — they bound the damage (a soft-stop
+with partial results, never a hang, never an exception) while the grammar gets fixed. A grammar
+that regularly needs those caps to fire is a grammar that needs fixing, not a grammar that's
+"handled." Treat a budget breach as a bug report against the grammar, using the codes and the
+empirical report above to find the specific rule to fix.
diff --git a/src/SIL.Machine.Morphology.HermitCrab.Tool/LintCommand.cs b/src/SIL.Machine.Morphology.HermitCrab.Tool/LintCommand.cs
new file mode 100644
index 00000000..5c2bb824
--- /dev/null
+++ b/src/SIL.Machine.Morphology.HermitCrab.Tool/LintCommand.cs
@@ -0,0 +1,68 @@
+using System.Linq;
+using ManyConsole;
+
+namespace SIL.Machine.Morphology.HermitCrab;
+
+/// <summary>
+/// Thin CLI wrapper around <see cref="GrammarAnalyzer.Analyze"/> (complexity-cap.md §6.3) — lets
+/// machine.py users and CI-style grammar validation run the static lint outside FLEx.
+/// </summary>
+internal class LintCommand : ConsoleCommand
+{
+    private readonly HCContext _context;
+    private string _severity;
+
+    public LintCommand(HCContext context)
+    {
+        _context = context;
+
+        IsCommand("lint", "Runs static grammar analysis and reports diagnostics (see complexity-cap.md).");
+        SkipsCommandSummaryBeforeRunning();
+        HasOption(
+            "s|severity=",
+            "minimum severity to report: info, warning, or error (default: info)",
+            o => _severity = o
+        );
+    }
+
+    public override int Run(string[] remainingArguments)
+    {
+        DiagnosticSeverity minSeverity = ParseSeverity(_severity);
+        var diagnostics = GrammarAnalyzer
+            .Analyze(_context.Language)
+            .Where(d => d.Severity >= minSeverity)
+            .OrderBy(d => d.Code)
+            .ToList();
+
+        if (diagnostics.Count == 0)
+        {
+            _context.Out.WriteLine("No grammar diagnostics found.");
+        }
+        else
+        {
+            foreach (GrammarDiagnostic diagnostic in diagnostics)
+            {
+                _context.Out.WriteLine(diagnostic.ToString()); // ToString() includes the culprit rule's name
+                _context.Out.WriteLine("    Suggestion: {0}", diagnostic.Suggestion);
+            }
+            _context.Out.WriteLine();
+            _context.Out.WriteLine("{0} diagnostic(s).", diagnostics.Count);
+        }
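+        _severity = null; // don't let a previous -s value stick if this instance is reused (cf. ParseCommand._diagnose)
+        _context.Out.WriteLine();
+        return 0;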
+    }
+
+    private static DiagnosticSeverity ParseSeverity(string severity)
+    {
+        switch (severity?.ToLowerInvariant())
+        {
+            case "warning":
+                return DiagnosticSeverity.Warning;
+            case "error":
+                return DiagnosticSeverity.Error;
+            default:
+                return DiagnosticSeverity.Info;
+        }
+    }
+}
diff --git a/src/SIL.Machine.Morphology.HermitCrab.Tool/ParseCommand.cs b/src/SIL.Machine.Morphology.HermitCrab.Tool/ParseCommand.cs
index 86bfc0db..86b96a8e 100644
--- a/src/SIL.Machine.Morphology.HermitCrab.Tool/ParseCommand.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab.Tool/ParseCommand.cs
@@ -1,4 +1,5 @@
-﻿using System.Collections.Generic;
+﻿using System;
+using System.Collections.Generic;
 using System.Diagnostics;
 using System.Linq;
 using ManyConsole;
@@ -8,6 +9,7 @@ namespace SIL.Machine.Morphology.HermitCrab;
 internal class ParseCommand : ConsoleCommand
 {
     private readonly HCContext _context;
+    private bool _diagnose;
 
     public ParseCommand(HCContext context)
     {
@@ -16,11 +18,18 @@ public ParseCommand(HCContext context)
         IsCommand("parse", "Parses a word");
         SkipsCommandSummaryBeforeRunning();
         HasAdditionalArguments(1, "<word>");
+        HasOption(
+            "d|diagnose",
+            "reports step budget usage and the top offending rules for this word (see complexity-cap.md)",
+            o => _diagnose = true
+        );
     }
 
     public override int Run(string[] remainingArguments)
     {
         string word = remainingArguments[0];
+        if (_diagnose)
+            return RunDiagnose(word);
         try
         {
             _context.ParseCount++;
@@ -58,6 +67,52 @@ public override int Run(string[] remainingArguments)
             _context.Out.WriteLine();
             return 1;
         }
+        finally
+        {
+            _diagnose = false;
+        }
+    }
+
+    private int RunDiagnose(string word)
+    {
+        try
+        {
+            ParseDiagnostics diagnostics = _context.Morpher.RerunWithDiagnostics(word, out IEnumerable<Word> results);
+            int resultCount = results.Count();
+            _context.Out.WriteLine(
+                "\"{0}\": {1} result(s), {2} step(s), {3:F1}ms, budget exhausted: {4}{5}",
+                word,
+                resultCount,
+                diagnostics.StepsUsed,
+                diagnostics.Elapsed.TotalMilliseconds,
+                diagnostics.BudgetExhausted,
+                diagnostics.BudgetExhausted ? $" ({diagnostics.Reason})" : ""
+            );
+            _context.Out.WriteLine("Top rules by application count:");
+            foreach ((IHCRule rule, int applications) in diagnostics.TopRules.Take(10))
+            {
+                double pct = 100.0 * applications / Math.Max(diagnostics.StepsUsed, 1);
+                _context.Out.WriteLine(
+                    "  {0,8} ({1,5:F1}%)  {2} '{3}'",
+                    applications,
+                    pct,
+                    rule.GetType().Name,
+                    rule.Name
+                );
+            }
+            _context.Out.WriteLine();
+            return 0;
+        }
+        catch (InvalidShapeException ise)
+        {
+            _context.Out.WriteLine("The word contains an invalid segment at position {0}.", ise.Position + 1);
+            _context.Out.WriteLine();
+            return 1;
+        }
+        finally
+        {
+            _diagnose = false;
+        }
     }
 
     private void PrintTrace(Trace trace, int indent, HashSet<int> lineIndices)
diff --git a/src/SIL.Machine.Morphology.HermitCrab.Tool/Program.cs b/src/SIL.Machine.Morphology.HermitCrab.Tool/Program.cs
index ff8e86bc..ac1b0aa5 100644
--- a/src/SIL.Machine.Morphology.HermitCrab.Tool/Program.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab.Tool/Program.cs
@@ -92,6 +92,7 @@ public static int Main(string[] args)
             new TracingCommand(context),
             new TestCommand(context),
             new StatsCommand(context),
+            new LintCommand(context),
         };
 
         string input;
diff --git a/src/SIL.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs b/src/SIL.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs
new file mode 100644
index 00000000..0fc0434a
--- /dev/null
+++ b/src/SIL.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs
@@ -0,0 +1,349 @@
+using System.Collections.Generic;
+using System.Linq;
+using SIL.Machine.Annotations;
+using SIL.Machine.FeatureModel;
+using SIL.Machine.Matching;
+using SIL.Machine.Morphology.HermitCrab.MorphologicalRules;
+using SIL.Machine.Morphology.HermitCrab.PhonologicalRules;
+
+namespace SIL.Machine.Morphology.HermitCrab
+{
+    public enum DiagnosticSeverity
+    {
+        Info,
+        Warning,
+        Error,
+    }
+
+    /// <summary>
+    /// One finding from <see cref="GrammarAnalyzer.Analyze"/>: a static "don't do this" signal about a
+    /// specific rule shape, keyed by a stable <see cref="Code"/> so other tools (FLEx's parser report,
+    /// a CLI) can key documentation/UI off it. See complexity-cap.md §6 for the code catalogue and the
+    /// "Writing performant HC grammars" guide organized by these codes.
+    /// </summary>
+    public sealed class GrammarDiagnostic
+    {
+        internal GrammarDiagnostic(
+            string code,
+            DiagnosticSeverity severity,
+            object rule,
+            string message,
+            string suggestion
+        )
+        {
+            Code = code;
+            Severity = severity;
+            Rule = rule;
+            Message = message;
+            Suggestion = suggestion;
+        }
+
+        public string Code { get; }
+        public DiagnosticSeverity Severity { get; }
+
+        /// <summary>The culprit object — an <see cref="IHCRule"/> (rule/template) or a <see cref="Morpheme"/> (lexical entry).</summary>
+        public object Rule { get; }
+        public string Message { get; }
+        public string Suggestion { get; }
+
+        public override string ToString()
+        {
+            string ruleName = (Rule as IHCRule)?.Name ?? (Rule as Morpheme)?.Id ?? Rule?.ToString();
+            return $"{Code} [{Severity}] {ruleName}: {Message}";
+        }
+    }
+
+    /// <summary>
+    /// Layer 3 of complexity-cap.md: static analysis over a loaded <see cref="Language"/> that flags
+    /// rule shapes which are always-wrong or almost-always-wrong for parse complexity — independent of
+    /// any specific word, and independent of whether the grammar was loaded from XML or built
+    /// programmatically (FieldWorks' HCLoader), since both produce the same in-memory <see cref="Language"/>.
+    /// What this *cannot* catch is combinatorial interaction between individually-reasonable rules; that
+    /// is covered empirically by <see cref="Morpher.RerunWithDiagnostics"/> instead (see complexity-cap.md §6.2).
+    /// </summary>
+    public static class GrammarAnalyzer
+    {
+        public static IReadOnlyList<GrammarDiagnostic> Analyze(Language language)
+        {
+            var diagnostics = new List<GrammarDiagnostic>();
+            foreach (Stratum stratum in language.Strata)
+            {
+                foreach (IMorphologicalRule rule in stratum.MorphologicalRules)
+                {
+                    if (rule is AffixProcessRule affixRule)
+                        CheckAffixProcessRule(affixRule, diagnostics);
+                    else if (rule is CompoundingRule compoundingRule)
+                        CheckCompoundingRule(compoundingRule, diagnostics);
+                }
+
+                foreach (IPhonologicalRule prule in stratum.PhonologicalRules)
+                {
+                    if (prule is RewriteRule rewriteRule)
+                        CheckRewriteRule(rewriteRule, diagnostics);
+                }
+
+                CheckLexicalPatterns(stratum, diagnostics);
+            }
+
+            CheckCyclicFeedingPairs(language, diagnostics);
+
+            return diagnostics;
+        }
+
+        // HC0001 / HC0002 / HC0003
+        private static void CheckAffixProcessRule(AffixProcessRule rule, List<GrammarDiagnostic> diagnostics)
+        {
+            if (HasNoOvertExponent(rule))
+            {
+                if (rule.MaxApplicationCount > 1)
+                {
+                    diagnostics.Add(
+                        new GrammarDiagnostic(
+                            "HC0001",
+                            DiagnosticSeverity.Error,
+                            rule,
+                            "Affix rule has no overt exponent (every allomorph's output is a pure copy of "
+                                + "the input, adding no phonological material) and MaxApplicationCount > 1. "
+                                + "This unapplies to every word, every time, with no way to ever stop: "
+                                + "guaranteed exponential.",
+                            "Give the rule an overt exponent, or set MaxApplicationCount back to 1."
+                        )
+                    );
+                }
+                else
+                {
+                    diagnostics.Add(
+                        new GrammarDiagnostic(
+                            "HC0002",
+                            DiagnosticSeverity.Warning,
+                            rule,
+                            "Affix rule has no overt exponent (every allomorph's output is a pure copy of "
+                                + "the input, adding no phonological material). It still multiplies "
+                                + "candidates once per cascade position and is frequently unintended.",
+                            "Add an overt exponent (an inserted segment/boundary), or confirm this "
+                                + "zero-exponent rule (e.g. a purely feature-changing rule) is intentional."
+                        )
+                    );
+                }
+            }
+
+            if (rule.MaxApplicationCount > 1)
+            {
+                diagnostics.Add(
+                    new GrammarDiagnostic(
+                        "HC0003",
+                        DiagnosticSeverity.Warning,
+                        rule,
+                        $"MaxApplicationCount is {rule.MaxApplicationCount} (the XML multipleApplication "
+                            + "attribute raises it above the default of 1) — this is precisely where an "
+                            + "unbounded grammar opts into unboundedness.",
+                        "Confirm a bound this high is actually needed; prefer the smallest value that "
+                            + "covers legitimate words."
+                    )
+                );
+            }
+        }
+
+        private static bool HasNoOvertExponent(AffixProcessRule rule)
+        {
+            if (rule.Allomorphs.Count == 0)
+                return false;
+            return rule.Allomorphs.All(allo =>
+                allo.Rhs.All(action => action is CopyFromInput || action is ModifyFromInput)
+            );
+        }
+
+        // HC0006
+        private static void CheckCompoundingRule(CompoundingRule rule, List<GrammarDiagnostic> diagnostics)
+        {
+            if (rule.HeadRequiredSyntacticFeatureStruct.IsEmpty && rule.NonHeadRequiredSyntacticFeatureStruct.IsEmpty)
+            {
+                diagnostics.Add(
+                    new GrammarDiagnostic(
+                        "HC0006",
+                        DiagnosticSeverity.Warning,
+                        rule,
+                        "Compounding rule constrains the part of speech of neither the head nor the "
+                            + "non-head — every stem in the lexicon is a candidate on both sides, a "
+                            + "cross-product blowup that interacts with Morpher.MaxStemCount.",
+                        "Constrain HeadRequiredSyntacticFeatureStruct and/or "
+                            + "NonHeadRequiredSyntacticFeatureStruct to the parts of speech that can "
+                            + "actually compound."
+                    )
+                );
+            }
+        }
+
+        // HC0004 / HC0005
+        private static void CheckRewriteRule(RewriteRule rule, List<GrammarDiagnostic> diagnostics)
+        {
+            foreach (RewriteSubrule subrule in rule.Subrules)
+            {
+                // Deletion subrule: underlying (Lhs) longer than surface (Rhs) — synthesis deletes
+                // material, so analysis must hypothesize/reinsert it. Matches AnalysisRewriteRule's own
+                // ReapplyType.Deletion classification.
+                if (rule.Lhs.Children.Count > subrule.Rhs.Children.Count)
+                {
+                    if (subrule.LeftEnvironment.Children.Count == 0 && subrule.RightEnvironment.Children.Count == 0)
+                    {
+                        diagnostics.Add(
+                            new GrammarDiagnostic(
+                                "HC0005",
+                                DiagnosticSeverity.Warning,
+                                rule,
+                                "Deletion rule has no left or right environment constraint at all — "
+                                    + "analysis can hypothesize a deleted segment matching this pattern "
+                                    + "anywhere in the word, unboundedly reinserting it (interacts with "
+                                    + "Morpher.DeletionReapplications).",
+                                "Add a left and/or right environment constraint so reinsertion is only "
+                                    + "considered in the position(s) where deletion could plausibly have occurred."
+                            )
+                        );
+                    }
+                }
+
+                // Self-feeding: matches AnalysisRewriteRule's own ReapplyType.SelfOpaquing selection
+                // exactly — that path had no reapplication bound at all before complexity-cap Layer 1,
+                // i.e. an unconditional infinite loop for any grammar that hits it. Two distinct engine
+                // branches select it (see AnalysisRewriteRule's constructor):
+                //   - Lhs.Count == Rhs.Count (a same-length/feature-changing subrule): only when
+                //     Simultaneous *and* a Rhs segment constraint could satisfy its own environment again.
+                //   - Lhs.Count == 0 (epenthesis): unconditionally, whenever Simultaneous — the inserted
+                //     segment's own shape is irrelevant, so there's no unification check to gate it.
+                bool isSelfOpaquing;
+                if (rule.Lhs.Children.Count == subrule.Rhs.Children.Count)
+                {
+                    isSelfOpaquing =
+                        rule.ApplicationMode == RewriteApplicationMode.Simultaneous && IsSelfFeeding(subrule);
+                }
+                else if (rule.Lhs.Children.Count == 0)
+                {
+                    isSelfOpaquing = rule.ApplicationMode == RewriteApplicationMode.Simultaneous;
+                }
+                else
+                {
+                    isSelfOpaquing = false; // Deletion/expansion branches — always ReapplyType.Deletion.
+                }
+
+                if (isSelfOpaquing)
+                {
+                    diagnostics.Add(
+                        new GrammarDiagnostic(
+                            "HC0004",
+                            DiagnosticSeverity.Warning,
+                            rule,
+                            "Simultaneous-mode rewrite rule whose output can satisfy its own environment "
+                                + "again (self-feeding) — analysis can keep re-hypothesizing this rule's "
+                                + "effect on its own output indefinitely.",
+                            "Add an environment constraint that excludes the rule's own output, or switch "
+                                + "to Iterative application mode if that's the intent."
+                        )
+                    );
+                }
+            }
+        }
+
+        private static bool IsSelfFeeding(RewriteSubrule subrule)
+        {
+            foreach (Constraint<Word, int> constraint in subrule.Rhs.Children.OfType<Constraint<Word, int>>())
+            {
+                if (constraint.Type() != HCFeatureSystem.Segment)
+                    continue;
+                if (
+                    !constraint.IsUnifiableWithEnvironment(subrule.LeftEnvironment)
+                    || !constraint.IsUnifiableWithEnvironment(subrule.RightEnvironment)
+                )
+                {
+                    return true;
+                }
+            }
+            return false;
+        }
+
+        // HC0007
+        private static void CheckLexicalPatterns(Stratum stratum, List<GrammarDiagnostic> diagnostics)
+        {
+            foreach (LexEntry entry in stratum.Entries)
+            {
+                foreach (RootAllomorph allomorph in entry.Allomorphs)
+                {
+                    if (!allomorph.IsPattern)
+                        continue;
+                    int consecutiveOptional = 0;
+                    bool flagged = false;
+                    foreach (ShapeNode node in allomorph.Segments.Shape)
+                    {
+                        if (flagged)
+                            break;
+                        if (node.Annotation.Optional || node.IsIterative())
+                        {
+                            consecutiveOptional++;
+                            if (consecutiveOptional >= 2)
+                            {
+                                diagnostics.Add(
+                                    new GrammarDiagnostic(
+                                        "HC0007",
+                                        DiagnosticSeverity.Info,
+                                        entry,
+                                        $"Lexical pattern '{entry.Id}' has two or more adjacent "
+                                            + "optional/iterative segments — a known source of spurious "
+                                            + "ambiguity (multiple paths through the pattern produce the "
+                                            + "same string).",
+                                        "Prefer a single Kleene-star class over back-to-back optional groups."
+                                    )
+                                );
+                                flagged = true;
+                            }
+                        }
+                        else
+                        {
+                            consecutiveOptional = 0;
+                        }
+                    }
+                }
+            }
+        }
+
+        // HC0008
+        private static void CheckCyclicFeedingPairs(Language language, List<GrammarDiagnostic> diagnostics)
+        {
+            foreach (Stratum stratum in language.Strata)
+            {
+                List<AffixProcessRule> rules = stratum.MorphologicalRules.OfType<AffixProcessRule>().ToList();
+                for (int i = 0; i < rules.Count; i++)
+                {
+                    for (int j = i + 1; j < rules.Count; j++)
+                    {
+                        AffixProcessRule a = rules[i];
+                        AffixProcessRule b = rules[j];
+                        // Best-effort, high-confidence-only pairs (per complexity-cap.md §10 open
+                        // question #6): both sides add no overt exponent, and each rule's output
+                        // syntactic category is compatible with the other's input requirement — an
+                        // A-then-B-then-A-then-B chain that never terminates via shape change.
+                        if (
+                            HasNoOvertExponent(a)
+                            && HasNoOvertExponent(b)
+                            && a.OutSyntacticFeatureStruct.IsUnifiable(b.RequiredSyntacticFeatureStruct)
+                            && b.OutSyntacticFeatureStruct.IsUnifiable(a.RequiredSyntacticFeatureStruct)
+                        )
+                        {
+                            diagnostics.Add(
+                                new GrammarDiagnostic(
+                                    "HC0008",
+                                    DiagnosticSeverity.Info,
+                                    a,
+                                    $"'{a.Name}' and '{b.Name}' both add no overt exponent and each "
+                                        + "rule's output category is compatible with the other's input "
+                                        + "requirement — a cyclic feeding pair (A feeds B feeds A) is "
+                                        + "structurally possible.",
+                                    "Verify these two rules can't unapply to each other indefinitely; "
+                                        + "consider a MaxRuleApplicationsPerWord cap either way."
+                                )
+                            );
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
diff --git a/src/SIL.Machine.Morphology.HermitCrab/HermitCrabExtensions.cs b/src/SIL.Machine.Morphology.HermitCrab/HermitCrabExtensions.cs
index 5cf2ad5a..535ab96e 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/HermitCrabExtensions.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/HermitCrabExtensions.cs
@@ -27,6 +27,34 @@ public static FeatureSymbol Type(this Constraint<Word, int> constraint)
             return (FeatureSymbol)constraint.FeatureStruct.GetValue(HCFeatureSystem.Type);
         }
 
+        /// <summary>
+        /// Whether <paramref name="constraint"/> could satisfy every segment constraint in
+        /// <paramref name="environment"/> — i.e. whether a segment matching <paramref name="constraint"/>
+        /// could itself sit in that environment again. Shared by <see cref="PhonologicalRules.AnalysisRewriteRule"/>
+        /// (which uses it to pick <c>ReapplyType.SelfOpaquing</c> at compile time) and
+        /// <see cref="GrammarAnalyzer"/> (which replicates that exact classification statically to flag
+        /// HC0004 self-feeding rules). Sharing one implementation keeps the two classifications in sync.
+        /// </summary>
+        internal static bool IsUnifiableWithEnvironment(
+            this Constraint<Word, int> constraint,
+            Pattern<Word, int> environment
+        )
+        {
+            foreach (
+                Constraint<Word, int> envConstraint in environment.GetNodesDepthFirst().OfType<Constraint<Word, int>>()
+            )
+            {
+                if (
+                    envConstraint.Type() == HCFeatureSystem.Segment
+                    && !envConstraint.FeatureStruct.IsUnifiable(constraint.FeatureStruct)
+                )
+                {
+                    return false;
+                }
+            }
+            return true;
+        }
+
         // RUSTIFY Stage 2: the FST binds as Fst<Word,int> and its matcher filters / inspects the
         // shape's int-offset annotation projection (Annotation<int>), which shares the FeatureStruct
         // with the ShapeNode annotations — so these read identically to the ShapeNode overloads.
diff --git a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
index 5a8630c9..d4b553dc 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/Morpher.cs
@@ -82,19 +82,31 @@ public ITraceManager TraceManager
         }
 
         /// <summary>
-        /// Generous default for <see cref="MaxParseSteps"/>, calibrated against the real Indonesian/Sena
-        /// grammars on the rustify engine (see complexity-cap.md Phase 0): observed legitimate max was
-        /// ~13,600 steps (Sena), so this ships ~150x above that ceiling — effectively invisible for real
-        /// grammars but still finite. 0 disables the step budget.
+        /// Generous default for <see cref="MaxParseSteps"/>. Calibrated against the real Indonesian
+        /// (~2,500-line grammar, worst observed word ~10,400 steps) and Sena (~33,000-line grammar, worst
+        /// observed word so far 14,905,517 steps / 105.3s, from a partial corpus sample) grammars — see
+        /// complexity-cap.md Phase 0. Legitimate cost varies by roughly 1000x between these two grammars
+        /// because Sena's agglutinative verb morphology combines many candidate affix slots, so this is set
+        /// with headroom above the largest legitimate word seen so far rather than as a fixed multiple of
+        /// Indonesian's ceiling. Because only ~1% of the Sena corpus has been sampled, this should be
+        /// re-validated against a full corpus run before being treated as final. In practice
+        /// <see cref="DefaultParseTimeout"/> is expected to trip before this does for slow-but-legitimate
+        /// words, since step cost and wall-clock time track closely (~140k steps/sec observed on Sena); this
+        /// step budget mainly exists to catch algorithmically cheap infinite loops. 0 disables the step
+        /// budget.
         /// </summary>
-        public const int DefaultMaxParseSteps = 2_000_000;
+        public const int DefaultMaxParseSteps = 50_000_000;
 
         /// <summary>
-        /// Generous default for <see cref="ParseTimeout"/> — a backstop far above any observed legitimate
-        /// single-word parse time on the rustify engine, but still bounded so one pathological word cannot
-        /// stall a "Parse All Words" batch indefinitely. <see cref="TimeSpan.Zero"/> disables the timeout.
+        /// Generous default for <see cref="ParseTimeout"/>. This is a genuine product tradeoff, not just a
+        /// safety margin: real Sena words have been observed taking 100+ seconds to parse legitimately (see
+        /// <see cref="DefaultMaxParseSteps"/>), so any finite timeout will occasionally cut off a real parse
+        /// on grammars like Sena. 30 seconds is chosen as generous enough for the vast majority of legitimate
+        /// words while still bounding worst-case per-word latency in a "Parse All Words" batch to something
+        /// human-tolerable. Consumers with expensive grammars and no batch-latency constraint should raise
+        /// this. <see cref="TimeSpan.Zero"/> disables the timeout.
         /// </summary>
-        public static readonly TimeSpan DefaultParseTimeout = TimeSpan.FromSeconds(10);
+        public static readonly TimeSpan DefaultParseTimeout = TimeSpan.FromSeconds(30);
 
         public int DeletionReapplications { get; set; }
 
@@ -279,9 +291,6 @@ out ParseDiagnostics diagnostics
 
         private static ParseDiagnostics CreateParseDiagnostics(ParseContext parseContext)
         {
-            if (!parseContext.Exhausted)
-                return ParseDiagnostics.None;
-
             IReadOnlyList<(IHCRule Rule, int Applications)> topRules = null;
             if (parseContext.DiagnosticsEnabled)
             {
@@ -292,7 +301,7 @@ private static ParseDiagnostics CreateParseDiagnostics(ParseContext parseContext
             }
 
             return new ParseDiagnostics(
-                true,
+                parseContext.Exhausted,
                 parseContext.Reason,
                 parseContext.StepsUsed,
                 parseContext.Elapsed,
diff --git a/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs b/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
index 82731dde..99fbf69f 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/ParseContext.cs
@@ -73,9 +73,9 @@ public bool Step(IHCRule rule = null)
             if (rule != null && _ruleCounters != null)
                 _ruleCounters.AddOrUpdate(rule, 1, (_, count) => count + 1);
 
-            if (_maxSteps <= 0 && _timeoutTicks < 0)
-                return true;
-
+            // Always counted, even when both limits are disabled: StepsUsed must reflect real work
+            // (calibration/diagnostics rely on it), and a single Interlocked.Increment is the "steady-
+            // state cost ~one counter increment per rule application" the design promises either way.
             int steps = Interlocked.Increment(ref _steps);
             if (_maxSteps > 0 && steps >= _maxSteps)
             {
diff --git a/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs b/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
index a661e505..66228f70 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/ParseDiagnostics.cs
@@ -10,14 +10,6 @@ namespace SIL.Machine.Morphology.HermitCrab
     /// </summary>
     public sealed class ParseDiagnostics
     {
-        public static readonly ParseDiagnostics None = new ParseDiagnostics(
-            false,
-            ParseExhaustionReason.None,
-            0,
-            TimeSpan.Zero,
-            null
-        );
-
         internal ParseDiagnostics(
             bool budgetExhausted,
             ParseExhaustionReason reason,
diff --git a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
index ae9bbe4e..22dea216 100644
--- a/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
+++ b/src/SIL.Machine.Morphology.HermitCrab/PhonologicalRules/AnalysisRewriteRule.cs
@@ -54,8 +54,8 @@ public AnalysisRewriteRule(Morpher morpher, RewriteRule rule)
                             if (constraint.Type() == HCFeatureSystem.Segment)
                             {
                                 if (
-                                    !IsUnifiable(constraint, sr.LeftEnvironment)
-                                    || !IsUnifiable(constraint, sr.RightEnvironment)
+                                    !constraint.IsUnifiableWithEnvironment(sr.LeftEnvironment)
+                                    || !constraint.IsUnifiableWithEnvironment(sr.RightEnvironment)
                                 )
                                 {
                                     reapplyType = ReapplyType.SelfOpaquing;
@@ -102,22 +102,6 @@ public AnalysisRewriteRule(Morpher morpher, RewriteRule rule)
             }
         }
 
-        private static bool IsUnifiable(Constraint<Word, int> constraint, Pattern<Word, int> env)
-        {
-            foreach (Constraint<Word, int> curConstraint in env.GetNodesDepthFirst().OfType<Constraint<Word, int>>())
-            {
-                if (
-                    curConstraint.Type() == HCFeatureSystem.Segment
-                    && !curConstraint.FeatureStruct.IsUnifiable(constraint.FeatureStruct)
-                )
-                {
-                    return false;
-                }
-            }
-
-            return true;
-        }
-
         private bool ExceedsShapeGrowth(Word data)
         {
             return _morpher.MaxAnalysisShapeGrowth >= 0
diff --git a/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs b/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
new file mode 100644
index 00000000..ec5151bf
--- /dev/null
+++ b/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
@@ -0,0 +1,253 @@
+using System.Diagnostics;
+using NUnit.Framework;
+
+namespace SIL.Machine.Morphology.HermitCrab;
+
+/// <summary>
+/// Complexity-cap Phase 0 (see complexity-cap.md §7, §9): calibration and no-regression corpus using
+/// the real Indonesian/Sena grammars. These grammars + wordlists are large, not licensed for this repo,
+/// and stay untracked (see .gitignore) — every test here is [Explicit] (not run by default CI) and
+/// skips itself when the files aren't present locally, exactly like the RustifyBenchmark precedent
+/// referenced in .gitignore's comment.
+/// </summary>
+[TestFixture]
+[Explicit("Requires the untracked samples/data/{indonesian,sena}-hc.xml corpus; see complexity-cap.md Phase 0.")]
+public class ComplexityCapCorpusTests
+{
+    private static string? FindRepoRoot()
+    {
+        var dir = new DirectoryInfo(AppContext.BaseDirectory);
+        while (dir != null)
+        {
+            if (File.Exists(Path.Combine(dir.FullName, "machine.sln")))
+                return dir.FullName;
+            dir = dir.Parent;
+        }
+        return null;
+    }
+
+    private static (string Grammar, string Words)? FindCorpus(string name)
+    {
+        string? root = FindRepoRoot();
+        if (root == null)
+            return null;
+        string grammar = Path.Combine(root, "samples", "data", $"{name}-hc.xml");
+        string words = Path.Combine(root, "samples", "data", $"{name}-words.txt");
+        if (!File.Exists(grammar) || !File.Exists(words))
+            return null;
+        return (grammar, words);
+    }
+
+    // "Unlimited" for calibration purposes only: a genuinely pathological word in a real corpus must
+    // not be allowed to hang the calibration run forever (see the Sena run that sat stuck for 23
+    // minutes before being killed, exactly the failure mode complexity-cap exists to catch). This is
+    // a safety net roughly 3x above the costliest legitimate word observed so far (Sena, 14,905,517
+    // steps). It equals Morpher.DefaultMaxParseSteps, so a word that hits it also trips that default.
+    private const int CalibrationStepCeiling = 50_000_000;
+
+    private static void RunCorpus(string name)
+    {
+        (string Grammar, string Words)? corpus = FindCorpus(name);
+        if (corpus == null)
+        {
+            Assert.Ignore(
+                $"samples/data/{name}-hc.xml and/or {name}-words.txt not present locally (untracked, see .gitignore) — skipping."
+            );
+            return;
+        }
+
+        Language language = XmlLanguageLoader.Load(corpus.Value.Grammar);
+        var morpher = new Morpher(new TraceManager(), language)
+        {
+            MaxParseSteps = CalibrationStepCeiling,
+            ParseTimeout = TimeSpan.Zero,
+        };
+
+        string[] words = File.ReadAllLines(corpus.Value.Words).Select(w => w.Trim()).Where(w => w.Length > 0).ToArray();
+
+        int maxSteps = 0;
+        string maxStepsWord = "";
+        var sw = Stopwatch.StartNew();
+        long maxWordMs = 0;
+        string maxWordMsWord = "";
+        int wordsParsed = 0;
+        int wordsSkipped = 0;
+        var pathologicalWords = new List<(string Word, int Steps)>();
+        foreach (string word in words)
+        {
+            ParseDiagnostics diagnostics;
+            var wordSw = Stopwatch.StartNew();
+            try
+            {
+                morpher.ParseWord(word, out _, false, out diagnostics).ToList();
+            }
+            catch (InvalidShapeException)
+            {
+                // Malformed/non-word lines in this ad hoc wordlist (e.g. gloss annotations that slipped
+                // in) aren't a complexity-cap concern — skip rather than fail the calibration run.
+                wordsSkipped++;
+                continue;
+            }
+            wordSw.Stop();
+            wordsParsed++;
+            // Flushed immediately (unlike TestContext.Out, which buffers until the test ends) so a
+            // hang/crash mid-run still shows which word was last attempted.
+            TestContext.Progress.WriteLine(
+                $"  [{wordsParsed}/{words.Length}] '{word}': {diagnostics.StepsUsed} steps, {wordSw.ElapsedMilliseconds}ms"
+            );
+
+            if (diagnostics.BudgetExhausted)
+                pathologicalWords.Add((word, diagnostics.StepsUsed));
+
+            if (diagnostics.StepsUsed > maxSteps)
+            {
+                maxSteps = diagnostics.StepsUsed;
+                maxStepsWord = word;
+            }
+            if (wordSw.ElapsedMilliseconds > maxWordMs)
+            {
+                maxWordMs = wordSw.ElapsedMilliseconds;
+                maxWordMsWord = word;
+            }
+        }
+        sw.Stop();
+
+        TestContext.Out.WriteLine(
+            $"{name}: {wordsParsed} words parsed ({wordsSkipped} skipped as malformed), total {sw.ElapsedMilliseconds}ms, "
+                + $"max steps {maxSteps} (word '{maxStepsWord}'), "
+                + $"max single-word time {maxWordMs}ms (word '{maxWordMsWord}'), "
+                + $"suggested default MaxParseSteps (100x observed max) = {Math.Max(maxSteps, 1) * 100L}"
+        );
+
+        if (pathologicalWords.Count > 0)
+        {
+            TestContext.Out.WriteLine(
+                $"WARNING: {pathologicalWords.Count} word(s) hit the {CalibrationStepCeiling:N0}-step calibration "
+                    + "ceiling — these are candidates for genuinely pathological grammar interactions, not "
+                    + "legitimate baseline data points:"
+            );
+            foreach ((string word, int steps) in pathologicalWords)
+                TestContext.Out.WriteLine($"  '{word}': {steps} steps (hit ceiling)");
+        }
+
+        Assert.That(
+            pathologicalWords,
+            Is.Empty,
+            $"{pathologicalWords.Count} word(s) hit the calibration step ceiling — see output for which word(s); "
+                + "investigate with RerunWithDiagnostics before trusting the max-steps number above for calibration."
+        );
+    }
+
+    [Test]
+    public void Indonesian_Baseline_NoWordExhaustsUnlimitedBudget()
+    {
+        RunCorpus("indonesian");
+    }
+
+    /// <summary>
+    /// Ad hoc diagnostic, not a pass/fail assertion: reports which rule(s) account for the bulk of the
+    /// step count on the single most expensive word in the corpus, using RerunWithDiagnostics exactly
+    /// as the "Writing performant HC grammars" guide (docs/hermitcrab-grammar-performance.md)
+    /// recommends. Useful for eyeballing whether a corpus's worst-case word is a legitimate expensive
+    /// parse or a symptom of a specific bad rule.
+    /// </summary>
+    [Test]
+    public void Indonesian_TopOffendingRules_ForWorstWord()
+    {
+        ReportTopOffenders("indonesian", "mengamat-amati");
+    }
+
+    private static void ReportTopOffenders(string name, string word)
+    {
+        (string Grammar, string Words)? corpus = FindCorpus(name);
+        if (corpus == null)
+        {
+            Assert.Ignore($"samples/data/{name}-hc.xml and/or {name}-words.txt not present locally; skipping.");
+            return;
+        }
+
+        Language language = XmlLanguageLoader.Load(corpus.Value.Grammar);
+        var morpher = new Morpher(new TraceManager(), language) { MaxParseSteps = 0, ParseTimeout = TimeSpan.Zero };
+
+        ParseDiagnostics diagnostics;
+        try
+        {
+            diagnostics = morpher.RerunWithDiagnostics(word, out IEnumerable<Word> results);
+            results.ToList();
+        }
+        catch (InvalidShapeException)
+        {
+            Assert.Ignore($"'{word}' is not a valid shape in the {name} grammar's character set.");
+            return;
+        }
+
+        TestContext.Out.WriteLine(
+            $"{name} '{word}': {diagnostics.StepsUsed} steps, {diagnostics.Elapsed.TotalMilliseconds:F1}ms"
+        );
+        TestContext.Out.WriteLine("Top rules by application count:");
+        foreach ((IHCRule rule, int applications) in diagnostics.TopRules.Take(10))
+        {
+            double pct = 100.0 * applications / Math.Max(diagnostics.StepsUsed, 1);
+            TestContext.Out.WriteLine($"  {applications, 6} ({pct, 5:F1}%)  {rule.GetType().Name} '{rule.Name}'");
+        }
+    }
+
+    [Test]
+    public void Sena_Baseline_NoWordExhaustsUnlimitedBudget()
+    {
+        RunCorpus("sena");
+    }
+
+    /// <summary>
+    /// Confirms the *shipped* defaults (Morpher.DefaultMaxParseSteps / DefaultParseTimeout) are
+    /// generous enough to be invisible on real, legitimate grammars — the "no-regression" half of
+    /// Phase 0 (§7): every word must still complete without tripping the budget at the defaults a
+    /// naive consumer gets out of the box.
+    /// </summary>
+    [Test]
+    public void Indonesian_ShippedDefaults_NeverTrip()
+    {
+        RunCorpusAtDefaults("indonesian");
+    }
+
+    [Test]
+    public void Sena_ShippedDefaults_NeverTrip()
+    {
+        RunCorpusAtDefaults("sena");
+    }
+
+    private static void RunCorpusAtDefaults(string name)
+    {
+        (string Grammar, string Words)? corpus = FindCorpus(name);
+        if (corpus == null)
+        {
+            Assert.Ignore(
+                $"samples/data/{name}-hc.xml and/or {name}-words.txt not present locally (untracked, see .gitignore) — skipping."
+            );
+            return;
+        }
+
+        Language language = XmlLanguageLoader.Load(corpus.Value.Grammar);
+        var morpher = new Morpher(new TraceManager(), language); // shipped defaults
+
+        string[] words = File.ReadAllLines(corpus.Value.Words).Select(w => w.Trim()).Where(w => w.Length > 0).ToArray();
+
+        foreach (string word in words)
+        {
+            ParseDiagnostics diagnostics;
+            try
+            {
+                morpher.ParseWord(word, out _, false, out diagnostics).ToList();
+            }
+            catch (InvalidShapeException)
+            {
+                continue;
+            }
+            Assert.That(
+                diagnostics.BudgetExhausted,
+                Is.False,
+                $"'{word}' tripped the shipped default budget (StepsUsed={diagnostics.StepsUsed}) — defaults are not generous enough for this corpus"
+            );
+        }
+    }
+}
diff --git a/tests/SIL.Machine.Morphology.HermitCrab.Tests/GrammarAnalyzerTests.cs b/tests/SIL.Machine.Morphology.HermitCrab.Tests/GrammarAnalyzerTests.cs
new file mode 100644
index 00000000..ac3fdc87
--- /dev/null
+++ b/tests/SIL.Machine.Morphology.HermitCrab.Tests/GrammarAnalyzerTests.cs
@@ -0,0 +1,337 @@
+using NUnit.Framework;
+using SIL.Machine.FeatureModel;
+using SIL.Machine.Matching;
+using SIL.Machine.Morphology.HermitCrab.MorphologicalRules;
+using SIL.Machine.Morphology.HermitCrab.PhonologicalRules;
+
+namespace SIL.Machine.Morphology.HermitCrab;
+
+[TestFixture]
+public class GrammarAnalyzerTests : HermitCrabTestBase
+{
+    [Test]
+    public void HC0001_NoOvertExponentWithMultipleApplication_IsError()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var rule = new AffixProcessRule
+        {
+            Name = "bad_rule",
+            MaxApplicationCount = 100,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        rule.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(
+            diagnostics,
+            Has.Some.Matches<GrammarDiagnostic>(d =>
+                d.Code == "HC0001" && d.Severity == DiagnosticSeverity.Error && d.Rule == rule
+            )
+        );
+    }
+
+    [Test]
+    public void HC0002_NoOvertExponentSingleApplication_IsWarning()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var rule = new AffixProcessRule
+        {
+            Name = "zero_exponent_rule",
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        rule.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(
+            diagnostics,
+            Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0002" && d.Severity == DiagnosticSeverity.Warning)
+        );
+        Assert.That(diagnostics, Has.None.Matches<GrammarDiagnostic>(d => d.Code == "HC0001"));
+    }
+
+    [Test]
+    public void HC0001_RuleWithOvertExponent_IsNotFlagged()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var rule = new AffixProcessRule
+        {
+            Name = "ed_suffix",
+            MaxApplicationCount = 100,
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        rule.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1"), new InsertSegments(Table3, "+d") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.None.Matches<GrammarDiagnostic>(d => d.Code == "HC0001" || d.Code == "HC0002"));
+        // MaxApplicationCount > 1 alone still trips HC0003 regardless of overt exponent.
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0003" && d.Rule == rule));
+    }
+
+    [Test]
+    public void HC0004_SelfFeedingSimultaneousRule_IsFlagged()
+    {
+        // Matches AnalysisRewriteRule's own ReapplyType.SelfOpaquing selection: Simultaneous mode with
+        // a Rhs segment constraint that is NOT unifiable with its own environment.
+        var voc = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("voc+")
+            .Value;
+        var cons = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("voc-")
+            .Value;
+        var rule = new RewriteRule
+        {
+            Name = "self_feeding_rule",
+            ApplicationMode = RewriteApplicationMode.Simultaneous,
+            Lhs = Pattern<Word, int>.New().Value,
+        };
+        rule.Subrules.Add(
+            new RewriteSubrule
+            {
+                Rhs = Pattern<Word, int>.New().Annotation(voc).Value,
+                LeftEnvironment = Pattern<Word, int>.New().Annotation(cons).Value,
+            }
+        );
+        Allophonic.PhonologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0004" && d.Rule == rule));
+    }
+
+    [Test]
+    public void HC0004_SimultaneousEpenthesis_IsUnconditionallyFlagged()
+    {
+        // Epenthesis (Lhs.Children.Count == 0): the engine (AnalysisRewriteRule's constructor) selects
+        // ReapplyType.SelfOpaquing here whenever ApplicationMode is Simultaneous, with no unification
+        // check at all — unlike the same-length-subrule case. Must be flagged unconditionally too.
+        var voc = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("voc+")
+            .Value;
+        var rule = new RewriteRule
+        {
+            Name = "epenthesis_rule",
+            ApplicationMode = RewriteApplicationMode.Simultaneous,
+            Lhs = Pattern<Word, int>.New().Value, // empty Lhs = epenthesis
+        };
+        rule.Subrules.Add(new RewriteSubrule { Rhs = Pattern<Word, int>.New().Annotation(voc).Value });
+        Allophonic.PhonologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0004" && d.Rule == rule));
+    }
+
+    [Test]
+    public void HC0004_IterativeEpenthesis_IsNotFlagged()
+    {
+        var voc = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("voc+")
+            .Value;
+        var rule = new RewriteRule
+        {
+            Name = "epenthesis_rule_iterative",
+            ApplicationMode = RewriteApplicationMode.Iterative,
+            Lhs = Pattern<Word, int>.New().Value,
+        };
+        rule.Subrules.Add(new RewriteSubrule { Rhs = Pattern<Word, int>.New().Annotation(voc).Value });
+        Allophonic.PhonologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.None.Matches<GrammarDiagnostic>(d => d.Code == "HC0004"));
+    }
+
+    [Test]
+    public void HC0005_UnconstrainedDeletion_IsFlagged()
+    {
+        var highFrontUnrndVowel = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("cons-")
+            .Symbol("voc+")
+            .Symbol("high+")
+            .Symbol("low-")
+            .Symbol("back-")
+            .Symbol("round-")
+            .Value;
+        var rule = new RewriteRule
+        {
+            Name = "unconstrained_deletion",
+            Lhs = Pattern<Word, int>.New().Annotation(highFrontUnrndVowel).Value,
+        };
+        rule.Subrules.Add(new RewriteSubrule()); // Rhs defaults to empty (deletion), no environment constraints.
+        Allophonic.PhonologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0005" && d.Rule == rule));
+    }
+
+    [Test]
+    public void HC0005_ConstrainedDeletion_IsNotFlagged()
+    {
+        var highFrontUnrndVowel = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("cons-")
+            .Symbol("voc+")
+            .Symbol("high+")
+            .Symbol("low-")
+            .Symbol("back-")
+            .Symbol("round-")
+            .Value;
+        var highVowel = FeatureStruct
+            .New(Language.PhonologicalFeatureSystem)
+            .Symbol(HCFeatureSystem.Segment)
+            .Symbol("cons-")
+            .Symbol("voc+")
+            .Symbol("high+")
+            .Value;
+        var rule = new RewriteRule
+        {
+            Name = "constrained_deletion",
+            Lhs = Pattern<Word, int>.New().Annotation(highFrontUnrndVowel).Value,
+        };
+        rule.Subrules.Add(
+            new RewriteSubrule { LeftEnvironment = Pattern<Word, int>.New().Annotation(highVowel).Value }
+        );
+        Allophonic.PhonologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.None.Matches<GrammarDiagnostic>(d => d.Code == "HC0005"));
+    }
+
+    [Test]
+    public void HC0006_UnconstrainedCompounding_IsFlagged()
+    {
+        var rule = new CompoundingRule { Name = "unconstrained_compound" };
+        Morphophonemic.MorphologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0006" && d.Rule == rule));
+    }
+
+    [Test]
+    public void HC0006_ConstrainedCompounding_IsNotFlagged()
+    {
+        var rule = new CompoundingRule
+        {
+            Name = "constrained_compound",
+            HeadRequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("N").Value,
+            NonHeadRequiredSyntacticFeatureStruct = FeatureStruct
+                .New(Language.SyntacticFeatureSystem)
+                .Symbol("V")
+                .Value,
+        };
+        Morphophonemic.MorphologicalRules.Add(rule);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.None.Matches<GrammarDiagnostic>(d => d.Code == "HC0006"));
+    }
+
+    [Test]
+    public void HC0007_AdjacentOptionalIterativeLexicalPattern_IsFlagged()
+    {
+        var naturalClass = new NaturalClass(new FeatureStruct()) { Name = "Any" };
+        Morphophonemic.CharacterDefinitionTable.AddNaturalClass(naturalClass);
+        LexEntry entry = AddEntry("pattern_entry", new FeatureStruct(), Morphophonemic, "([Any])([Any])");
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0007" && d.Rule == entry));
+    }
+
+    [Test]
+    public void HC0008_CyclicFeedingPair_IsFlagged()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var a = new AffixProcessRule
+        {
+            Name = "cycle_a",
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        a.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        var b = new AffixProcessRule
+        {
+            Name = "cycle_b",
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        b.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(a);
+        Morphophonemic.MorphologicalRules.Add(b);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Has.Some.Matches<GrammarDiagnostic>(d => d.Code == "HC0008"));
+    }
+
+    [Test]
+    public void Analyze_WellBehavedGrammar_ProducesNoDiagnostics()
+    {
+        var any = FeatureStruct.New().Symbol(HCFeatureSystem.Segment).Value;
+        var edSuffix = new AffixProcessRule
+        {
+            Name = "ed_suffix",
+            RequiredSyntacticFeatureStruct = FeatureStruct.New(Language.SyntacticFeatureSystem).Symbol("V").Value,
+        };
+        edSuffix.Allomorphs.Add(
+            new AffixProcessAllomorph
+            {
+                Lhs = { Pattern<Word, int>.New("1").Annotation(any).OneOrMore.Value },
+                Rhs = { new CopyFromInput("1"), new InsertSegments(Table3, "+d") },
+            }
+        );
+        Morphophonemic.MorphologicalRules.Add(edSuffix);
+
+        var diagnostics = GrammarAnalyzer.Analyze(Language);
+
+        Assert.That(diagnostics, Is.Empty);
+    }
+}

From 13567446dc3ffaf04c5f0d04239109d1ef00be2b Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 16:17:18 -0400
Subject: [PATCH 4/6] complexity-cap.md: mark Phases 0-3 done, record commit
 hashes

Bookkeeping only: the status header still said "Plan (not started)" and the
phase table had no Status column, even though Phases 0-3 were already
implemented and committed.
---
 complexity-cap.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/complexity-cap.md b/complexity-cap.md
index 75fe21ae..2fa4ffdc 100644
--- a/complexity-cap.md
+++ b/complexity-cap.md
@@ -1,6 +1,8 @@
 # Complexity Cap: Bounding Pathological HermitCrab Parses
 
-**Status:** Plan (not started) — sequencing and defaults decided, see §8/§10
+**Status:** Phases 0–3 implemented and committed on `complexity-cap` (stacked on `hc-rustify`,
+see §8); Phase 4 (FieldWorks integration) is a separate follow-up in the FW repo. Sena
+calibration is a ~1% sample pending a full-corpus re-baseline — see §10 items 7–8.
 **Author:** drafted 2026-07-02
 **Related:** PR #446 (hc-rustify performance work), FieldWorks out-of-process HC worker (FW PR #983)
 
@@ -411,13 +413,13 @@ across rustify's 100-file rewrite is not. Concretely:
 
 ## 9. Phases
 
-| Phase | Deliverable | Depends on | Est. size |
-|---|---|---|---|
-| 0 | Branch off `hc-rustify`. Baseline `indonesian`/`sena` on rustify (max steps/time observed → derive generous `MaxParseSteps`/`ParseTimeout` defaults); build 1–2 pathological variants of the indonesian grammar; repro harness | `hc-rustify` | S |
-| 1 | `ParseContext`, `MaxParseSteps` + `ParseTimeout`, soft-stop checks, `ParseDiagnostics` overload, breach re-run with per-rule counters | 0 | M |
-| 2 | `MaxRuleApplicationsPerWord`, `MaxAnalysisShapeGrowth`, cascade depth cap | 1 (shares `ParseContext`) | M |
-| 3 | `GrammarAnalyzer` + HC0001–HC0008, CLI, "Writing performant HC grammars" guide | — (parallelizable) | M–L |
-| 4 | FieldWorks follow-ups: worker DTO status field, FLEx "diagnose word" + parser-report lint surfacing, set conservative caps in HCLoader | 1–3, FW repo | separate effort |
+| Phase | Deliverable | Depends on | Est. size | Status |
+|---|---|---|---|---|
+| 0 | Branch off `hc-rustify`. Baseline `indonesian`/`sena` on rustify (max steps/time observed → derive generous `MaxParseSteps`/`ParseTimeout` defaults); build 1–2 pathological variants of the indonesian grammar; repro harness | `hc-rustify` | S | **Done** (Indonesian fully baselined; Sena ~1% sampled — see §10.7) |
+| 1 | `ParseContext`, `MaxParseSteps` + `ParseTimeout`, soft-stop checks, `ParseDiagnostics` overload, breach re-run with per-rule counters | 0 | M | **Done** (commit b3fd2b55) |
+| 2 | `MaxRuleApplicationsPerWord`, `MaxAnalysisShapeGrowth`, cascade depth cap | 1 (shares `ParseContext`) | M | **Done** (commit e68f0984) |
+| 3 | `GrammarAnalyzer` + HC0001–HC0008, CLI, "Writing performant HC grammars" guide | — (parallelizable) | M–L | **Done** (commit c8a39aeb) |
+| 4 | FieldWorks follow-ups: worker DTO status field, FLEx "diagnose word" + parser-report lint surfacing, set conservative caps in HCLoader | 1–3, FW repo | separate effort | Not started (separate repo) |
 
 ## 10. Open questions
 

From 343515b183ed28e2b3fff0e822eba035207fdca6 Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 16:18:33 -0400
Subject: [PATCH 5/6] ComplexityCapCorpusTests: report top 5 words by step
 count

Small addition to the ad hoc Phase 0 calibration harness, left uncommitted
from the corpus investigation: keeps a running top 5 (by StepsUsed) alongside
the existing single max, so a full-corpus re-baseline (see complexity-cap.md
Section 10 item 7) shows the shape of the tail, not just one data point.
---
 .../ComplexityCapCorpusTests.cs                        | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs b/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
index ec5151bf..58e61145 100644
--- a/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
+++ b/tests/SIL.Machine.Morphology.HermitCrab.Tests/ComplexityCapCorpusTests.cs
@@ -73,6 +73,7 @@ private static void RunCorpus(string name)
         int wordsParsed = 0;
         int wordsSkipped = 0;
         var pathologicalWords = new List<(string Word, int Steps)>();
+        var topWordsBySteps = new List<(string Word, int Steps, long Ms)>();
         foreach (string word in words)
         {
             ParseDiagnostics diagnostics;
@@ -99,6 +100,11 @@ private static void RunCorpus(string name)
             if (diagnostics.BudgetExhausted)
                 pathologicalWords.Add((word, diagnostics.StepsUsed));
 
+            topWordsBySteps.Add((word, diagnostics.StepsUsed, wordSw.ElapsedMilliseconds));
+            topWordsBySteps.Sort((a, b) => b.Steps.CompareTo(a.Steps));
+            if (topWordsBySteps.Count > 5)
+                topWordsBySteps.RemoveAt(topWordsBySteps.Count - 1);
+
             if (diagnostics.StepsUsed > maxSteps)
             {
                 maxSteps = diagnostics.StepsUsed;
@@ -119,6 +125,10 @@ private static void RunCorpus(string name)
                 + $"suggested default MaxParseSteps (100x observed max) = {Math.Max(maxSteps, 1) * 100}"
         );
 
+        TestContext.Out.WriteLine($"{name}: top {topWordsBySteps.Count} words by step count:");
+        foreach ((string word, int steps, long ms) in topWordsBySteps)
+            TestContext.Out.WriteLine($"  '{word}': {steps} steps, {ms}ms");
+
         if (pathologicalWords.Count > 0)
         {
             TestContext.Out.WriteLine(

From c1d7db641e0b9f8849d21d1ab5bc0ee1955963ef Mon Sep 17 00:00:00 2001
From: John Lambert <john_lambert@sil.org>
Date: Thu, 2 Jul 2026 17:29:31 -0400
Subject: [PATCH 6/6] complexity-cap.md: update Sena calibration open items
 with parse-optimization findings

A separate investigation (a sharded, Release-mode scan of the full Sena
corpus; see docs/hermitcrab-parse-algorithm-analysis.md, currently
uncommitted on the sibling parse-optimization branch) got much further than
this branch's own single-threaded Debug-mode recalibration attempt. That
attempt was aborted after ~1 hour at 283/7,121 Sena words to avoid spending
many more hours collecting redundant, lower-quality data.

Updates items 7-8 with that scan's numbers (p90 ~2M steps, ~16% of words
>1M steps, worst observed >=39.9M steps, 30s ParseTimeout trips on dozens
of legitimate words) and adds item 9: cinacemerwa (37.5M steps, 0 valid
parses) crashed the NUnit test host outright, apparently from memory
pressure independent of the step/timeout budgets - the current Layer 1/2
budgets bound steps and wall-clock but not allocations.
---
 complexity-cap.md | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/complexity-cap.md b/complexity-cap.md
index 2fa4ffdc..9ce70f7a 100644
--- a/complexity-cap.md
+++ b/complexity-cap.md
@@ -444,13 +444,30 @@ across rustify's 100-file rewrite is not. Concretely:
 6. **HC0004/HC0008 precision**: self-feeding/cycle detection via unification is
    approximate; acceptable false-positive rate for a Warning? Start conservative
    (high-confidence patterns only), widen with field feedback.
-7. **Sena calibration is based on a ~1% sample (72/7,121 words)**, not a full corpus run
-   (see §4.1) — the worst-observed-word figures used to set `DefaultMaxParseSteps`/
-   `DefaultParseTimeout` are a floor, not a proven ceiling. Re-baseline against the full
-   corpus (accept the multi-hour run, or parallelize it) before treating these as final,
-   and specifically check whether any word exceeds the current 50,000,000-step default.
-8. **`DefaultParseTimeout` = 30s will still truncate some legitimate Sena words** (one
-   observed at 105s). Whether 30s is the right number — vs. a larger default, vs. no
-   default timeout with only a step budget, vs. a per-consumer-tunable-only knob with no
-   shipped default at all — is a real product decision that needs field input, not
-   something this investigation can resolve alone.
+7. **Update 2026-07-02, still open:** a separate investigation (sharded 8-way, Release-mode
+   scan of the full Sena corpus — see `docs/hermitcrab-parse-algorithm-analysis.md`,
+   currently uncommitted on a sibling `parse-optimization` branch, not yet landed here)
+   got much further than this branch's own single-threaded Debug-mode attempt (which was
+   killed after ~1 hour at 283/7,121 words — some individual words alone took 50+ seconds
+   at that build/threading combination, and the earlier ~1% sample already showed the
+   corpus has a long tail). That scan found: p90 ≈ 2,000,000 steps; ~16% of words exceed
+   1,000,000 steps; worst observed so far ≥ 39,900,000 steps (`kukucitirani`) — under the
+   50,000,000-step default, but with much less headroom than the original ~1% sample
+   suggested, and still not confirmed as a true corpus-wide maximum. Re-baselining against
+   a complete, verified full-corpus run (ideally the sharded/Release harness, not this
+   branch's test-suite-based one) remains open.
+8. **Update 2026-07-02, confirmed, not yet resolved:** the same investigation confirms
+   `DefaultParseTimeout` = 30s trips on *dozens* of legitimate Sena words (single-threaded
+   times of 100–250s observed), not just the one word (105s) this item originally recorded.
+   The product-decision question (raise the default? drop it in favor of the step budget
+   alone? make it a no-shipped-default, per-consumer-only knob?) still needs field input.
+9. **New finding, 2026-07-02:** the same investigation reports that `cinacemerwa`, one of
+   Sena's most expensive known words (37.5M steps, and notably a word that yields *zero*
+   valid parses), crashed the NUnit test host outright, apparently from memory pressure during
+   candidate-explosion, independent of the step/timeout budgets (which bound *steps*, not
+   *allocations*). This means the current Layer 1/2 budgets do not fully protect against a
+   pathological word exhausting memory before it exhausts its step or time budget. Whether
+   this needs a third bound (e.g. a candidate-count or allocation ceiling) or is
+   sufficiently addressed by the algorithmic fixes under investigation on `parse-optimization`
+   (which would shrink the candidate set directly, see that branch's
+   `docs/hermitcrab-parse-algorithm-analysis.md` §4) is undecided.