Skip to content

Fix XmlLanguageWriter crash on dangling morpheme/allomorph co-occurrence rules#450

Open
johnml1135 wants to merge 1 commit into
masterfrom
fix/xml-writer-dangling-cooccurrence-rules
Open

Fix XmlLanguageWriter crash on dangling morpheme/allomorph co-occurrence rules#450
johnml1135 wants to merge 1 commit into
masterfrom
fix/xml-writer-dangling-cooccurrence-rules

Conversation

@johnml1135

@johnml1135 johnml1135 commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

What was broken

When FieldWorks exports one of its projects into a HermitCrab grammar file, it builds an in-memory Language object (via its own HCLoader) and then asks this library's XmlLanguageWriter to save it as XML. Writing crashed partway through on a real project (Amharic) with:

KeyNotFoundException at XmlLanguageWriter.WriteMorphemeCoOccurrenceRule

In plain terms: a Language can carry a rule like "morpheme A can't appear next to morpheme B" (a "co-occurrence rule"). Normally, every morpheme mentioned in such a rule also gets written out elsewhere in the file (as a lexical entry or an affix rule). But it's possible for FieldWorks to hand the writer a rule that points at a morpheme which never gets written — in this case, an affix that belongs to an inflectional template FieldWorks had disabled. The writer assumed that couldn't happen, looked up an ID for that morpheme in a dictionary, didn't find one, and blew up instead of finishing the file.

The fix

XmlLanguageWriter now checks, before writing a co-occurrence rule, whether both morphemes (or allomorphs) it mentions actually got written. If either one didn't, that one rule is skipped instead of crashing the whole export. Everything else in the file still gets written normally.

This is safe because a co-occurrence rule is just an extra constraint layered on top of morphemes that already exist independently. If one side of the constraint isn't in the exported grammar at all, the constraint has nothing to constrain — dropping it doesn't change the meaning of anything else in the file.

Research: could this same kind of bug exist elsewhere?

Before fixing just this one spot, I checked whether the writer makes the same "this will definitely have been written" assumption anywhere else. It does, in three places: an allomorph's stem-name restriction, an affix rule's required stem-name, and a lexical entry's family grouping.

I did not apply the same "just skip it" fix to those, for two reasons:

  1. They're a different kind of thing. A co-occurrence rule is optional extra information — dropping it just means one constraint is missing. A stem-name or family reference is part of what the entry itself is — silently dropping it wouldn't just lose a constraint, it would quietly change which word forms the grammar accepts, without any error to say so. That's worse than crashing: a crash tells you something is wrong; silent corruption doesn't.
  2. I checked whether it can actually happen, and it can't (today). I traced how FieldWorks' own exporter builds these three properties and found it already guards them correctly — it only ever attaches a stem-name to something if that stem-name was also added to the exported language, and the "family" field isn't used by FieldWorks' exporter at all. So there's no real bug there right now, just a theoretical one if that loader code changes in the future. Since there's no live problem to fix, and the right fix for that case would be different anyway (a clearer error, not a silent skip), I left it alone rather than "fixing" something that isn't actually broken.

Verified

  • 4 new tests reproduce the exact crash scenario (a dangling primary morpheme, a dangling "other" morpheme in the rule, a dangling allomorph, and a control case proving normal rules still get written) — all pass.
  • Full existing test suite still passes (65/65), including the pre-existing round-trip test that guards against any change in how normal files are written.
  • Re-ran FieldWorks' real export tool against the real Amharic project with the fix in place: it now produces a complete grammar file instead of crashing.

Worth knowing, not addressed here

  • When a rule gets skipped, nothing currently reports that it happened — someone exporting a project has no way to know a constraint was silently dropped. Might be worth surfacing a count of skipped rules later.
  • If a rule mentions several morphemes and only one of them is the "unwritten" kind, the whole rule is dropped, not just the part that doesn't apply. Not an issue for the case that prompted this fix, but worth knowing if a messier project hits it.

Test plan

  • New XmlLanguageWriterCoOccurrenceRuleTests.cs (4 tests)
  • Full SIL.Machine.Morphology.HermitCrab.Tests suite passes (65/65)
  • Manually re-verified against the real Amharic .fwdata export, which now completes successfully

🤖 Generated with Claude Code


This change is Reviewable

…nce rules

A Language built programmatically (e.g. FieldWorks' HCLoader) can populate
MorphemeCoOccurrenceRules/AllomorphCoOccurrenceRules with a morpheme or
allomorph that never ends up written under any Stratum -- e.g. an ad-hoc
prohibition still targets an affix in a slot whose owning inflectional-affix
template was disabled. XmlLanguageWriter.WriteLanguage assumed every
referenced morpheme/allomorph would already have an id, so it threw
KeyNotFoundException instead of writing the rest of the language. Found via
GenerateHCConfig against a real Amharic FieldWorks project.

Skip rules that reference an unresolvable morpheme/allomorph instead of
throwing, since dropping one unwritable constraint is preferable to failing
the whole export.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.33%. Comparing base (60c854b) to head (d16f1bc).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
+ Coverage   73.30%   73.33%   +0.02%     
==========================================
  Files         443      443              
  Lines       37204    37214      +10     
  Branches     5110     5110              
==========================================
+ Hits        27273    27290      +17     
+ Misses       8805     8801       -4     
+ Partials     1126     1123       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants