Fix XmlLanguageWriter crash on dangling morpheme/allomorph co-occurrence rules#450
Open
johnml1135 wants to merge 1 commit into
Open
Fix XmlLanguageWriter crash on dangling morpheme/allomorph co-occurrence rules#450johnml1135 wants to merge 1 commit into
johnml1135 wants to merge 1 commit into
Conversation
…nce rules A Language built programmatically (e.g. FieldWorks' HCLoader) can populate MorphemeCoOccurrenceRules/AllomorphCoOccurrenceRules with a morpheme or allomorph that never ends up written under any Stratum -- e.g. an ad-hoc prohibition still targets an affix in a slot whose owning inflectional-affix template was disabled. XmlLanguageWriter.WriteLanguage assumed every referenced morpheme/allomorph would already have an id, so it threw KeyNotFoundException instead of writing the rest of the language. Found via GenerateHCConfig against a real Amharic FieldWorks project. Skip rules that reference an unresolvable morpheme/allomorph instead of throwing, since dropping one unwritable constraint is preferable to failing the whole export. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #450 +/- ##
==========================================
+ Coverage 73.30% 73.33% +0.02%
==========================================
Files 443 443
Lines 37204 37214 +10
Branches 5110 5110
==========================================
+ Hits 27273 27290 +17
+ Misses 8805 8801 -4
+ Partials 1126 1123 -3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was broken
When FieldWorks exports one of its projects into a HermitCrab grammar file, it builds an in-memory
Languageobject (via its ownHCLoader) and then asks this library'sXmlLanguageWriterto save it as XML. Writing crashed partway through on a real project (Amharic) with:In plain terms: a
Languagecan carry a rule like "morpheme A can't appear next to morpheme B" (a "co-occurrence rule"). Normally, every morpheme mentioned in such a rule also gets written out elsewhere in the file (as a lexical entry or an affix rule). But it's possible for FieldWorks to hand the writer a rule that points at a morpheme which never gets written — in this case, an affix that belongs to an inflectional template FieldWorks had disabled. The writer assumed that couldn't happen, looked up an ID for that morpheme in a dictionary, didn't find one, and blew up instead of finishing the file.The fix
XmlLanguageWriternow checks, before writing a co-occurrence rule, whether both morphemes (or allomorphs) it mentions actually got written. If either one didn't, that one rule is skipped instead of crashing the whole export. Everything else in the file still gets written normally.This is safe because a co-occurrence rule is just an extra constraint layered on top of morphemes that already exist independently. If one side of the constraint isn't in the exported grammar at all, the constraint has nothing to constrain — dropping it doesn't change the meaning of anything else in the file.
Research: could this same kind of bug exist elsewhere?
Before fixing just this one spot, I checked whether the writer makes the same "this will definitely have been written" assumption anywhere else. It does, in three places: an allomorph's stem-name restriction, an affix rule's required stem-name, and a lexical entry's family grouping.
I did not apply the same "just skip it" fix to those, for two reasons:
Verified
Worth knowing, not addressed here
Test plan
XmlLanguageWriterCoOccurrenceRuleTests.cs(4 tests)SIL.Machine.Morphology.HermitCrab.Testssuite passes (65/65).fwdataexport, which now completes successfully🤖 Generated with Claude Code
This change is