Port 'Better USFM versification analysis' from machine#326
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #326 +/- ##
==========================================
+ Coverage 91.76% 91.78% +0.02%
==========================================
Files 362 362
Lines 23623 23735 +112
==========================================
+ Hits 21678 21786 +108
- Misses 1945 1949 +4 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
8b7e80a to
6532d87
Compare
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6532d87 to
6848f80
Compare
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 partially reviewed 12 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on ddaspit).
.claude/skills/port-pr/SKILL.md line 56 at r2 (raw file):
| `tests/SIL.Machine.Tests/<PascalArea>/<PascalCase>Tests.cs` (or the matching `*.Tests` project) | `tests/<area>/test_<snake_case>.py` | | `PascalCase` methods / `camelCase` locals / `_camelCase` fields | `snake_case` functions/vars | | `IReadOnlyList<T>` / `IDictionary<,>` / `ISet<T>` etc. | `Sequence[T]`/`list` / `Mapping`/`dict` / `Set`/`set` etc. |
I wonder if you want to specify only Set, for example, not also set since I think the convention in machine.py has been to use the uppercase types from typing for hints. Also, I think we only use Mapping in one rather untouched class - are you thinking Mapping for IReadonlyDictionary? But maybe these won't really affect things.
Ports machine PR #432 — refactors USFM versification error detection into a richer diagnostic analysis API.
Analyzer handler (
machine/corpora/usfm_versification_analyzer_handler.py)Replaces
usfm_versification_error_detector.py. Introduces:UsfmVersificationDiagnosticType—MISSING,EXTRA,INVALID,INCORRECT_VERSE_SEGMENT,UNSUPPORTED_VERSE_RANGE.UsfmVersificationDiagnostic— carriesreferences,filename,line_numbers, and a computednum_affected_verses; merges contiguous references.UsfmVersificationAnalysis— aggregate result withtotal_num_affected_verses,total_num_encountered_verses,diagnostics, andproject_settings.UsfmVersificationAnalyzerHandler— walks expected (from the versification) vs. encountered verses, classifies missing/extra/invalid verses and segments, and flags verse ranges that cross chapter boundaries when mapped to the Original versification.Analyzer base + subclasses
ParatextProjectVersificationErrorDetectorBase→UsfmVersificationAnalyzerBase;get_usfm_versification_errors(...)→analyze_usfm_versification(book_ids_and_chapters, handler)returning aUsfmVersificationAnalysis, with optional per-book chapter filtering.FileParatextProjectVersificationErrorDetector→FileUsfmVersificationAnalyzer;ZipParatextProjectVersificationErrorDetector→ZipUsfmVersificationAnalyzer.machine/corpora/__init__.pyexports accordingly.Versification
Versification.all_included_verses(only_chapters=None)now accepts an optionalDict[int, Optional[Set[int]]]filter mapping book number → set of chapters (orNonefor all chapters in that book).Tests
UsfmVersificationAnalyzerTests→tests/corpora/test_usfm_versification_analyzer.py(replacestest_usfm_versification_error_detector.py).MemoryParatextProjectVersificationErrorDetector→MemoryUsfmVersificationAnalyzer.test_usfm_manual.pyto useZipUsfmVersificationAnalyzer.analyze_usfm_versification.test_all_included_versesintest_versification.py.Note on one deviation
The C#
DefaultParatextProjectSettingsdefaults to the English versification, whereas machine.py's has long defaulted to Original (a pre-existing divergence the source PR did not touch). The new cross-chapterUnsupportedVerseRangetest relies on the English→Original mapping, so instead of changing a shared test utility used by many unrelated tests, this port passesversification=ENGLISH_VERSIFICATIONexplicitly in that single test. It reproduces the C# expectations exactly.Verification
black,isort,flake8, andpyrightall pass clean;pytest= 802 passed, 3 skipped.Closes #319
🤖 Generated with Claude Code
This change is