Skip to content

Port 'Better USFM versification analysis' from machine#326

Open
ddaspit wants to merge 1 commit into
mainfrom
port-better-usfm-versification-analysis
Open

Port 'Better USFM versification analysis' from machine#326
ddaspit wants to merge 1 commit into
mainfrom
port-better-usfm-versification-analysis

Conversation

@ddaspit

@ddaspit ddaspit commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Ports machine PR #432 — refactors USFM versification error detection into a richer diagnostic analysis API.

Analyzer handler (machine/corpora/usfm_versification_analyzer_handler.py)

Replaces usfm_versification_error_detector.py. Introduces:

  • UsfmVersificationDiagnosticTypeMISSING, EXTRA, INVALID, INCORRECT_VERSE_SEGMENT, UNSUPPORTED_VERSE_RANGE.
  • UsfmVersificationDiagnostic — carries references, filename, line_numbers, and a computed num_affected_verses; merges contiguous references.
  • UsfmVersificationAnalysis — aggregate result with total_num_affected_verses, total_num_encountered_verses, diagnostics, and project_settings.
  • UsfmVersificationAnalyzerHandler — walks expected (from the versification) vs. encountered verses, classifies missing/extra/invalid verses and segments, and flags verse ranges that cross chapter boundaries when mapped to the Original versification.

Analyzer base + subclasses

  • ParatextProjectVersificationErrorDetectorBaseUsfmVersificationAnalyzerBase; get_usfm_versification_errors(...)analyze_usfm_versification(book_ids_and_chapters, handler) returning a UsfmVersificationAnalysis, with optional per-book chapter filtering.
  • FileParatextProjectVersificationErrorDetectorFileUsfmVersificationAnalyzer; ZipParatextProjectVersificationErrorDetectorZipUsfmVersificationAnalyzer.
  • Updated machine/corpora/__init__.py exports accordingly.

Versification

Versification.all_included_verses(only_chapters=None) now accepts an optional Dict[int, Optional[Set[int]]] filter mapping book number → set of chapters (or None for all chapters in that book).

Tests

  • Ported UsfmVersificationAnalyzerTeststests/corpora/test_usfm_versification_analyzer.py (replaces test_usfm_versification_error_detector.py).
  • Renamed test helper MemoryParatextProjectVersificationErrorDetectorMemoryUsfmVersificationAnalyzer.
  • Updated test_usfm_manual.py to use ZipUsfmVersificationAnalyzer.analyze_usfm_versification.
  • Added the two new chapter-filter cases to test_all_included_verses in test_versification.py.

Note on one deviation

The C# DefaultParatextProjectSettings defaults to the English versification, whereas machine.py's has long defaulted to Original (a pre-existing divergence the source PR did not touch). The new cross-chapter UnsupportedVerseRange test relies on the English→Original mapping, so instead of changing a shared test utility used by many unrelated tests, this port passes versification=ENGLISH_VERSIFICATION explicitly in that single test. It reproduces the C# expectations exactly.

Verification

black, isort, flake8, and pyright all pass clean; pytest = 802 passed, 3 skipped.

Closes #319

🤖 Generated with Claude Code


This change is Reviewable

@codecov-commenter

codecov-commenter commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.62533% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.78%. Comparing base (3520bdc) to head (6848f80).

Files with missing lines Patch % Lines
...ine/corpora/usfm_versification_analyzer_handler.py 96.20% 6 Missing ⚠️
tests/corpora/test_usfm_manual.py 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #326      +/-   ##
==========================================
+ Coverage   91.76%   91.78%   +0.02%     
==========================================
  Files         362      362              
  Lines       23623    23735     +112     
==========================================
+ Hits        21678    21786     +108     
- Misses       1945     1949       +4     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ddaspit ddaspit force-pushed the port-better-usfm-versification-analysis branch 2 times, most recently from 8b7e80a to 6532d87 Compare July 1, 2026 15:24
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ddaspit ddaspit force-pushed the port-better-usfm-versification-analysis branch from 6532d87 to 6848f80 Compare July 1, 2026 18:14
@ddaspit ddaspit marked this pull request as ready for review July 1, 2026 18:23

@Enkidu93 Enkidu93 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@Enkidu93 partially reviewed 12 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on ddaspit).


.claude/skills/port-pr/SKILL.md line 56 at r2 (raw file):

| `tests/SIL.Machine.Tests/<PascalArea>/<PascalCase>Tests.cs` (or the matching `*.Tests` project) | `tests/<area>/test_<snake_case>.py` |
| `PascalCase` methods / `camelCase` locals / `_camelCase` fields | `snake_case` functions/vars |
| `IReadOnlyList<T>` / `IDictionary<,>` / `ISet<T>` etc. | `Sequence[T]`/`list` / `Mapping`/`dict` / `Set`/`set` etc. |

I wonder if you want to specify only Set, for example, not also set since I think the convention in machine.py has been to use the uppercase types from typing for hints. Also, I think we only use Mapping in one rather untouched class - are you thinking Mapping for IReadonlyDictionary? But maybe these won't really affect things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port 'Better USFM versification analysis'

3 participants