Out-of-process Server-GC HermitCrab worker for bulk parsing #983

Open
johnml1135 wants to merge 3 commits into main from hc-optimisations

johnml1135 commented Jul 1, 2026

Contributor

What this is

Offloads HermitCrab morphological parsing to a separate Server-GC process so bulk "Parse All Words" scales (~10–12×) without changing FieldWorks.exe's own Workstation GC (a process-wide setting fixed at startup, needed for UI responsiveness). Rebased on latest main; 18 files (+~1954/−30), all worker + parser integration.
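
Because the GC flavor is fixed when a process starts, the worker can only get Server GC through its own configuration (in a .NET Framework App.config this is the `<gcServer enabled="true"/>` element under `<runtime>`). A minimal sketch of a startup check that reports which flavor a process actually got is shown here; the class and method names are hypothetical and not part of this PR, while `GCSettings.IsServerGC` and `GCSettings.LatencyMode` are standard .NET APIs.

```csharp
using System;
using System.Runtime;

// Hypothetical helper, not part of the PR: reports the GC flavor of the current process.
public static class GcModeReport
{
    // Returns a one-line description such as "Server GC, latency mode ...".
    // The flavor is chosen at process start and cannot be changed afterwards.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "Server" : "Workstation";
        return $"{flavor} GC, latency mode {GCSettings.LatencyMode}, " +
               $"{Environment.ProcessorCount} logical processors";
    }
}
```

Logging `GcModeReport.Describe()` once at worker startup would make a missing or ignored config setting visible immediately, instead of showing up later as a smaller-than-expected speedup.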

Review by change group

| Group | Files | What & why |
| --- | --- | --- |
| Worker host (new net48 exe) | Src/LexText/HCWorker/ (HCWorker.csproj, Program.cs, HCWorkerService.cs, App.config) | HCWorker.exe hosts one SIL.Machine HermitCrab Morpher behind a WCF net.pipe service (App.config: gcServer). Parses whole batches with its own internal parallelism and projects each analysis to a flat MorphDto[] (the id-collection half of HCParser.GetMorphs, run where the Word/Allomorph/Morpheme graph lives). Consumes SIL.Machine.Morphology.HermitCrab as the CPM-pinned NuGet package; references ParserCore (mirrors the sibling HC exe GenerateHCConfig). |
| Shared WCF contract + client | ParserCore/IHCWorkerService.cs, HCWorkerClient.cs, HCWorkerProcessManager.cs, PipeBindingFactory.cs | One IHCWorkerService contract + DTOs used by both the worker and the client (no hand-synced duplicate). HCWorkerClient is a WCF proxy with respawn+replay+retry-once; HCWorkerProcessManager does lazy spawn + watchdog (modeled on FLExBridgeHelper). PipeBindingFactory is the one binding definition (256 MB cap; real grammars are several MB). |
| HCParser routing | ParserCore/HCParser.cs | Bulk + interactive parsing route through the worker; a GetMorphs overload runs the LCM-resolution half over MorphDto[]; FormID/FormID2 made public so the worker keys the same Properties bag. Try-a-Word tracing stays in-process (its trace manager touches LCM inline). |
| Parser batch/scheduler integration | ParserCore/ParserWorker.cs, ParserScheduler.cs, ParseFiler.cs | ParserWorker routes a whole batch through HCParser.ParseWordsBatch (one WCF call) with a fallback to the per-wordform path; scheduler/filer adjustments to support it. |
| Tests + benchmark | HCWorker/HCWorkerTests/*, ParserCore/ParserCoreTests/ParseWorkerTests.cs, DisambiguateInFLExDBTests/ParserConcurrencyBenchmark.cs | Worker round-trip + DTO-extraction unit tests; parser-worker tests; a headless before/after concurrency benchmark. |
| Build wiring | FieldWorks.sln | Registers HCWorker + HCWorkerTests (restore is .sln-based; the traversal build picks them up from Src/**). |
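
For reviewers unfamiliar with WCF message quotas, the following is a rough sketch of what a single shared net.pipe binding with a 256 MB cap could look like. It is not the PR's PipeBindingFactory: the class name, the constant, and the choice of which reader quotas to raise are assumptions made for illustration. The properties used (`MaxReceivedMessageSize`, `MaxBufferSize`, `ReaderQuotas`) are standard `NetNamedPipeBinding` members, and WCF's default `MaxReceivedMessageSize` is 65,536 bytes.

```csharp
using System.ServiceModel;

// Hypothetical sketch, not the PR's PipeBindingFactory.
public static class PipeBindingSketch
{
    // 256 MB, the cap named in the PR description.
    public const int MaxMessageBytes = 256 * 1024 * 1024;

    // Builds the binding that both the worker host and the client would share,
    // so the two ends cannot drift apart on quota settings.
    public static NetNamedPipeBinding Create()
    {
        var binding = new NetNamedPipeBinding
        {
            // Default is 64 KB, which is too small for a multi-megabyte grammar.
            MaxReceivedMessageSize = MaxMessageBytes,
            // In buffered transfer mode this must match MaxReceivedMessageSize.
            MaxBufferSize = MaxMessageBytes,
        };
        // Assumption: large payloads travel as strings or arrays, so the reader
        // quotas that limit those are raised to the same cap.
        binding.ReaderQuotas.MaxStringContentLength = MaxMessageBytes;
        binding.ReaderQuotas.MaxArrayLength = MaxMessageBytes;
        return binding;
    }
}
```

Defining the binding in one place means the service host and the client proxy both call the same factory, which avoids a mismatch where one side accepts a large message and the other rejects it.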

Status

./build.ps1 passes with the worker. The HCWorker + HCWorkerTests projects build clean.

Relationship to the SIL.Machine RUSTIFY work

Independent of it. The worker consumes the released SIL.Machine.Morphology.HermitCrab package (SilMachineVersion = 3.8.2), so it works today. When FieldWorks later bumps to a RUSTIFY release, the worker gets the allocation gains for free (it uses only the public Morpher/ParseWord API). Note: that future bump will require a mechanical Pattern<Word, ShapeNode> to Pattern<Word, int> migration in HCLoader.cs (it constructs HC rules); see the SIL.Machine PR's API note. The worker itself needs no such change.

🤖 Generated with Claude Code



johnml1135 and others added 2 commits July 1, 2026 12:37
Offloads HermitCrab morphological parsing to a separate Server-GC
process so bulk "Parse All Words" scales without changing
FieldWorks.exe's own Workstation GC (a process-wide setting fixed at
startup, needed for UI responsiveness).

New (Src/LexText/HCWorker): HCWorker.exe hosts one SIL.Machine
HermitCrab Morpher behind a WCF net.pipe service (App.config:
gcServer). It parses whole batches with its own internal parallelism
and projects each analysis to a flat MorphDto[] (the id-collection
half of HCParser.GetMorphs, run where the Word/Allomorph/Morpheme
graph lives). Consumes SIL.Machine.Morphology.HermitCrab as the
CPM-pinned NuGet package and references ParserCore for the shared
contract + HCParser key constants (mirrors the sibling HC exe
GenerateHCConfig).

ParserCore:
- IHCWorkerService + DTOs: the single shared WCF contract.
- PipeBindingFactory: one net.pipe binding for both ends (256 MB cap;
  real grammars are several MB, the 64 KB default fails obscurely).
- HCWorkerClient / HCWorkerProcessManager: WCF proxy with
  respawn+replay+retry-once, and a lazy spawn/watchdog modeled on
  FLExBridgeHelper.
- HCParser: bulk + interactive parsing route through the worker; a
  GetMorphs overload runs the LCM-resolution half over MorphDto[];
  FormID/FormID2 made public. Try-a-Word tracing stays in-process.
- ParserWorker: bulk batch routes through HCParser.ParseWordsBatch
  (one WCF call), with a fallback to the per-wordform path.
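
The respawn+replay+retry-once behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's HCWorkerClient: all names are hypothetical, and "replay" is interpreted here as re-sending whatever state a freshly started worker lacks (for example the loaded grammar). In a WCF client the fault type would typically be `CommunicationException` or `TimeoutException`; the sketch takes it as a type parameter so it stays free of WCF dependencies.

```csharp
using System;

// Hypothetical sketch, not the PR's HCWorkerClient.
public sealed class RetryOnceInvoker
{
    private readonly Action _respawn; // restart the worker process and reconnect
    private readonly Action _replay;  // re-send state the new worker does not have

    public RetryOnceInvoker(Action respawn, Action replay)
    {
        _respawn = respawn ?? throw new ArgumentNullException(nameof(respawn));
        _replay = replay ?? throw new ArgumentNullException(nameof(replay));
    }

    // Runs call(). If it throws TFault, respawns the worker, replays state,
    // and tries exactly once more. A second failure propagates to the caller,
    // which can then fall back to another path.
    public TResult Invoke<TResult, TFault>(Func<TResult> call)
        where TFault : Exception
    {
        if (call == null) throw new ArgumentNullException(nameof(call));
        try
        {
            return call();
        }
        catch (TFault)
        {
            _respawn();
            _replay();
            return call();
        }
    }
}
```

Limiting the retry to one attempt keeps a persistently crashing worker from stalling a bulk parse; after the second failure the caller sees the exception and can take the per-wordform fallback path.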

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fixes surfaced by build.ps1 (restore is FieldWorks.sln-based, compile
is FieldWorks.proj traversal-based):
- Register HCWorker + HCWorkerTests in FieldWorks.sln so restore
  generates their assets (else NETSDK1004 at build).
- HCWorker.csproj: link ..\..\CommonAssemblyInfo.cs at the correct
  depth, and exclude the nested HCWorkerTests/** from the SDK glob
  (mirrors ParserCore excluding ParserCoreTests).
- HCWorkerService.cs: add using
  SIL.Machine.Morphology.HermitCrab.MorphologicalRules.
- HCWorkerTests.csproj: add the SIL.LCModel.Core.Tests /
  SIL.LCModel.Utils.Tests / FwUtilsTests / SIL.TestUtilities refs the
  injected AssemblyInfoForUiIndependentTests requires.
- HCWorkerServiceTests.cs: the multi-morph test builds a rule against
  the released SIL.Machine 3.8.2 package, whose rule patterns are
  Pattern<Word, ShapeNode> (not the branch's Pattern<Word, int>).

HCWorker + HCWorkerTests build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026


NUnit Tests

    1 files  ±0      1 suites  ±0   10m 32s ⏱️ +33s
4 306 tests +7  4 232 ✅ +6  74 💤 +1  0 ❌ ±0 
4 315 runs  +7  4 241 ✅ +6  74 💤 +1  0 ❌ ±0 

Results for commit 936a289. ± Comparison against base commit 471dd29.
