Out-of-process Server-GC HermitCrab worker for bulk parsing#983
Open
johnml1135 wants to merge 3 commits into
Offloads HermitCrab morphological parsing to a separate Server-GC process so bulk "Parse All Words" scales without changing FieldWorks.exe's own Workstation GC (a process-wide setting fixed at startup, needed for UI responsiveness).

New (Src/LexText/HCWorker):
HCWorker.exe hosts one SIL.Machine HermitCrab Morpher behind a WCF net.pipe service (App.config: gcServer). It parses whole batches with its own internal parallelism and projects each analysis to a flat MorphDto[] (the id-collection half of HCParser.GetMorphs, run where the Word/Allomorph/Morpheme graph lives). Consumes SIL.Machine.Morphology.HermitCrab as the CPM-pinned NuGet package and references ParserCore for the shared contract + HCParser key constants (mirrors the sibling HC exe GenerateHCConfig).

ParserCore:
- IHCWorkerService + DTOs: the single shared WCF contract.
- PipeBindingFactory: one net.pipe binding for both ends (256 MB cap; real grammars are several MB, the 64 KB default fails obscurely).
- HCWorkerClient / HCWorkerProcessManager: WCF proxy with respawn+replay+retry-once, and a lazy spawn/watchdog modeled on FLExBridgeHelper.
- HCParser: bulk + interactive parsing route through the worker; a GetMorphs overload runs the LCM-resolution half over MorphDto[]; FormID/FormID2 made public. Try-a-Word tracing stays in-process.
- ParserWorker: bulk batch routes through HCParser.ParseWordsBatch (one WCF call), with a fallback to the per-wordform path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
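The binding cap mentioned for PipeBindingFactory in the commit message above can be illustrated with a minimal sketch. This is not the PR's actual factory, which is not shown here: the class name PipeBindingSketch, the constant name, and the choice to raise the string and array reader quotas to the same cap are assumptions. It relies only on the standard .NET Framework WCF type NetNamedPipeBinding (assembly System.ServiceModel) and XmlDictionaryReaderQuotas (assembly System.Runtime.Serialization); MaxReceivedMessageSize defaults to 65,536 bytes, which is the 64 KB limit the message refers to.

```csharp
using System.ServiceModel;
using System.Xml;

// Hypothetical stand-in for the PR's PipeBindingFactory; names and quota choices are assumptions.
public static class PipeBindingSketch
{
    // 256 MB, the cap quoted in the commit message.
    public const int MaxMessageBytes = 256 * 1024 * 1024;

    // Intended to be called by both the worker host and the client proxy,
    // so the two ends of the pipe cannot drift apart.
    public static NetNamedPipeBinding Create()
    {
        return new NetNamedPipeBinding
        {
            MaxReceivedMessageSize = MaxMessageBytes,
            // In buffered transfer mode this must equal MaxReceivedMessageSize.
            MaxBufferSize = MaxMessageBytes,
            // The reader quotas have small defaults of their own, so a large
            // string or array payload needs them raised as well.
            ReaderQuotas = new XmlDictionaryReaderQuotas
            {
                MaxStringContentLength = MaxMessageBytes,
                MaxArrayLength = MaxMessageBytes
            }
        };
    }
}
```

Under these assumptions, a host would pass `PipeBindingSketch.Create()` to `ServiceHost.AddServiceEndpoint` and a client would pass the same call to its `ChannelFactory<T>` constructor.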
Fixes surfaced by build.ps1 (restore is FieldWorks.sln-based, compile is FieldWorks.proj traversal-based):
- Register HCWorker + HCWorkerTests in FieldWorks.sln so restore generates their assets (else NETSDK1004 at build).
- HCWorker.csproj: link ..\..\CommonAssemblyInfo.cs at the correct depth, and exclude the nested HCWorkerTests/** from the SDK glob (mirrors ParserCore excluding ParserCoreTests).
- HCWorkerService.cs: add using SIL.Machine.Morphology.HermitCrab.MorphologicalRules.
- HCWorkerTests.csproj: add the SIL.LCModel.Core.Tests / SIL.LCModel.Utils.Tests / FwUtilsTests / SIL.TestUtilities refs the injected AssemblyInfoForUiIndependentTests requires.
- HCWorkerServiceTests.cs: the multi-morph test builds a rule against the released SIL.Machine 3.8.2 package, whose rule patterns are Pattern<Word, ShapeNode> (not the branch's Pattern<Word, int>).

HCWorker + HCWorkerTests build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
14435fc to 936a289
What this is
Offloads HermitCrab morphological parsing to a separate Server-GC process so bulk "Parse All Words" scales (~10–12×) without changing FieldWorks.exe's own Workstation GC (a process-wide setting fixed at startup, needed for UI responsiveness). Rebased on latest main; 18 files (+~1954/−30), all worker + parser integration.

Review by change group
- Src/LexText/HCWorker/ (HCWorker.csproj, Program.cs, HCWorkerService.cs, App.config): HCWorker.exe hosts one SIL.Machine HermitCrab Morpher behind a WCF net.pipe service (App.config: gcServer). Parses whole batches with its own internal parallelism and projects each analysis to a flat MorphDto[] (the id-collection half of HCParser.GetMorphs, run where the Word/Allomorph/Morpheme graph lives). Consumes SIL.Machine.Morphology.HermitCrab as the CPM-pinned NuGet package; references ParserCore (mirrors the sibling HC exe GenerateHCConfig).
- ParserCore/IHCWorkerService.cs, HCWorkerClient.cs, HCWorkerProcessManager.cs, PipeBindingFactory.cs: IHCWorkerService contract + DTOs used by both the worker and the client (no hand-synced duplicate). HCWorkerClient is a WCF proxy with respawn+replay+retry-once; HCWorkerProcessManager does lazy spawn + watchdog (modeled on FLExBridgeHelper). PipeBindingFactory is the one binding definition (256 MB cap; real grammars are several MB).
- ParserCore/HCParser.cs: GetMorphs overload runs the LCM-resolution half over MorphDto[]; FormID/FormID2 made public so the worker keys the same Properties bag. Try-a-Word tracing stays in-process (its trace manager touches LCM inline).
- ParserCore/ParserWorker.cs, ParserScheduler.cs, ParseFiler.cs: ParserWorker routes a whole batch through HCParser.ParseWordsBatch (one WCF call) with a fallback to the per-wordform path; scheduler/filer adjustments to support it.
- HCWorker/HCWorkerTests/*, ParserCore/ParserCoreTests/ParseWorkerTests.cs, DisambiguateInFLExDBTests/ParserConcurrencyBenchmark.cs
- FieldWorks.sln: registers HCWorker + HCWorkerTests (restore is .sln-based; the traversal build picks them up from Src/**).

Status
./build.ps1 passes with the worker. The HCWorker + HCWorkerTests projects build cleanly.

Relationship to the SIL.Machine RUSTIFY work
Independent of it. The worker consumes the released SIL.Machine.Morphology.HermitCrab package (SilMachineVersion = 3.8.2), so it works today. When FieldWorks later bumps to a RUSTIFY release, the worker gets the allocation gains for free (it uses only the public Morpher/ParseWord API). Note: that future bump will require a mechanical Pattern<Word, ShapeNode> → Pattern<Word, int> migration in HCLoader.cs (it constructs HC rules); see the SIL.Machine PR's API note. The worker itself needs no such change.

🤖 Generated with Claude Code
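One way to confirm the GC split described under "What this is" is to have each process report its own GC mode. The following is a minimal sketch, assuming a hypothetical helper named GcModeReport that is not part of this PR; GCSettings.IsServerGC and GCSettings.LatencyMode are standard .NET APIs. On .NET Framework, server GC is switched on per executable through the gcServer element (enabled="true") under runtime in that executable's App.config, which is why it can be on for HCWorker.exe while FieldWorks.exe stays on Workstation GC.

```csharp
using System.Runtime;

// Hypothetical diagnostic helper; not part of the PR.
public static class GcModeReport
{
    // Returns a one-line description of the current process's GC configuration.
    // The GC flavor is fixed at process start, so the result is constant for
    // the lifetime of the process that calls it.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";
        return string.Format("{0} GC, latency mode {1}", mode, GCSettings.LatencyMode);
    }
}
```

Logging `GcModeReport.Describe()` once at startup in both executables would show whether the worker's App.config setting actually took effect.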
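The respawn+replay+retry-once behavior attributed to HCWorkerClient can be sketched generically. This is an illustration, not the PR's code: the RetryOnceSketch class, its delegate parameters, and the reading of "replay" as re-sending previously loaded state (such as the grammar) to a freshly spawned worker are all assumptions.

```csharp
using System;

// Hypothetical illustration of a retry-once wrapper; not the PR's HCWorkerClient.
public static class RetryOnceSketch
{
    // call:    the remote operation (for example, one batch parse request).
    // respawn: starts a fresh worker process and reconnects to it.
    // replay:  re-sends whatever state the old worker held (assumed: the loaded grammar).
    public static T Invoke<T>(Func<T> call, Action respawn, Action replay)
    {
        try
        {
            return call();
        }
        catch (Exception)
        {
            // A real WCF client would more likely catch CommunicationException
            // and TimeoutException only; Exception keeps this sketch self-contained.
            respawn();
            replay();
            // Exactly one retry: a second failure propagates to the caller,
            // which can then fall back to another path.
            return call();
        }
    }
}
```

Limiting the wrapper to a single retry keeps a persistently failing worker from looping; the caller sees the second exception and can take the fallback to the per-wordform path that the description mentions.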