Fix RegFree manifest-file race under parallel MSBuild #988

Open: johnml1135 wants to merge 1 commit into main from fix-regfree-manifest-race

johnml1135 (Contributor) commented Jul 2, 2026

Summary

  • Six EXE projects (FieldWorks, LCMBrowser, UnicodeCharEditor, GenerateHCConfig, ComManifestTestHost, NativeBuild) each import Build/RegFree.targets and generate manifests for the same shared managed assemblies (FwUtils.dll, SimpleRootSite.dll, ManagedVwWindow.dll) into the same $(OutDir). Under a parallel MSBuild build, their CreateComponentManifests targets can run in different MSBuild worker processes at the same time and race to read/write the exact same manifest file, throwing an IOException that fails the whole build.
  • This was hit intermittently in CI on phase1-base (PR #964, "Phase-1 base: Avalonia migration spine (UIMode defaults Legacy)"). The reported error was: RegFree.targets(91,3): error : IOException: The process cannot access the file 'FwUtils.manifest' because it is being used by another process. A retry of the same CI job passed cleanly, which points to a timing-dependent race rather than a code regression.
  • Root cause confirmed by inspecting Build/Src/FwBuildTasks/RegFree.cs: the crash is at XmlWriter.Create(manifestFile, settings) (line 349 in the observed stack trace), which is exactly where multiple processes writing the same output path collide.
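
The collision described above can be reproduced in miniature within a single process. This is a minimal sketch, not code from RegFree.cs: the class and method names are hypothetical, and it assumes Windows file-sharing semantics, where a second writer cannot open a file that another writer still holds open. On other platforms the second open may succeed, in which case the method returns false.

```csharp
using System;
using System.IO;
using System.Xml;

static class ManifestCollisionSketch
{
    // Hypothetical demo: returns true if a second XmlWriter on the same path
    // fails with an IOException while the first writer is still open.
    public static bool SecondWriterCollides(string path)
    {
        var settings = new XmlWriterSettings { Indent = true };
        bool collided = false;

        // First writer keeps the file open, like a task that is mid-write.
        using (XmlWriter first = XmlWriter.Create(path, settings))
        {
            first.WriteStartElement("assembly");
            try
            {
                // Second writer targets the same path before the first closes it.
                using (XmlWriter second = XmlWriter.Create(path, settings))
                {
                    second.WriteStartElement("assembly");
                    second.WriteEndElement();
                }
            }
            catch (IOException)
            {
                // Sharing violation: the same class of failure seen in the CI log.
                collided = true;
            }
            first.WriteEndElement();
        }

        File.Delete(path);
        return collided;
    }
}
```

In the real build the two writers live in separate MSBuild worker processes, so the overlap only happens when their timing lines up, which fits the intermittent failures.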

Fix

RegFree.Execute()'s full read-modify-write of the manifest file (load existing manifest → process assemblies/DLLs/fragments → write result) is now wrapped in a cross-process named Mutex keyed by the resolved output path (ManifestLockName). Invocations targeting the same manifest file now serialize. Invocations for different manifest files are unaffected and still run fully in parallel, so build parallelism is reduced only where projects collide on a shared file, not in the common case.
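
The locking pattern can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: WithManifestLock and its parameters are hypothetical names, and the actual change may structure the wait, timeout, and error handling differently.

```csharp
using System;
using System.Threading;

static class ManifestLockSketch
{
    // Hypothetical helper: runs the whole read-modify-write while holding a
    // named mutex, so callers that pass the same name run one at a time.
    public static void WithManifestLock(string mutexName, Action readModifyWrite)
    {
        using (var mutex = new Mutex(false, mutexName))
        {
            bool held = false;
            try
            {
                try
                {
                    held = mutex.WaitOne();
                }
                catch (AbandonedMutexException)
                {
                    // A previous owner exited without releasing the mutex;
                    // ownership has still been transferred to this thread.
                    held = true;
                }
                readModifyWrite();
            }
            finally
            {
                // Must run on the same thread that acquired the mutex.
                if (held) mutex.ReleaseMutex();
            }
        }
    }
}
```

Because a Mutex is owned by a thread, the delegate should not hop threads (for example via await) between acquire and release. Callers that pass different names never block each other, which is why unrelated manifests still build in parallel.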

string.GetHashCode() is deliberately not used for the mutex name. .NET does not guarantee that string hash codes are stable: on .NET Core and later they are randomized per process, and on .NET Framework they can differ across bitness, runtime versions, and application domains. Two MSBuild worker processes could therefore compute different hash codes for the identical path, defeating cross-process synchronization entirely. MD5 (used here only as a stable fingerprint, not for anything security-sensitive) is deterministic across processes, machines, and .NET versions.
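
One way to derive such a name is sketched below. The PR does not show how ManifestLockName is computed, so everything here is an assumption for illustration: the class name, the "RegFreeManifest_" label, the Global\ prefix, and the upper-casing step (which presumes a case-insensitive file system such as NTFS).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class ManifestLockNameSketch
{
    // Hypothetical: maps a manifest path to a mutex name that is identical in
    // every process that resolves the same file.
    public static string ForPath(string manifestPath)
    {
        // Normalize so that relative and differently cased spellings of the
        // same path agree (assumes a case-insensitive file system).
        string normalized = Path.GetFullPath(manifestPath).ToUpperInvariant();

        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(normalized));

            // Hex-encode the digest: the result has a fixed length and
            // contains no path separators, which a mutex name must avoid.
            string hex = BitConverter.ToString(digest).Replace("-", "");
            return @"Global\RegFreeManifest_" + hex;
        }
    }
}
```

Hashing also keeps the name short regardless of how deep the output directory is, since the name is always the prefix plus 32 hex characters.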

Test plan

  • Added RegFreeConcurrencyTests.Execute_ConcurrentInvocationsTargetingSameManifest_AllSucceedAndProduceValidXml: runs 12 concurrent RegFree.Execute() calls against the same manifest path (simulating what the 6 EXE projects' parallel MSBuild nodes do) and asserts they all succeed and produce valid, uncorrupted XML.
  • Verified the test actually catches the regression: temporarily reverted the mutex fix, confirmed the test fails reliably (3/3 runs), then restored the fix and confirmed it passes reliably (5/5 runs).
  • Full .\build.ps1 -BuildTests succeeds.
  • Full FwBuildTasksTests suite: 146 passed, 3 pre-existing skips, 0 failures.
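
The shape of the concurrency test can be sketched without the FieldWorks build tasks. This is not the RegFreeConcurrencyTests code: AppendEntry is a hypothetical stand-in for RegFree.Execute(), the element names are invented, and threads within one process stand in for separate MSBuild worker processes.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Xml.Linq;

static class ConcurrentManifestSketch
{
    // Stand-in for RegFree.Execute(): load the manifest, add one entry, save.
    static void AppendEntry(string path, string mutexName, int id)
    {
        using (var mutex = new Mutex(false, mutexName))
        {
            try { mutex.WaitOne(); }
            catch (AbandonedMutexException) { /* ownership is still acquired */ }
            try
            {
                XDocument doc = File.Exists(path)
                    ? XDocument.Load(path)
                    : new XDocument(new XElement("assembly"));
                doc.Root.Add(new XElement("file", new XAttribute("name", "Lib" + id + ".dll")));
                doc.Save(path);
            }
            finally { mutex.ReleaseMutex(); }
        }
    }

    // Runs `writers` concurrent updates and returns how many entries survived.
    public static int RunWriters(string path, int writers)
    {
        File.Delete(path);
        string mutexName = "ConcurrentManifestSketch_" + Guid.NewGuid().ToString("N");
        var errors = new ConcurrentQueue<Exception>();

        var threads = Enumerable.Range(0, writers).Select(i => new Thread(() =>
        {
            try { AppendEntry(path, mutexName, i); }
            catch (Exception ex) { errors.Enqueue(ex); }
        })).ToList();

        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
        if (!errors.IsEmpty) throw new AggregateException(errors);

        // Loading throws if the file is truncated or interleaved; a lost
        // update would show up as a count lower than `writers`.
        return XDocument.Load(path).Root.Elements("file").Count();
    }
}
```

Calling RunWriters with 12 writers mirrors the count used in the PR's test. Removing the mutex from AppendEntry is the analogue of the revert check described above, although how reliably a sketch like this fails without the lock depends on timing.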

🤖 Generated with Claude Code


Six EXE projects (FieldWorks, LCMBrowser, UnicodeCharEditor,
GenerateHCConfig, ComManifestTestHost, NativeBuild) each import
RegFree.targets and generate manifests for the same shared managed
assemblies (FwUtils.dll, SimpleRootSite.dll, ManagedVwWindow.dll) into
the same $(OutDir). Under a parallel MSBuild build, their
CreateComponentManifests targets can run in different MSBuild worker
processes at the same time and race to read/write the exact same
manifest file, throwing an IOException that fails the whole build
(observed intermittently in CI, e.g. FieldWorks PR #964).

Wrap RegFree.Execute()'s read-modify-write of the manifest file in a
cross-process named Mutex keyed by the resolved output path, so
concurrent invocations targeting the same file serialize instead of
racing; invocations for different manifest files are unaffected and
still run fully in parallel. string.GetHashCode() is deliberately not
used for the mutex name since .NET randomizes it per process, which
would defeat cross-process synchronization - MD5 is used instead as a
deterministic fingerprint.

Added a regression test that runs 12 concurrent RegFree.Execute() calls
against the same manifest path and asserts they all succeed and produce
valid, uncorrupted XML. Verified it actually catches the regression:
temporarily reverted the mutex fix, confirmed the test fails reliably
(3/3 runs), then restored the fix and confirmed it passes reliably
(5/5 runs).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

github-actions bot commented Jul 2, 2026

NUnit Tests

1 files ±0, 1 suites ±0, 10m 30s ⏱️ -9s
4 299 tests ±0: 4 226 ✅ ±0, 73 💤 ±0, 0 ❌ ±0
4 308 runs ±0: 4 235 ✅ ±0, 73 💤 ±0, 0 ❌ ±0

Results for commit 33cdfd2. ± Comparison against base commit 323a022.
