Fix watcher degradation on watch exhaustion and prolonged lock contention by thismilktea · Pull Request #877 · colbymchenry/codegraph

thismilktea · 2026-06-14T08:19:14Z

Closes #876

Summary

This PR hardens the live file watcher around two reliability failure modes:

Watch-resource exhaustion (EMFILE / ENFILE) now disables live watching cleanly instead of leaving the watcher half-broken.
Prolonged sync lock contention (LockUnavailableError) no longer retries forever at the normal debounce cadence; it now uses bounded backoff and eventually degrades auto-sync explicitly.

The goal is to fail closed and clearly once live watching is no longer trustworthy, while preserving the current behavior for normal edits and short-lived contention.

What changed

Watch exhaustion

Detects watch-resource exhaustion more explicitly
Degrades/stops the watcher instead of just logging forever
Emits a single actionable warning
Adds an onDegraded callback so callers can observe permanent watcher degradation

Lock contention

Keeps the current quiet behavior for brief lock contention
Adds bounded retry backoff for repeated LockUnavailableError
Stops infinite normal-debounce retries under long-lived contention
Degrades auto-sync after the retry threshold is crossed
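
For reviewers, the bounded-backoff behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `RetryPolicy`, `RetryDecision`, `nextRetry`, and the numeric values in the usage note are hypothetical, and the exponential shape is an assumption.

```typescript
// Hypothetical retry policy; the PR does not state the real delays or threshold.
interface RetryPolicy {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single retry delay
  maxAttempts: number; // consecutive lock failures tolerated before degrading
}

type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "degrade" };

// attempt is 1-based: the number of consecutive LockUnavailableError failures so far.
function nextRetry(attempt: number, policy: RetryPolicy): RetryDecision {
  // Past the budget: stop retrying and let the caller degrade auto-sync.
  if (attempt > policy.maxAttempts) {
    return { kind: "degrade" };
  }
  // Double the delay on each attempt, capped at maxDelayMs.
  const uncapped = policy.baseDelayMs * Math.pow(2, attempt - 1);
  return { kind: "retry", delayMs: Math.min(uncapped, policy.maxDelayMs) };
}
```

With an illustrative policy of `{ baseDelayMs: 500, maxDelayMs: 4000, maxAttempts: 5 }`, the first failure schedules a retry after 500 ms, later failures are capped at 4000 ms, and the sixth consecutive failure yields `degrade`. A successful sync would presumably reset the attempt counter, which keeps brief contention quiet.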

Internal cleanup

Separates normal debounce scheduling from retry scheduling
Tightens exhaustion detection so message matching is only used as a fallback when no err.code is available
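
The "code first, message only as a fallback" rule can be sketched like this. The helper name `isWatchExhaustion` and the exact message pattern are assumptions for illustration; the actual detection in the branch may differ.

```typescript
// Error codes that indicate watch-resource exhaustion (from the PR description).
const EXHAUSTION_CODES = new Set(["EMFILE", "ENFILE"]);

// Hypothetical helper: decides whether an error thrown by the watcher
// should be treated as watch-resource exhaustion.
function isWatchExhaustion(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };

  // A structured code, when present, is authoritative: the message is ignored.
  if (typeof code === "string" && code.length > 0) {
    return EXHAUSTION_CODES.has(code);
  }

  // Fallback only when no code is available: match the message text.
  return typeof message === "string" && /\b(EMFILE|ENFILE)\b/.test(message);
}
```

The point of the ordering is that an unrelated error whose message happens to mention EMFILE (for example a wrapped ENOENT) is not misclassified as exhaustion.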

Why

Before this change, the watcher could remain "alive" after it had effectively stopped being trustworthy:

EMFILE / ENFILE could leave live watching unusable without a clean degraded/off transition
Prolonged lock contention could keep the watcher retrying forever with no terminal state
Callers could continue assuming auto-sync was still working even while the index drifted stale

This is especially problematic for long-running MCP/daemon sessions.

Tests

Added / extended watcher tests for:

Startup watch exhaustion
Runtime recursive watcher exhaustion
Prolonged LockUnavailableError degradation
Degraded-state callback notification

Verified with:

npx vitest run __tests__/watcher.test.ts __tests__/watch-policy.test.ts


…contention (#891)

The live file watcher could stay "alive" after it had stopped being trustworthy. EMFILE/ENFILE watch-resource exhaustion only logged (and was silently tolerated on the Linux per-directory path), and prolonged LockUnavailableError retried forever at the normal debounce cadence. Both left auto-sync dead while the index silently drifted stale. Especially bad for long-running MCP/daemon sessions.

Add a one-way degrade(): on watch-resource exhaustion (any watch strategy) or on lock contention past a bounded exponential-backoff budget, log once, fire a new onDegraded callback, and stop. start() now returns false consistently when the per-directory path degrades at startup. It previously returned true on Linux, so the MCP server reported the watcher "active" when it had degraded. Wire onDegraded into the MCP server so callers are actually told, and expose isDegraded()/getDegradedReason().

Builds on the approach in #877 by @thismilktea. Validated on macOS (recursive), Linux (per-directory, Docker) and Windows (recursive): 30/30 watcher + watch-policy tests on each.

Closes #876

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
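
The one-way degrade described in that commit message could look roughly like the sketch below. Only the names degrade, onDegraded, isDegraded and getDegradedReason come from the message; the `WatcherState` class, the string reason type, the constructor shape, and the log wording are assumptions made for illustration.

```typescript
type DegradedCallback = (reason: string) => void;

// Hypothetical state holder showing a one-way "degraded" transition.
class WatcherState {
  private degradedReason: string | null = null;
  private readonly onDegraded?: DegradedCallback;
  private readonly log: (msg: string) => void;

  constructor(
    onDegraded?: DegradedCallback,
    log: (msg: string) => void = (msg) => console.warn(msg),
  ) {
    this.onDegraded = onDegraded;
    this.log = log;
  }

  // First call wins; later calls are no-ops, so the warning and the
  // callback each fire at most once.
  degrade(reason: string): void {
    if (this.degradedReason !== null) return;
    this.degradedReason = reason;
    this.log(`File watcher degraded: ${reason}`);
    this.onDegraded?.(reason);
  }

  isDegraded(): boolean {
    return this.degradedReason !== null;
  }

  getDegradedReason(): string | null {
    return this.degradedReason;
  }
}
```

A caller such as a long-running server would pass an onDegraded handler at construction and could also poll isDegraded() before reporting the watcher as active. Stopping the underlying watchers is omitted here.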

colbymchenry · 2026-06-15T04:33:25Z

Thanks for this, @thismilktea — solid diagnosis and the degrade/backoff approach was the right one. I've merged it into main (with a couple of corrections) as #891, and credited you in the changelog.

Two things I adjusted on top of your branch:

Linux start() consistency. On the per-directory watch path (Linux), startup exhaustion degraded the watcher but start() still returned true, so the MCP server would report the watcher as "active" right after it had disabled itself, and the new "should not start when fs.watch setup exhausts" test would fail there. Both watch strategies now return false consistently (verified natively in Docker).
onDegraded wiring. The callback wasn't consumed anywhere, so MCP/daemon callers still weren't told. It's now wired into the MCP server (File watcher degraded — …).

Also validated the recursive path on Windows (Parallels) in addition to macOS/Linux. Closing in favor of #891 — thanks again for driving this. 🙏

thismilktea and others added 3 commits June 14, 2026 15:44

fix(watcher): degrade on watch exhaustion and prolonged lock contention

1b06627

fix(watcher): degrade cleanly on exhaustion and prolonged lock conten…

07a22c1

…tion

Merge branch 'colbymchenry:main' into fix/watcher-degrade-lock-conten…

7ee92ab

…tion

colbymchenry mentioned this pull request Jun 15, 2026

fix(watcher): degrade cleanly on watch exhaustion and prolonged lock contention #891

Merged

colbymchenry mentioned this pull request Jun 15, 2026

Watcher can stay half-broken after watch exhaustion or prolonged lock contention #876

Closed

colbymchenry closed this Jun 15, 2026

thismilktea wants to merge 3 commits into colbymchenry:main from thismilktea:fix/watcher-degrade-lock-contention
