fix(extraction): don't index linked git worktrees as embedded repos #873
Open
allenwoods wants to merge 1 commit into
Workflows that keep `git worktree add` checkouts inside the repo (commonly in a gitignored `.worktrees/` or `.claude/worktrees/`) hit the embedded-repo discovery from colbymchenry#514: each worktree contains a `.git` entry, so it was treated as an independent nested repo and fully re-indexed. Since a linked worktree is a second checkout of the SAME repo as the main checkout, every symbol got duplicated once per worktree: an N-worktree repo inflated the index ~Nx and made query/callers/impact return N copies of each result.

findNestedGitRepos now skips a `.git` entry that is a linked worktree pointer: a FILE whose `gitdir:` resolves into another repo's `worktrees/<id>` admin dir. A submodule pointer (`gitdir: .../modules/...`) and a genuine embedded repo (`.git` is a directory) are unaffected, so colbymchenry#514 and submodule handling don't regress.

Test: a real `git worktree add` under a gitignored dir is excluded from scanDirectory, while a genuine embedded repo in the same dir is still discovered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Linked git worktrees parked inside a repo (e.g. `.worktrees/`, `.claude/worktrees/`) were being indexed as separate embedded repos, duplicating every symbol once per worktree.

Why
The embedded-repo discovery from #514 walks gitignored directories looking for nested `.git` entries and treats each as an independent repo. A `git worktree add` checkout has a `.git` file (not a directory), so each worktree was picked up and fully re-indexed. Since a linked worktree is a second checkout of the same repo as the main checkout, this duplicated every symbol N times: an N-worktree repo inflated the index ~Nx and made `query`/`callers`/`impact` return N copies of each result.

Real case that surfaced this: an Elixir/Phoenix repo with ~18 worktrees under `.worktrees/` indexed to 678 MB / 16k files with every symbol duplicated 18×; after this fix, 37 MB / 1k files, one result per symbol.

Fix
`findNestedGitRepos` now skips a `.git` entry that is a linked-worktree pointer: a FILE whose `gitdir:` resolves into another repo's `worktrees/<id>` admin dir. The other two kinds of `.git` entry keep their current behavior:
- Submodule pointer (`gitdir: …/modules/…`) → unaffected
- Genuine embedded repo (`.git` is a directory) → unaffected, so #514 ("Multiple sub-Git projects cannot be indexed as a whole.") does not regress

Test
Added a test that creates a real `git worktree add` under a gitignored dir and asserts its files are excluded from `scanDirectory`, while a genuine embedded repo in the same ignored dir is still discovered.

Full suite green (1491 tests, 2 skipped); `tsc --noEmit` clean.

🤖 Generated with Claude Code
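
For reviewers who want the rule from the Fix section in concrete form, here is a minimal sketch of how the three kinds of `.git` entry could be told apart. It is not the code in this PR: `isLinkedWorktree` and the fixture names are hypothetical, and the sketch assumes the usual layout in which a linked worktree's `gitdir:` points at `<repo>/.git/worktrees/<id>` and a submodule's at `<repo>/.git/modules/<name>`. The fixtures are written by hand instead of with a real `git worktree add`, so only the pointer format is exercised.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not the PR's implementation): true when `dir/.git` is a
// FILE whose `gitdir:` target sits directly inside a `worktrees/` admin dir.
function isLinkedWorktree(dir: string): boolean {
  const dotGit = path.join(dir, ".git");
  // A missing entry or a `.git` directory (a real repo) is never a worktree pointer.
  if (!fs.existsSync(dotGit) || !fs.statSync(dotGit).isFile()) return false;

  const match = /^gitdir:\s*(.+)$/m.exec(fs.readFileSync(dotGit, "utf8"));
  if (!match) return false;

  // The pointer may be relative to the directory that contains the `.git` file.
  const gitdir = path.resolve(dir, match[1].trim());

  // Assumed layouts: .../worktrees/<id> for a linked worktree,
  // .../modules/<name> for a submodule.
  return path.basename(path.dirname(gitdir)) === "worktrees";
}

// Hand-built fixtures covering the three cases named in the Fix section.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "worktree-sketch-"));
const makeDir = (name: string): string => {
  const dir = path.join(root, name);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
};

const worktree = makeDir("worktree");
fs.writeFileSync(
  path.join(worktree, ".git"),
  `gitdir: ${path.join(root, "main", ".git", "worktrees", "wt1")}\n`,
);

const submodule = makeDir("submodule");
fs.writeFileSync(path.join(submodule, ".git"), "gitdir: ../main/.git/modules/sub\n");

const embedded = makeDir("embedded");
fs.mkdirSync(path.join(embedded, ".git"));

console.log(
  isLinkedWorktree(worktree),  // linked worktree pointer: skipped by the indexer
  isLinkedWorktree(submodule), // submodule pointer: left alone
  isLinkedWorktree(embedded),  // genuine embedded repo: left alone
);
// prints: true false false
```

Checking only the parent directory name of the resolved `gitdir` keeps the sketch short; a stricter check might also confirm that the admin dir exists and contains a `commondir` file, which git writes for linked worktrees.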