Skip to content

fix(registry): resolve repo casing case-insensitively on sync#604

Open
galuis116 wants to merge 2 commits into
JSONbored:mainfrom
galuis116:fix/registry-sync-case-insensitive-fullname
Open

fix(registry): resolve repo casing case-insensitively on sync#604
galuis116 wants to merge 2 commits into
JSONbored:mainfrom
galuis116:fix/registry-sync-case-insensitive-fullname

Conversation

@galuis116

Copy link
Copy Markdown
Contributor

Closes #603.

persistRegistrySnapshot keyed the upsert conflict target (onConflictDoUpdate({ target: fullName })) and the de-registration (notInArray(fullName, …)) on the case-sensitive repositories.fullName primary key. But repo names arrive from multiple sources (the upstream registry vs GitHub-canonical webhook/API casing) and the rest of the system resolves repos case-insensitively (getRepository has a lower(fullName) fallback). On a casing mismatch the sync inserted a duplicate primary-key row instead of updating the existing one, and the case-sensitive de-registration nulled the registration on the other casing — splitting a repo into two rows so a registered repo could read back as unregistered. Same class as the already-fixed #223/#225 (case-sensitive repoFullName comparison), at an unfixed site.

Change

  • Fetch existing repo names once and resolve each snapshot repo to its existing row by lowercased name, so a casing variant updates that row (no duplicate primary key).
  • De-register by lowercased comparison, so a casing variant of a still-registered repo is never wrongly de-registered.
  • No stored-case change for existing rows; behavior is identical when casing already matches.

Verification

  • Two regression tests (both fail on the old code): a GitHub-canonical row + a differently-cased registry entry yields a single registered row; a casing change between snapshots does not de-register the repo.
  • registry.test.ts 7/7; tsc --noEmit clean; full suite green; sync.ts 98% branch; global branch coverage holds (>= 97% gate).

persistRegistrySnapshot keyed the upsert conflict target and the de-registration on the case-sensitive fullName primary key, while repo names arrive from multiple sources (registry vs GitHub-canonical) and the rest of the system resolves repos case-insensitively. A casing mismatch inserted a duplicate row and de-registered the wrong one (same class as JSONbored#223/JSONbored#225). Resolve each snapshot repo to an existing row by lowercased name before upsert, and de-register by lowercased comparison.
@galuis116 galuis116 requested a review from JSONbored as a code owner June 11, 2026 11:24
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jun 11, 2026
@gittensory

gittensory Bot commented Jun 11, 2026

Copy link
Copy Markdown

Important

Gittensory found maintainer review notes

Scoped related-work signals were found for this PR. They are advisory unless the gate reports a blocker.

Readiness score: 77/100

Signal Result Evidence Action
Linked issue ✅ Linked #603, #223 No action.
Related work ⚠️ 2 scoped overlaps Top overlaps are listed below; lower-confidence bulk is hidden. Review top overlaps.
Review load ❌ 8/20 Readiness component derived from cached public PR metadata and labels; size label size:S. Add scope summary.
Validation evidence ✅ 25/25 PR body includes validation/test evidence. No action.
Open PR queue ⚠️ 5/10 9 open PR(s), 8 likely reviewable, 1 unlinked. Expect slower review.
Contributor context ✅ Confirmed Gittensor contributor galuis116; Gittensor profile; 183 PR(s), 51 issue(s). No action.
Gate result ✅ Passing No configured blocker found. No action.
Signal definitions
  • Related work = same linked issue, overlapping active PRs, or title/path similarity.
  • Review load = cached public PR metadata such as size labels, changed paths, and preflight status.
  • Open PR queue = repo-wide review pressure; it is not a PR quality failure.
  • Contributor context = public GitHub/Gittensor identity context; non-Gittensor status is not a blocker.
Review context
Maintainer notes
  • Possible duplicate or overlapping work: 2 related open work cluster(s) were detected.
Contributor next steps
  • Review top overlaps.
  • Add scope summary.
  • Expect slower review.
  • Check active issues and PRs before submitting.
  • Re-run Gittensory review

Checked by Gittensory, a quiet PR intelligence layer for OSS maintainers. Learn more about Gittensor contribution workflows.

@gittensory gittensory Bot added the gittensory:reviewed Gittensor contributor context label Jun 11, 2026
@reviewwed

reviewwed Bot commented Jun 11, 2026

Copy link
Copy Markdown

reviewbot · advisory review

Reviewed 2 changed file(s) — two independent AI reviewers.

Suggested action: 🛠️ Request changes. Address the suggestions below before merging.

Reviewer A · gpt-oss-120b — recommends 🛠️ request changes
The PR adds case‑insensitive handling for repository names during registry sync and adds tests for the new behavior. The intent is clear, but the implementation inserts rows that may already exist, which can violate the primary‑key constraint on repositories.fullName.

Suggestions

  • Change the repository insert to an upsert (ON CONFLICT DO UPDATE) so that a case‑variant row updates the existing record instead of causing a duplicate‑key error.
  • Import and use the appropriate conflict‑resolution helper from drizzle‑orm (e.g., .onConflictDoUpdate) and ensure all relevant columns are updated.
  • Consider refreshing the list of existing full names after the insert/upsert if further logic depends on the latest state.

Worth double-checking

  • Primary‑key violation when a snapshot contains a repo with different casing that already exists.
  • Potential stale‑repo de‑registration if the list of existing names isn’t refreshed after inserts.

Reviewer B · nemotron-3-120b-a12breview unavailable

@superagent-security

Copy link
Copy Markdown

Superagent didn't find any vulnerabilities or security issues in this PR.

@reviewwed

reviewwed Bot commented Jun 12, 2026

Copy link
Copy Markdown

gittensory · advisory review

Reviewed 2 changed file(s) — two independent AI reviewers.

Suggested action: 🛠️ Request changes. (reviewers split: request changes / merge) Address the suggestions below before merging.

Reviewer A · gpt-oss-120b — recommends 🛠️ request changes
The PR adds case‑insensitive handling for repository names during registry sync and adds tests for the new behavior. The implementation correctly resolves canonical names for inserts and de‑registration, but it still uses a plain insert which will conflict with existing rows, likely causing errors and failing the new tests.

Suggestions

  • Replace the plain insert with an upsert (e.g., insert(...).onConflictDoUpdate({ target: repositories.fullName, set: { ... } })) so that a case‑variant repo updates the existing row instead of causing a primary‑key violation.
  • Consider consolidating the fetch of existing repository names and the de‑registration logic after all inserts/updates have been applied, to avoid race conditions if the DB is large.
  • Add a test case verifying that the isRegistered flag is correctly set when an existing row is updated via the upsert path.

Worth double-checking

  • Potential duplicate‑key errors during sync if the insert is not changed to an upsert.
  • If the upsert does not set all necessary columns (e.g., owner, name), the row may end up with stale data.

Reviewer B · nemotron-3-120b-a12b — recommends ✅ merge
The PR makes the registry sync case‑insensitive when matching repository names, preventing duplicate primary‑key inserts when the same repo arrives with different casing from various sources. It updates the insert and deregister logic accordingly and adds two focused unit tests that verify the behavior. The change is clean, well‑tested, and addresses a real issue.

Suggestions

  • Consider adding a guard (or log/warn) if the existing repository table contains multiple rows that differ only in casing, as the current Map would silently pick one arbitrarily.
  • If the repositories table is expected to grow large, evaluate whether fetching all fullName rows on every snapshot could become a performance bottleneck and whether a more targeted lookup (e.g., via a lowercase index) would be preferable.
  • The import change swaps notInArray for inArray; ensure no other callers of notInArray are affected (none appear in the diff).

Worth double-checking

  • Querying all repository rows on each sync could be inefficient for large datasets.
  • If duplicate‑casing rows already exist in the DB, the sync may arbitrarily choose one as the canonical representative, potentially masking data‑integrity issues.
  • The deregistration logic now relies on staleFullNames; ensure that a repo whose casing changes but remains registered is never incorrectly marked as stale (the added test covers this).

@JSONbored JSONbored left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see latest review for requested changes and ensure CI is green when done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gittensory:reviewed Gittensor contributor context gittensory-review size:S This PR changes 10-29 lines, ignoring generated files.

Projects

Status: Todo

2 participants