Skip to content

fix(node): gate the ref-updates feeds on read visibility (#112, #114)#143

Open
beardthelion wants to merge 12 commits into
mainfrom
fix/gate-ref-updates-feeds
Open

fix(node): gate the ref-updates feeds on read visibility (#112, #114)#143
beardthelion wants to merge 12 commits into
mainfrom
fix/gate-ref-updates-feeds

Conversation

@beardthelion

@beardthelion beardthelion commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Three surfaces served received_ref_updates rows with no visibility check, so an anonymous caller who could guess or enumerate a private repo's name could read its ref metadata: refName, oldSha, newSha, pusherDid, nodeDid, timestamp, and the repo slug. Same withholding-bypass class as #97 (repo listing) and #110 (blob content), on the ref-history surface.

Approach

One canonical, fail-closed decision reused everywhere, rather than a bespoke check per surface.

ref_update_row_visible (pure, in visibility.rs) takes the deduped local repo set plus their visibility rules and decides each row against the same listable_at_root gate the repos resolver uses. A row is dropped when its slug matches a local repo the caller cannot read; rows that match no local repo (remote, gossip-only) pass through under the existing origin announce-gate assumption. The classify has no default-keep: the only keep paths are all-matches-readable and no-local-match, so an unreadable match always drops.

Two details that matter for correctness:

  • The stored slug is peer-supplied and written verbatim from the wire, so it is treated as untrusted input, not a lookup key. Matching is by repo name plus a prefix-tolerant owner-key comparison normalized the same way on both sides, which fails closed for any local repo under aliased slug forms (full DID, short key, URL-truncated, or a non-did:key method) and can only ever over-drop a genuinely remote row, never over-serve a private local one.
  • Resolution is against list_all_repos_deduped, whose CTE picks the canonical row over any mirror row, so a private canonical repo is never masked by a public mirror row. A DB error during the gate load propagates (500) rather than serving unfiltered.

The two query feeds filter to an empty result on deny (a non-existent repo returns empty too, so it discloses nothing); the live subscription is gated write-side, since /graphql/ws carries no caller identity to gate against.

Tests

Each gate has load-bearing regression coverage, confirmed failing before the fix and passing after: anonymous leak reproduced then closed, owner still sees their own private rows, and the alias / truncated-key / mirror-coexistence / multi-segment-DID cases all fail closed. A DB-layer test pins that a private canonical row beats a public mirror in dedup, and a replication_withheld_set test pins the announce boolean the subscription broadcast now gates on. Full workspace suite, clippy -D warnings, and fmt all clean.

Closes #112
Closes #114

Summary by CodeRabbit

  • Bug Fixes
    • Ref-update feeds/listings are now fail-closed for anonymous viewers, hiding private local-repository activity while still showing remote/gossip-only updates.
    • REST and GraphQL ref-update results apply per-repository visibility rules (including DID/slug aliasing) before limiting/paging.
    • Ref-update pagination is now stable, offset-based, and respects small limits without skipping visible items; request limits are internally clamped.
    • Live ref-update broadcasts are only sent when pushes are publicly announcable; quarantined mirrors stay withheld.
  • Tests
    • Expanded end-to-end coverage for anonymous vs owner access, aliasing, quarantined mirrors, paging correctness, and announce behavior.

,#114)

Add ref_update_row_visible: given the deduped local repo set + their
visibility rules, decide whether a received_ref_updates row (keyed by its
peer-supplied slug) is visible to the caller. Matches slug->repo by name
plus prefix-tolerant owner key, and drops any row matching a local repo
the caller cannot read at root; rows matching no local repo (remote/
gossip-only) and malformed slugs pass through. Fail-closed by
construction (no default-keep). Pure/I-O-free so visibility.rs stays
exhaustively unit-tested.

Shared gate for the GraphQL (#112) and REST (#114) ref-updates feeds;
call sites land in the following commits. Includes a DB-layer test
pinning that a private canonical row beats a public mirror row in
list_all_repos_deduped (else the feed gate would over-serve).
The GraphQL ref_updates resolver mapped every received_ref_updates row
straight out with no visibility gate, so an anonymous caller could read
a private repo's ref metadata (refName, SHAs, pusherDid, nodeDid) by
name. Thread the caller DID and retain rows through the shared
ref_update_row_visible filter (deduped local set + rules, same '/'
decision the repos resolver uses); applies to both the repo:Some and
repo:None branches. Remote/gossip-only rows still pass.

Removes the now-live filter's dead_code allow. Tested via the GraphQL
schema: anon leak reproduced before the gate (RED) and closed after
(GREEN); owner still sees their private row; full-DID and truncated-key
slug aliases fail closed; remote rows kept.
GET /api/v1/events/ref-updates returned every received_ref_updates row
across all repos, including private ones, to any anonymous caller (the
REST analog of #112). Thread the caller DID via optional_signature and
retain rows through the shared ref_update_row_visible filter, deriving
count from the filtered set. Keeps the crate::error::Result return type;
the deduped/rules loads fail closed (500) on a DB error rather than
serving unfiltered. Remote/gossip-only rows still pass.

Tested via the crate's HTTP-API harness: anon leak reproduced before the
gate (RED) and closed after (GREEN); owner still sees their private row;
mixed feed yields only public rows with a filtered count; full-DID and
truncated-key slug aliases fail closed; remote rows kept.
…unce (#112,#114)

Code review surfaced a third reader of the same ref metadata the query
and REST feeds gate: the GraphQL subscription. /graphql/ws is mounted
after the optional_signature layer (no caller to gate against), and the
push handler broadcast every ref update to it unconditionally — outside
the if-announce block that already gates the sibling gossip and Arweave
sends. So an anonymous websocket subscriber received live ref metadata
(repo, refs, SHAs, pusher/node DID) for private repos as they were
pushed — the subscription analog of the #112/#114 leak.

Move the ref_update_tx broadcast inside the existing if-announce guard so
only publicly-readable pushes reach the unauthenticated subscription,
matching gossip and Arweave. Pin the gate decision with a
replication_withheld_set test (announce false for private, true for
public).

Also harden the feed filter's owner-key normalization: normalize the
slug owner to the last ':'-segment (same rule as record_key) instead of
stripping only did:key:, so a multi-segment DID (did:web) and its
full-DID slug match and fail closed rather than over-serving. Regression
test added (RED under the old normalization).
…dd did:web keep test

Code review flagged that the unauthenticated /graphql/ws ref_updates
subscription now depends on a single-point invariant: safety rests
entirely on every sender to ref_update_tx being announce-gated, since the
resolver has no caller to gate against. Document that on the resolver so
a future sender can't silently reopen the leak.

Add the symmetric keep-side test for the did:web owner-key normalization
(public multi-segment-DID repo row is kept for anon), so a regression
that over-drops legitimate did:web rows is caught alongside the existing
drop-side test.
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds fail-closed visibility gating for ref-update feeds in REST and GraphQL, pages feed reads through new DB helpers, and gates ref-update subscription broadcasts on announce.

Changes

Ref-update visibility and announce gating

Layer / File(s) Summary
Visibility gate implementation
crates/gitlawb-node/src/visibility.rs
Adds ref_update_row_visible(deduped, rules_by_repo, caller, row_repo) with fail-closed matching for local repo slugs and direct keep behavior for remote or unmatched rows.
DB paging and quarantine helpers
crates/gitlawb-node/src/db/mod.rs
Adds quarantined-repo listing, replaces ref-update list methods with offset-paged queries, and adds a deduplication test that preserves canonical privacy over a public mirror.
Feed collection and repo gates
crates/gitlawb-node/src/api/events.rs, crates/gitlawb-node/src/graphql/query.rs, crates/gitlawb-node/src/api/mod.rs
Routes global and repo-scoped ref-update reads through the shared collector, adds auth and repo-root gating, clamps limits, and extends GraphQL and REST coverage for anonymous, owner, alias, remote, paging, and quarantined cases.
Announce-gated broadcast path
crates/gitlawb-node/src/api/repos.rs, crates/gitlawb-node/src/graphql/subscription.rs
Conditions GraphQL ref-update broadcast sends on announce, adds a test for private/public announce behavior, and documents the broadcast invariant.

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related issues

Possibly related PRs

  • Gitlawb/node#25: Introduces the visibility-rule machinery that this PR extends for ref-update feed filtering.
  • Gitlawb/node#52: Touches the same repo-read gating pattern used by the repo-scoped ref-update endpoint here.
  • Gitlawb/node#125: Adds the quarantine-related repo plumbing that this PR’s feed collector relies on.

Suggested labels: sev:medium, subsystem:api, kind:security

Suggested reviewers: jatmn, kevincodex1

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main change: gating ref-update feeds on read visibility.
Description check ✅ Passed The description explains the bug, fix, and tests, and includes the required issue closure context; a few template sections are omitted.
Linked Issues check ✅ Passed The PR implements the requested visibility gate for both GraphQL refUpdates and the REST ref-updates feed, matching issues #112 and #114.
Out of Scope Changes check ✅ Passed The extra DB, subscription, and authz-test changes support the same visibility fix rather than introducing unrelated scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/gate-ref-updates-feeds

Comment @coderabbitai help to get the list of available commands.

@beardthelion beardthelion added crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior subsystem:visibility Path-scoped visibility and content withholding labels Jul 1, 2026

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found issues that need to be addressed before this is ready.

Findings

  • [P1] Preserve full non-did:key owners when matching ref-update rows
    crates/gitlawb-node/src/visibility.rs:171
    ref_update_row_visible reduces both the row owner and every repo owner to the last : segment before doing the prefix match. That contradicts the existing owner-key rules in did_matches and DEDUP_CTE, which strip only bare did:key: owners and intentionally keep non-key DID methods whole so did:web:alice/widget and another method/host ending in alice remain different owners. With the current normalization, a private local repo such as did:web:host:alice/widget makes an unrelated remote row like did:gitlawb:alice/widget or did:web:other:alice/widget look like a local match and get dropped for anonymous callers. Please reuse the same did:key-aware owner normalization as the repo dedupe/owner matching code, and add a regression proving same-name non-key DID owners with the same trailing segment do not collide.

  • [P2] Apply the visibility filter before limiting the ref-update feeds
    crates/gitlawb-node/src/api/events.rs:24
    The REST feed loads only limit rows and then drops private rows in memory, and the GraphQL resolver does the same through list_ref_updates_filtered(repo.as_deref(), limit) before retaining visible rows. If the newest rows are private, an anonymous request such as limit=5 can return zero or fewer-than-five events even when older public ref updates exist, so the activity feed no longer returns the latest visible updates. Please either push the visibility predicate before the SQL limit or continue paging/over-fetching until the requested number of visible rows has been collected, and add a small-limit mixed public/private regression for both REST and GraphQL.

#112,#114)

The comment claimed parity with did_matches/DEDUP_CTE, but the last-segment
normalization deliberately diverges: the emitter (publish_ref_update) broadcasts
the short, method-stripped slug, so a repo's own rows arrive keyed by the
trailing segment and the gate must match on it to stay fail-closed. Reusing the
did:key-aware keep-whole rule would fail open on a did:web repo's short slug.

Add feed_private_didweb_short_slug_dropped_for_anon to pin that load-bearing
canonical case; relabel the full-DID test as the non-canonical hand-forged form.
…114)

The REST global feed and the GraphQL ref_updates resolver loaded `limit` rows
and dropped private ones in memory, so an anonymous request could get fewer than
`limit` events (or zero) when the newest rows named private local repos, even
though older public ref updates existed.

Add a shared collect_visible_ref_updates helper that applies the fail-closed
gate before the limit: it over-fetches in bounded pages (128 rows, capped at
max(limit, 2048) scanned) until `limit` visible rows are collected or the table
is exhausted. Both feeds route through it so the one gate cannot drift. Back it
with list_ref_updates_page (stable timestamp,id ordering for offset paging) and
drop the now-unused list_ref_updates / list_ref_updates_filtered. The page size
is a parameter on an inner fn so a regression can exercise multi-page paging.

Regressions: newest-private, older-public, small limit still returns the latest
visible rows on both surfaces (RED before, GREEN after); and a page=2 test pins
offset paging across page boundaries (no skip or duplicate).
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Thanks for the review. Both findings addressed in the two commits above.

P1 (last-segment owner normalization). The over-drop you describe is real, but the suggested remedy would reopen the leak this PR closes, so I kept the normalization and fixed the comment that misdescribed it (that comment is what your finding correctly caught).

The constraint is the emit side. publish_ref_update (api/repos.rs) builds the broadcast slug as {owner_did.split(':').last}/{name} for every owner. So a repo's own rows always arrive keyed by the trailing segment, and two owners sharing that segment (did:web:host:alice and did:gitlawb:alice) broadcast the identical slug alice/widget. The slug is lossy by construction; it doesn't carry the DID method.

Given that, reusing the did:key-aware keep-whole normalization fails open on the canonical case: a private did:web:host:alice/widget repo's own row arrives as alice/widget, and alice doesn't prefix-match the whole did:web:host:alice record, so it's kept and served to anon, which is the exact #112/#114 leak. Both readings of the suggestion leak: keep-whole + prefix-tolerant leaks the did:web short slug; keep-whole + exact equality (literal did_matches) also drops the 8-char truncated tolerance and leaks the truncated did:key case feed_truncated_key_slug_dropped_for_anon pins. I ran the matrix both ways to confirm.

So the divergence from did_matches / DEDUP_CTE is deliberate: those compare trusted canonical owner DIDs where keeping non-key methods whole is correct; this slug is untrusted and already method-stripped, so it needs the stricter fail-closed rule. The cost is a fail-safe over-drop when a remote owner collides on both trailing segment and repo name, which is negligible for full did:key ids and only possible for did:web / truncated handles. It can hide a remote row, never serve a private local one. A real fix would need the wire slug to carry the full owner DID (and mirror-row storage to match), which is a cross-peer format change beyond this PR; happy to file a follow-up if you think the completeness gap warrants it.

Commit corrects the normalization comment (it wrongly claimed DEDUP_CTE parity) and adds feed_private_didweb_short_slug_dropped_for_anon to pin the load-bearing short-slug case; the prior did:web test used the non-canonical full-DID form and is relabeled as such.

P2 (filter before the limit). Fixed. Both feeds now run the gate before the limit through a shared collect_visible_ref_updates helper that over-fetches in bounded pages until limit visible rows are collected (capped at max(limit, 2048) scanned, fail-safe). Routing both the REST feed and the GraphQL resolver through the one helper also keeps them from drifting. Added small-limit mixed public/private regressions for both surfaces (newest-private, older-public: RED before, GREEN after), plus a page-boundary test pinning the offset paging.

@beardthelion beardthelion requested a review from jatmn July 1, 2026 18:42

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/db/mod.rs (1)

2120-2120: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Use the same deterministic tie-break here.

Line 2120 still orders repo-scoped rows only by timestamp, so equal-timestamp rows around LIMIT can be nondeterministic while the new paged helper is stable.

Proposed fix
-             FROM received_ref_updates WHERE repo = $1 ORDER BY timestamp DESC LIMIT $2",
+             FROM received_ref_updates WHERE repo = $1 ORDER BY timestamp DESC, id DESC LIMIT $2",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/db/mod.rs` at line 2120, The repo-scoped query in the
received_ref_updates path still orders only by timestamp, which can make LIMIT
results unstable when timestamps tie. Update the ORDER BY used by the repo
filter query to include the same deterministic tie-break as the new paged
helper, matching the existing stable ordering in the related
received_ref_updates pagination logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/events.rs`:
- Around line 47-60: The visibility check in events handling is using
list_all_repos_deduped(), which excludes quarantined local mirrors and can cause
ref updates from those repos to be treated as remote rows. Update the logic
around the ref-update scan in the events path to use a local-match universe that
includes quarantined rows, or add a DB helper that returns quarantined repos as
unreadable visibility records, and ensure ref_update_row_visible() fails closed
for those slugs instead of allowing anonymous access.
- Around line 40-55: The shared collector in the events API is using the
caller’s raw limit, so GraphQL can bypass the intended bounded behavior. Update
the helper around the visible initialization and max_scan logic to clamp the
effective limit internally to the same upper bound used by REST before
allocating the Vec and calculating the scan cap. Make the fix in the shared
events collector function so both REST and GraphQL paths inherit the bound
automatically.

---

Outside diff comments:
In `@crates/gitlawb-node/src/db/mod.rs`:
- Line 2120: The repo-scoped query in the received_ref_updates path still orders
only by timestamp, which can make LIMIT results unstable when timestamps tie.
Update the ORDER BY used by the repo filter query to include the same
deterministic tie-break as the new paged helper, matching the existing stable
ordering in the related received_ref_updates pagination logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e171b12-cbcc-4b73-96de-0dff323a389e

📥 Commits

Reviewing files that changed from the base of the PR and between 02b4b7e and e3626ec.

📒 Files selected for processing (4)
  • crates/gitlawb-node/src/api/events.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/graphql/query.rs
  • crates/gitlawb-node/src/visibility.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/gitlawb-node/src/graphql/query.rs
  • crates/gitlawb-node/src/visibility.rs

Comment thread crates/gitlawb-node/src/api/events.rs Outdated
Comment thread crates/gitlawb-node/src/api/events.rs
… gate comment

The comment credited publish_ref_update with building the wire slug, but that
p2p method only broadcasts a pre-built RefUpdateEvent; git_receive_pack assembles
the {last-segment}/{name} slug and hands it over. Fix the attribution.
…114)

Resolve two CodeRabbit findings on the ref-update feeds:

- Clamp the effective limit inside the shared collector (max 200). REST already
  capped, but the GraphQL resolver passed its caller-provided limit uncapped,
  letting a large request return unbounded rows and scan unbounded DB rows. Both
  surfaces are now bounded identically.

- Fold quarantined mirrors into the gate's match universe as unreadable.
  list_all_repos_deduped excludes quarantined rows, so a ref-update naming a
  quarantined local mirror matched nothing, was misclassified as remote, and was
  served to anon, bypassing the withheld-from-every-listing-surface contract. Add
  list_quarantined_repos and drop matching rows fail-closed.

Regressions: oversized-limit clamp (201 rows -> 200) and quarantined-mirror
withheld-from-anon, both RED before, GREEN after.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
crates/gitlawb-node/src/db/mod.rs (2)

2133-2136: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Use the same deterministic tie-break as the paged feed.

list_ref_updates_page orders by timestamp DESC, id DESC, but this repo-scoped query limits on timestamp alone. Add id DESC so tied timestamps don’t make the per-repo feed nondeterministic.

Proposed fix
-             FROM received_ref_updates WHERE repo = $1 ORDER BY timestamp DESC LIMIT $2",
+             FROM received_ref_updates WHERE repo = $1 ORDER BY timestamp DESC, id DESC LIMIT $2",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/db/mod.rs` around lines 2133 - 2136, The repo-scoped
ref updates query in list_ref_updates should use the same deterministic ordering
as list_ref_updates_page, since ordering only by timestamp can return tied rows
nondeterministically. Update the SQL in the received_ref_updates query inside
list_ref_updates to order by timestamp DESC, id DESC so rows with equal
timestamps are consistently sorted.

2150-2176: 🩺 Stability & Availability | 🔵 Trivial | 🏗️ Heavy lift

Consider keyset pagination for the live ref-update feed.

The shared collector walks this API with repeated OFFSET queries; inserts between page reads can shift offsets and cause skipped or duplicated rows in one response. A (timestamp, id) keyset cursor or a transaction snapshot would make the page boundary stable under concurrent writes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/db/mod.rs` around lines 2150 - 2176, The pagination
in list_ref_updates_page uses OFFSET-based queries, which can drift under
concurrent inserts and cause skipped or duplicated ref updates. Update the
list_ref_updates_page query path to use a stable keyset cursor based on the
existing timestamp and id ordering, or otherwise ensure the page is read from a
consistent snapshot. Keep the ordering by timestamp DESC, id DESC and thread the
cursor through the ReceivedRefUpdate listing logic so repeated fetches from the
shared collector remain stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 2133-2136: The repo-scoped ref updates query in list_ref_updates
should use the same deterministic ordering as list_ref_updates_page, since
ordering only by timestamp can return tied rows nondeterministically. Update the
SQL in the received_ref_updates query inside list_ref_updates to order by
timestamp DESC, id DESC so rows with equal timestamps are consistently sorted.
- Around line 2150-2176: The pagination in list_ref_updates_page uses
OFFSET-based queries, which can drift under concurrent inserts and cause skipped
or duplicated ref updates. Update the list_ref_updates_page query path to use a
stable keyset cursor based on the existing timestamp and id ordering, or
otherwise ensure the page is read from a consistent snapshot. Keep the ordering
by timestamp DESC, id DESC and thread the cursor through the ReceivedRefUpdate
listing logic so repeated fetches from the shared collector remain stable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5f230eb-5731-41c4-8b1d-208633b29c14

📥 Commits

Reviewing files that changed from the base of the PR and between e3626ec and 2bf20b5.

📒 Files selected for processing (4)
  • crates/gitlawb-node/src/api/events.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/graphql/query.rs
  • crates/gitlawb-node/src/visibility.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/gitlawb-node/src/visibility.rs
  • crates/gitlawb-node/src/api/events.rs
  • crates/gitlawb-node/src/graphql/query.rs

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the ref-update surfaces and found one issue that still needs to be addressed before this is ready.

Findings

  • [P1] Gate the repo events feed on the caller's repo visibility
    crates/gitlawb-node/src/api/events.rs:138
    The new shared visibility collector is used for /api/v1/events/ref-updates and GraphQL refUpdates, but the adjacent repo-scoped REST feed still bypasses it. /api/v1/repos/{owner}/{repo}/events is mounted under optional_signature, yet list_repo_events never reads the caller DID or calls authorize_repo_read; it resolves the repo with get_repo, then returns local cert events and received_ref_updates rows directly through list_ref_certificates / list_repo_ref_updates. An anonymous caller who can guess a private repo's owner/name can still read its ref metadata from this endpoint, which leaves the same ref-history leak class open on a fourth surface. Please gate this handler with the same repo root visibility decision and cover the anonymous-private, owner-private, and quarantined-mirror cases for /api/v1/repos/{owner}/{repo}/events as well.

…#112, #114)

GET /api/v1/repos/{owner}/{repo}/events (list_repo_events) served a repo's ref
certificates and received gossip ref-updates with no caller identity and no
visibility check, so an anonymous caller who guessed a private repo's owner/name
read its ref metadata. It was the last repo-scoped reader in the #112/#114
ref-metadata leak class.

Gate it in two layers, using the caller DID the optional_signature layer already
supplies:
- The cert half is gated by a repo-root authorize_repo_read (keyed by the unique
  repo record id): a denied, quarantined, or not-hosted repo returns an opaque 404
  (RepoNotFound), never a 403.
- The gossip half is filtered per row through the shared collect_visible_ref_updates.
  received_ref_updates rows are keyed by the lossy, non-unique wire slug
  {last-segment}/{name}, so two owners (did:web:a:alice, did:web:b:alice) collide on
  `alice/name` and a plain slug query would serve a colliding private repo's rows to
  anyone allowed to read this one; the per-row gate drops them fail-closed.

Both halves now propagate DB errors as 500 instead of swallowing them into an empty
200. Removes the now-unused list_repo_ref_updates.

Deliberate behavior change: a repo this node does not host returns 404 (it holds no
visibility record for it and fails closed; remote gossip is read via the global
/api/v1/events/ref-updates feed). Drops list_repo_events from the egress-guard
KNOWN_UNGATED allowlist; the source-scrape staleness assert enforces the removal.

Tests cover anon-private, owner-private (both datasets), anon/authed-public,
quarantined, released-from-quarantine, non-owner, not-hosted, did:web (owner and
anon), the slug collision, oversized-limit clamping, and DB-error-fails-closed on
both the gate and the cert fetch.
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Done. list_repo_events now gates on the repo-root read decision via authorize_repo_read, same as the sibling repo-scoped reads: anon/non-owner reads of a private repo, quarantined mirrors, and repos this node doesn't host all return 404 instead of serving ref certs or gossip ref-updates. Dropped it from KNOWN_UNGATED; the egress-guard staleness assert enforces that.

While gating the gossip half I found it also leaked across a slug collision. received_ref_updates rows are keyed by the non-unique {last-segment}/{name} wire slug, so a public alice/widget could surface a private did:web:evil:alice/widget's rows. Routed that half through the shared collect_visible_ref_updates so it's filtered per row like the cross-repo feeds. Tests cover anon/owner/non-owner, quarantined, not-hosted, did:web, the collision, and DB-error-fails-closed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/events.rs`:
- Around line 144-148: The `fetch_events` limit handling only caps the upper
bound, so a negative `limit` can still become a huge `usize` at `truncate(limit
as usize)`. Update the `limit` parsing/clamping in `events.rs` to reject or
clamp negative values before the cast, and keep the existing
`MAX_VISIBLE_REF_UPDATES` cap intact. Add a regression test around
`fetch_events` using `?limit=-1` with at least one local certificate event to
confirm the response is still limited.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a66ef7c8-b57f-4eb5-81bb-dd17cc4acc33

📥 Commits

Reviewing files that changed from the base of the PR and between 2bf20b5 and 52c2b97.

📒 Files selected for processing (3)
  • crates/gitlawb-node/src/api/events.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/db/mod.rs
💤 Files with no reviewable changes (2)
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/db/mod.rs

Comment thread crates/gitlawb-node/src/api/events.rs Outdated

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the ref-update surfaces and found one issue that still needs to be addressed before this is ready.

Findings

  • [P2] Clamp negative event limits before casting
    crates/gitlawb-node/src/api/events.rs:144
    CodeRabbit's current review item is still valid here: list_repo_events only applies .min(MAX_VISIBLE_REF_UPDATES), so ?limit=-1 stays negative until the final all_events.truncate(limit as usize). That cast turns -1 into usize::MAX, which bypasses the intended response cap for local ref-certificate events on /api/v1/repos/{owner}/{repo}/events. Please complete the review request by clamping the lower bound before the cast, and add the negative-limit regression with at least one local certificate event so this cannot come back.

Gravirei added a commit to Gravirei/node that referenced this pull request Jul 2, 2026
- Drop idx_ref_updates_owner from Migration v10; no query reads
  owner_did yet and CREATE INDEX (non-CONCURRENT) inside a transaction
  takes a write-blocking lock on populated nodes for zero benefit.
  Defer the index to Gitlawb#143 alongside the query that uses it.

- Expose owner_did on repo-scoped events feed: add the field to the
  gossip_events json! block in list_repo_events so GET
  /api/v1/repos/{owner}/{repo}/events is consistent with
  GET /api/v1/events/ref-updates.

- Carry owner_did through GraphQL: add owner_did: Option<String> to
  RefUpdateBroadcast (state.rs) and RefUpdateType (types.rs), populate
  it from the DB row in query.rs, from the broadcast in subscription.rs,
  and from the trusted local record.owner_did in the broadcast send in
  repos.rs.

- Add migration_v10_existing_node_upgrade test: seeds a v1..v9 state,
  strips the column and version row, re-runs migrate(), and asserts the
  column and v10 row appear. This is the regression guard: it fails if
  the DDL is accidentally moved back into v1.

Refs Gitlawb#144
A caller-supplied `?limit=-1` cleared the `.min(MAX_VISIBLE_REF_UPDATES)`
upper cap unchanged and reached `all_events.truncate(limit as usize)` in
list_repo_events, where `-1 as usize` is usize::MAX, so the truncate was a
no-op and the handler returned the full unbounded local ref-cert set.

Floor both feed handlers' limit parse with `.clamp(0, MAX_VISIBLE_REF_UPDATES)`.
Only the repo handler's clamp is load-bearing: it guards the local `truncate`,
which does not pass through the shared collector. The global feed's clamp is a
consistency measure (its limit is re-clamped in the collector, whose want==0
short-circuits before any scan), kept in lockstep so the two per-handler caps
the module comment calls out don't drift.

Tests, both load-bearing by execution:
- repo_events_negative_limit_clamped: seeds local certs (the vulnerable half),
  RED (3) before the clamp and green (0) after.
- feed_negative_limit_returns_empty: seeds 5 visible rows; green (0) with the
  collector clamp, 5 without it, proving the collector is the global feed's guard.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/events.rs (1)

63-67: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Make quarantined matches deny before normal visibility.

Setting is_public = false does not make the row unreadable to the owner; lines 407-431 prove private rows are still kept for owners. Track quarantined repo IDs/slugs separately and force a drop before calling the normal listability check, then add a signed-owner regression for a quarantined mirror on the global feed. Otherwise quarantine is not “withheld from every listing surface.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/events.rs` around lines 63 - 67, Quarantined
repositories are only being marked non-public in the events listing flow, but
that still allows owner-visible private rows through the normal listability
path. Update the events aggregation logic in the code that handles listability
and deduping so quarantined repo IDs/slugs are tracked separately and filtered
out before the standard visibility check runs, rather than only setting
is_public to false. Then add a regression test for the global feed using a
signed-owner case to verify a quarantined mirror is fully withheld from every
listing surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/gitlawb-node/src/api/events.rs`:
- Around line 63-67: Quarantined repositories are only being marked non-public
in the events listing flow, but that still allows owner-visible private rows
through the normal listability path. Update the events aggregation logic in the
code that handles listability and deduping so quarantined repo IDs/slugs are
tracked separately and filtered out before the standard visibility check runs,
rather than only setting is_public to false. Then add a regression test for the
global feed using a signed-owner case to verify a quarantined mirror is fully
withheld from every listing surface.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 634704e6-3e8e-4d5c-a85e-251ae076cdbe

📥 Commits

Reviewing files that changed from the base of the PR and between 52c2b97 and 151810a.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/api/events.rs

…ate (#114)

The ref-update feeds folded quarantined mirrors into the deduped repo set with is_public=false and relied on the row visibility gate to withhold them. That held for anonymous callers but not for one matching the mirror's owner_did: visibility_check short-circuits to Allow for the owner before is_public is read, so an owner-matched quarantined row was served on both the REST global feed and the GraphQL ref_updates resolver.

Drop any row that names a quarantined repo in the shared collector, before the visibility gate runs, so the drop bypasses the owner short-circuit. Quarantine is a status decided at admission and checked separately from the mirror's visibility fields, so it must deny every caller including the owner.

Extract the slug match into one ref_update_row_names_repo predicate reused by both the gate and the quarantine drop, so the two cannot disagree about which rows a repo owns.

Tests cover the owner case at the collector, REST handler, and GraphQL resolver in both bare-key and full-did:key owner forms, plus a must-not case proving the drop withholds only quarantined rows while still serving unrelated visible ones.
@beardthelion

Copy link
Copy Markdown
Collaborator Author

Good catch, this was real, confirmed by execution: a quarantined mirror whose owner_did matched the caller was served on both the REST feed and the GraphQL resolver, because visibility_check short-circuits to Allow for the owner before is_public is read, so the is_public=false fold-in never denied the owner.

Fixed in d976a29. The shared collector now drops any row that names a quarantined repo before the visibility gate runs, so the drop bypasses the owner short-circuit entirely; the fold-in and its is_public=false step are gone. Matching goes through a single ref_update_row_names_repo predicate reused by both the gate and the drop, so the two can't disagree about which rows a repo owns.

Regressions cover a quarantined repo withheld from a caller matching its owner_did in both bare-key and full-did:key forms, at the collector, the REST handler, and the GraphQL resolver, plus a must-not case confirming the drop withholds only quarantined rows and still serves unrelated visible ones.

@beardthelion beardthelion requested a review from jatmn July 3, 2026 03:49

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the ref-update surfaces and found a repository-state blocker that needs to be handled before this is ready.

Findings

  • [P1] Rebase on main and resolve the merge conflicts first
    GitHub mergeability
    GitHub currently reports this branch as CONFLICTING with mergeStateStatus: DIRTY, so the PR cannot be merged as-is. Please rebase on the latest main branch and resolve those conflicts before addressing any further review, so the final code and tests are evaluated against the version that will actually merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior subsystem:visibility Path-scoped visibility and content withholding

Projects

None yet

2 participants