Summary
GET /api/v1/ipfs/pins and GET /api/v1/arweave/anchors serve metadata for repos that were public when pushed but later made private. The visibility gate on the write side is evaluated once at push time and is never reconciled when a repo's visibility is tightened, so the index rows persist and the read endpoints return them with no current-visibility filter.
This is the real mechanism behind #121 (and what PR #134 only partially addresses). The leak is not that private repos are indexed (they are not; the write path is correctly gated). It is that visibility is mutable after indexing, and nothing purges or re-filters the index.
Mechanism (verified)
Write side is gated at push, point-in-time:
- pinned_cids is written only via pin_new_objects (crates/gitlawb-node/src/ipfs_pin.rs:133, crates/gitlawb-node/src/pinata.rs:114), called only inside the withheld.is_some() blocks of the push handler (crates/gitlawb-node/src/api/repos.rs:1055, :1167).
- arweave_anchors is written only via record_arweave_anchor (crates/gitlawb-node/src/api/repos.rs:1249), inside if announce && !irys_url.is_empty() (repos.rs:1231).
- announce/withheld come from replication_withheld_set = listable_at_root(rules, is_public, owner_did, None) (repos.rs:49), the anonymous root-read decision evaluated during the push async task.
Nothing reconciles the index on visibility change:
- There is no DELETE FROM pinned_cids or DELETE FROM arweave_anchors anywhere in the tree.
- set_visibility (crates/gitlawb-node/src/api/visibility.rs:82) adds/updates rules and touches neither table.
- The read queries are unfiltered: list_pinned_cids is SELECT ... FROM pinned_cids ORDER BY pinned_at DESC (db/mod.rs:2038), and the global list_arweave_anchors(None,..) is a plain SELECT ... FROM arweave_anchors with no visibility predicate (db/mod.rs:2344).
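A minimal in-memory model of this gap, for illustration only. Node, Visibility, PinRow and listable_at_root_anon are hypothetical stand-ins rather than the node's actual types, and a single root_deny flag substitutes for the real rule set. The model puts the three facts above side by side: the gate is read once inside push, set_visibility only touches the rules, and the listing has no predicate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a repo's visibility state. The real node calls
/// listable_at_root(rules, is_public, owner_did, None); here a single
/// `root_deny` flag models a deny rule on "/".
#[derive(Clone, Copy)]
struct Visibility {
    is_public: bool,
    root_deny: bool,
}

impl Visibility {
    /// Anonymous root-read decision (caller = None).
    fn listable_at_root_anon(&self) -> bool {
        self.is_public && !self.root_deny
    }
}

/// Hypothetical, reduced pinned_cids row.
struct PinRow {
    repo: String,
    cid: String,
}

#[derive(Default)]
struct Node {
    visibility: HashMap<String, Visibility>,
    pinned_cids: Vec<PinRow>,
}

impl Node {
    /// Push: the gate is evaluated once, here, and never revisited.
    /// Unknown repos are treated as not announceable.
    fn push(&mut self, repo: &str, cid: &str) {
        let announce = self
            .visibility
            .get(repo)
            .map_or(false, |v| v.listable_at_root_anon());
        if announce {
            self.pinned_cids.push(PinRow {
                repo: repo.to_string(),
                cid: cid.to_string(),
            });
        }
    }

    /// Mirrors the reported set_visibility: rules change, the index does not.
    fn set_visibility(&mut self, repo: &str, root_deny: bool) {
        if let Some(v) = self.visibility.get_mut(repo) {
            v.root_deny = root_deny;
        }
    }

    /// Mirrors the unfiltered SELECT ... FROM pinned_cids: no visibility predicate.
    fn list_pinned_cids(&self) -> Vec<&PinRow> {
        self.pinned_cids.iter().collect()
    }
}
```

In this model, pushing while public, calling set_visibility(repo, true), and then listing still returns the earlier row, while a second push after the downgrade adds nothing. That is the same split between a correct write gate and a stale read side described above.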
Repro sequence
- Create a public repo (is_public=true, no rules).
- Push to it: announce=true, so arweave_anchors and pinned_cids rows are written (with IPFS/Irys configured).
- Tighten visibility: PUT /api/v1/repos/{owner}/{repo}/visibility, adding a deny rule on / (mode A, or mode B excluding the public). listable_at_root(..., None) now denies; the repo is effectively private.
- Observe: GET /api/v1/arweave/anchors (no ?repo=) and GET /api/v1/ipfs/pins still return the now-private repo's anchor rows (slug, owner DID, ref_name, old_sha, new_sha, cid, arweave_url) and pinned object CIDs.
Object content stays gated by the per-caller check in GET /ipfs/{cid} (#110/#133), so this is metadata disclosure (branch names, commit SHAs, ownership, object CIDs), not content disclosure. The anchor rows were also written permanently to public Arweave while the repo was public; that copy is inherent to permanent storage and cannot be fixed after the fact, but the node's own listing should not keep serving it.
Why PR #134's auth-only gate is only a partial mitigation
#134 requires authentication for the global listings. That closes anonymous scraping, but identities are permissionless on this node (optional_signature verifies a self-produced signature; register is open), so any throwaway DID still reads the stale rows. Authentication is not authorization here (same class as INV-1).
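To make the distinction concrete, here is a deliberately simplified sketch. Did, auth_only_gate and can_read_repo are hypothetical names, and can_read_repo reduces authorize_repo_read to "owner, or currently public" purely for illustration. The auth-only gate admits any caller holding a self-minted DID; only a check against the repo's current state rejects the throwaway identity.

```rust
/// Hypothetical: a DID is whatever identifier the caller self-asserts.
#[derive(PartialEq)]
struct Did(String);

/// Auth-only gate in the spirit of #134: any validly signed caller passes.
/// With open registration, minting a fresh DID is enough to get through.
fn auth_only_gate(caller: Option<&Did>) -> bool {
    caller.is_some()
}

/// Simplified authorization: the repo must be public right now, or the
/// caller must be its owner. The real rule evaluation is richer than this.
fn can_read_repo(caller: Option<&Did>, owner: &Did, currently_public: bool) -> bool {
    currently_public || caller == Some(owner)
}
```

A throwaway DID passes auth_only_gate but fails can_read_repo once the repo is no longer public, which is the gap between #134 and an actual authorization check.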
Fix direction
Filter the listings by current visibility rather than (or in addition to) requiring auth:
- For list_anchors/list_pins, resolve each row's repo and apply authorize_repo_read / listable_at_root against current rules before returning it, or restrict the global listing to a node-admin capability.
- And/or reconcile the index on visibility downgrade: have set_visibility purge or mark rows for repos that are no longer announceable.
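A sketch of the first option, filtering at read time, under simplifying assumptions. RepoState, AnchorRow, listable_at_root_anon and list_anchors_visible are hypothetical, and the anonymous root decision stands in for listable_at_root(..., None). Each row is resolved to its repo's current state, and rows whose repo cannot be resolved are dropped, so the filter fails closed.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified current-visibility state of a repo.
struct RepoState {
    is_public: bool,
    root_deny: bool,
}

/// Stand-in for listable_at_root(rules, is_public, owner_did, None).
fn listable_at_root_anon(state: &RepoState) -> bool {
    state.is_public && !state.root_deny
}

/// Hypothetical, reduced arweave_anchors row.
struct AnchorRow {
    repo: String,
    new_sha: String,
}

/// Read-time filter: keep a row only if its repo is *currently* listable.
/// Rows whose repo is missing from `repos` (e.g. deleted) are dropped.
fn list_anchors_visible<'a>(
    rows: &'a [AnchorRow],
    repos: &HashMap<String, RepoState>,
) -> Vec<&'a AnchorRow> {
    rows.iter()
        .filter(|row| repos.get(&row.repo).map_or(false, listable_at_root_anon))
        .collect()
}
```

A per-caller variant would evaluate the rules for the requesting DID instead of None. Filtering at read time also covers rows written before any fix ships, which a purge-on-downgrade hook alone would not reach without a one-off cleanup.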
A regression test should push-while-public, downgrade, then assert the listing excludes the repo.
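One possible shape for that regression test, written against a hypothetical in-memory Index that implements the second option (purging on downgrade). Index, create_repo, push, set_visibility and the list_* methods are illustrative stand-ins, not the node's API, and each row is reduced to its repo slug. Against the real node, the same three steps would go through the push handler, PUT /api/v1/repos/{owner}/{repo}/visibility, and the two global listings.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the two index tables; each row is
/// reduced to the repo slug it belongs to.
#[derive(Default)]
struct Index {
    announceable: HashMap<String, bool>,
    pinned_cids: Vec<String>,
    arweave_anchors: Vec<String>,
}

impl Index {
    fn create_repo(&mut self, repo: &str, public: bool) {
        self.announceable.insert(repo.to_string(), public);
    }

    /// Push-time gate: rows are written only while the repo is announceable.
    fn push(&mut self, repo: &str) {
        if self.announceable.get(repo).copied().unwrap_or(false) {
            self.pinned_cids.push(repo.to_string());
            self.arweave_anchors.push(repo.to_string());
        }
    }

    /// Reconcile on downgrade: purge every row for a repo that is no longer
    /// announceable (marking rows hidden instead would be an alternative).
    fn set_visibility(&mut self, repo: &str, announceable: bool) {
        self.announceable.insert(repo.to_string(), announceable);
        if !announceable {
            self.pinned_cids.retain(|r| r != repo);
            self.arweave_anchors.retain(|r| r != repo);
        }
    }

    fn list_pins(&self) -> Vec<&str> {
        self.pinned_cids.iter().map(String::as_str).collect()
    }

    fn list_anchors(&self) -> Vec<&str> {
        self.arweave_anchors.iter().map(String::as_str).collect()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn downgraded_repo_is_excluded_from_global_listings() {
        let mut idx = Index::default();
        idx.create_repo("alice/demo", true);
        idx.push("alice/demo"); // push while public
        assert_eq!(idx.list_pins(), vec!["alice/demo"]);

        idx.set_visibility("alice/demo", false); // downgrade

        assert!(!idx.list_pins().contains(&"alice/demo"));
        assert!(!idx.list_anchors().contains(&"alice/demo"));
    }
}
```

The assertions are written against the listings rather than the tables, so the same test would also pass if the fix were read-time filtering instead of a purge.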