list_ref_certificates fetches a repo's entire cert set with no LIMIT (permissionless read amplification on public repos) #147

Description

@beardthelion

Summary

list_ref_certificates (crates/gitlawb-node/src/db/mod.rs:1846) runs SELECT ... FROM ref_certificates WHERE repo_id = $1 ORDER BY issued_at DESC with fetch_all and no LIMIT, loading every certificate row for a repo into memory. The table grows one row per ref per push (api/repos.rs:919 loops issue_ref_certificate over every advanced ref; cert.rs:44 mints a fresh UUID per call), with no upsert/dedup and no prune or retention anywhere, so it accumulates permanently. An anonymous caller reading a public repo turns a single cheap GET into an unbounded fetch, allocation, and response body.

This is an availability/cost problem, separate from #120 (which adds the missing visibility gate to these handlers) and #114 (which bounded only the gossip half of the events feed). #120's fix does not help here: for a public repo authorize_repo_read allows an anonymous caller straight through, so the load stays fully permissionless after that gate lands.

Where

Two consumers of the unbounded fetch, both on the anonymous read group (optional_signature):

  • crates/gitlawb-node/src/api/certs.rs:20 list_certs (GET /api/v1/repos/{owner}/{repo}/certs, routed at server.rs:317). No limit/cursor param and no truncate: it serializes the entire cert set into the response.
  • crates/gitlawb-node/src/api/events.rs:181 list_repo_events caps the response with all_events.truncate(limit), but only after the full fetch, so the DB read and the intermediate Vec allocation stay unbounded regardless of ?limit.
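The difference between truncating after the fetch and bounding the fetch itself can be sketched with an in-memory stand-in. Everything here (`Table`, `fetch_all_then_truncate`, `fetch_bounded`) is hypothetical and only models the shape of the problem; it is not the node's code and touches no database.

```rust
/// Hypothetical in-memory stand-in for one repo's rows in ref_certificates.
struct Table {
    /// issued_at values, already sorted newest first.
    rows: Vec<u64>,
}

impl Table {
    /// Models the current shape: materialize every row, truncate afterwards.
    /// Returns (rows handed back, rows materialized along the way).
    fn fetch_all_then_truncate(&self, limit: usize) -> (Vec<u64>, usize) {
        let mut all = self.rows.clone(); // stands in for fetch_all
        let materialized = all.len();
        all.truncate(limit); // the cap arrives only after the allocation
        (all, materialized)
    }

    /// Models a bounded read: the limit applies before rows are materialized,
    /// which is what a SQL LIMIT would do on the database side.
    fn fetch_bounded(&self, limit: usize) -> (Vec<u64>, usize) {
        let page: Vec<u64> = self.rows.iter().copied().take(limit).collect();
        let materialized = page.len();
        (page, materialized)
    }
}
```

Both methods hand back the same page, but the first one's intermediate allocation scales with the table size rather than with the requested limit.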

Growing the table needs an authenticated pusher (git-receive-pack sits behind require_signature), but any registered agent can push (enforce_owner_push defaults to false) and git_write_routes carries no rate limit, so a single writer can inflate one repo's cert count without bound. Every read afterward is permissionless and amplified.

Impact

Measured against a live DB (200k synthetic cert rows for one repo, then reverted): ~60 MB of column data; the no-LIMIT query returns all rows in ~120 ms; list_certs would serialize ~82 MB per request. The node holds the Vec<RefCertificate> from fetch_all, the re-mapped Vec<serde_json::Value>, and the response body concurrently, roughly 200+ MB transient heap per in-flight /certs request from one unauthenticated GET. Reads have no per-caller rate limit, so N concurrent anonymous GETs multiply toward OOM. Same availability/amplification class as #82.
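As a rough back-of-the-envelope check on the figures above, the per-row cost and the size of a bounded response can be derived as follows. This sketch assumes decimal megabytes and treats the approximate measurements (~60 MB, ~82 MB) as exact; the 200-row limit used in the usage note is an illustrative value borrowed from the feed ceiling, not a decided parameter.

```rust
/// Rows seeded for the measurement described in this issue.
const ROWS: u64 = 200_000;
/// Approximate column data for those rows (assumed decimal MB).
const COLUMN_BYTES: u64 = 60_000_000;
/// Approximate serialized list_certs body for those rows (assumed decimal MB).
const SERIALIZED_BYTES: u64 = 82_000_000;

/// Average bytes per certificate row for a given total.
fn per_row(total_bytes: u64) -> u64 {
    total_bytes / ROWS
}

/// Estimated response size if the read were capped at `limit` rows.
fn bounded_response_bytes(limit: u64) -> u64 {
    per_row(SERIALIZED_BYTES) * limit
}
```

Under these assumptions, `per_row(COLUMN_BYTES)` is 300 bytes, `per_row(SERIALIZED_BYTES)` is 410 bytes, and `bounded_response_bytes(200)` is 82,000 bytes, about a thousandth of the unbounded body.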

Fix

Bound the fetch at the DB layer instead of in memory:
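A minimal sketch of what bounding at the DB layer could look like, assuming the limit is passed as a second bind parameter and clamped before it reaches the query. `MAX_CERT_PAGE`, `LIST_REF_CERTIFICATES_BOUNDED`, and `clamp_limit` are hypothetical names, the default-to-ceiling behavior is an assumption, and the column list stays elided as in the summary above.

```rust
/// Hypothetical ceiling; 200 mirrors the feed ceiling mentioned in this issue.
const MAX_CERT_PAGE: i64 = 200;

/// Assumed query shape: the existing predicate and ordering plus a bound
/// LIMIT parameter. The column list is elided, as in the issue text.
const LIST_REF_CERTIFICATES_BOUNDED: &str =
    "SELECT ... FROM ref_certificates WHERE repo_id = $1 ORDER BY issued_at DESC LIMIT $2";

/// Clamp a caller-supplied ?limit into 1..=MAX_CERT_PAGE.
/// A missing limit falls back to the ceiling (an assumption, not a decision).
fn clamp_limit(requested: Option<i64>) -> i64 {
    requested.unwrap_or(MAX_CERT_PAGE).clamp(1, MAX_CERT_PAGE)
}
```

With the clamp applied before the query runs, both list_certs and list_repo_events would read at most `MAX_CERT_PAGE` rows per request, however large the table has grown. A cursor on issued_at could be layered on top for paging, but that is outside this sketch.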

Found as a follow-up to the PR #143 events-feed work. Verified by execution: list_ref_certificates returns all 250 seeded rows past the 200-row feed ceiling, and the figures above are from a live-DB measurement.

Metadata

Assignees

No one assigned

    Labels

    crate:node (gitlawb-node: the serving node and REST API)
    kind:security (Vulnerability fix or hardening)
    sev:high (Major break or real security/trust risk, no easy workaround)
    subsystem:api (Node REST API request/response surface)
    subsystem:attestation (Certificates, anchoring, per-ref attestation)
