Skip to content

fix(accept): guard globalRateBucket.allow with a mutex (data race)#73

Merged
TeoSlayer merged 1 commit into
mainfrom
fix/accept-ratebucket-data-race
Jun 15, 2026
Merged

fix(accept): guard globalRateBucket.allow with a mutex (data race)#73
TeoSlayer merged 1 commit into
mainfrom
fix/accept-ratebucket-data-race

Conversation

@TeoSlayer

Copy link
Copy Markdown
Contributor

Summary

globalRateBucket.allow() (the process-wide accept rate limiter) mutates
tokens/lastFill without synchronization, but it is called concurrently
from every Acceptor connection-handler goroutine (handleJSONConn). That
is a read-modify-write data race on the token-refill state.

Under concurrent load the -race detector flags it reliably — e.g. the
downstream pilotprotocol lock-graph stress harness
(TestConcurrentDialEncryptDecrypt) fails its Architecture gates job
purely because of this race in the rendezvous dependency.

Fix

  • Add a sync.Mutex to globalRateBucket and lock around the
    refill/consume in allow().
  • Add TestGlobalRateBucket_ConcurrentAllow, a concurrent regression test
    (64 goroutines × 1000 calls) that fails under -race without the lock
    and passes with it.

Testing

  • go test -race ./accept/ green.
  • Verified the new test catches the race: temporarily removing the lock
    reproduces WARNING: DATA RACE; restoring it passes.

The process-wide rate bucket's allow() is called concurrently from every
Acceptor connection-handler goroutine, but tokens/lastFill were updated
without synchronization — a read-modify-write data race (caught by the
-race detector under concurrent load, e.g. the downstream lock-graph
stress harness). Add a mutex and a concurrent regression test.
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@TeoSlayer TeoSlayer merged commit 594de9e into main Jun 15, 2026
2 checks passed
TeoSlayer added a commit to pilot-protocol/pilotprotocol that referenced this pull request Jun 15, 2026
rendezvous v0.2.4 had a data race in globalRateBucket.allow() (the
process-wide accept rate limiter mutated tokens/lastFill without a lock,
called concurrently from every connection handler). This surfaced as the
Architecture gates job's TestConcurrentDialEncryptDecrypt failing under
-race. Bump to the fixed rendezvous (mutex-guarded; pilot-protocol/rendezvous#73),
which pulls beacon v0.2.6 transitively.

Verified: go test -race TestConcurrentDialEncryptDecrypt ./tests passes
(104s, no race).

Co-authored-by: Teodor Calin <teodor@vulturelabs.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants