fix(network): bind strict-PQ peer identity to staking ML-DSA key so validators produce blocks #131
Draft
Darkhorse7stars wants to merge 1 commit into
On a strict-PQ chain a peer's consensus identity is its ML-DSA-65 NodeID
(StakingConfig.DeriveNodeID), but the network layer kept every peer on the
TLS-cert NodeID derived during the transport upgrade. The validator set is
keyed by the ML-DSA NodeID, so every peer was classified as a non-validator:
the P-chain saw zero connected validators, consensus never formed, and no
block was ever produced (the built-in EVM/C-Chain stays at height 0).
Two coupled defects:
1. network.NewNetwork built the PQ handshake identity with
peer.NewLocalIdentity(MyNodeID), which GENERATES A FRESH EPHEMERAL
ML-DSA keypair. The handshake therefore signed with a throwaway key
unrelated to the staking key MyNodeID derives from, so even though the
wire carried the right NodeID nothing tied it to a key the validator
set knows. (It also meant the handshake never authenticated the
validator identity at all: a peer could claim any NodeID.)
2. peer.runPQHandshakeIfRequired discarded HandshakeResult.PeerNodeID and
left p.id on the transport TLS-cert NodeID.
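A minimal sketch of defect 1, under loudly stated assumptions: Ed25519 from Go's standard library stands in for ML-DSA-65 (the signature scheme is not the point here), NodeID is a hypothetical 20-byte type, and deriveNodeID is a hypothetical placeholder (SHA-256 over a domain tag plus the public key, truncated), not the real StakingConfig.DeriveNodeID. It shows why an ephemeral handshake key proves nothing about MyNodeID: the signature verifies, but the key behind it derives a different NodeID.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// NodeID is a hypothetical 20-byte stand-in for ids.NodeID.
type NodeID [20]byte

// emptyDomain stands in for ids.Empty, the node-identity domain.
var emptyDomain [32]byte

// deriveNodeID is a HYPOTHETICAL placeholder for StakingConfig.DeriveNodeID:
// SHA-256(domain || pub) truncated to 20 bytes. The real derivation may differ.
func deriveNodeID(domain [32]byte, pub []byte) NodeID {
	h := sha256.New()
	h.Write(domain[:])
	h.Write(pub)
	var id NodeID
	copy(id[:], h.Sum(nil))
	return id
}

func main() {
	// Persistent staking key: MyNodeID is derived from it.
	stakingPub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	myNodeID := deriveNodeID(emptyDomain, stakingPub)

	// Buggy path: a fresh ephemeral keypair signs the handshake while the
	// wire still advertises myNodeID.
	ephPub, ephPriv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	transcript := []byte("handshake transcript")
	sig := ed25519.Sign(ephPriv, transcript)

	// The signature checks out, but only against the ephemeral key...
	fmt.Println("signature valid:", ed25519.Verify(ephPub, transcript, sig))
	// ...and that key derives a different NodeID, so nothing ties it to myNodeID.
	fmt.Println("bound to MyNodeID:", deriveNodeID(emptyDomain, ephPub) == myNodeID)
}
```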
Fix:
- Thread the node's persistent staking ML-DSA keypair
(StakingConfig.StakingMLDSA{,Pub}) onto network.Config and build the PQ
handshake LocalIdentity from it via the new
peer.NewLocalIdentityFromStakingKey. The handshake now signs with the
same key that derives MyNodeID.
- After a successful handshake, peer.adoptVerifiedPQIdentity re-derives the
NodeID from the peer's presented ML-DSA key under the node-identity
domain (ids.Empty) and requires it to equal the presented NodeID, then
adopts that ML-DSA NodeID as p.id. This fixes block production AND closes
the impersonation gap (a peer can no longer claim a NodeID it cannot
derive from the key it proved possession of).
Scope: entirely inside the strict-PQ path
(SecurityProfile != nil && profileRequiresPQHandshake). Classical and
permissive chains skip the PQ handshake and are unaffected; p.id stays the
TLS-cert NodeID exactly as before. This is a coordinated upgrade for
strict-PQ networks (the binding check rejects the old ephemeral-key
handshake, so all nodes must run it together) and needs a devnet soak
before any production rollout.
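The scope rule can be sketched as a single guard: only when a security profile exists and requires the PQ handshake does the peer's identity move off the TLS-cert NodeID. SecurityProfile, its RequirePQHandshake field, the body of profileRequiresPQHandshake and settleIdentity are all hypothetical placeholders named after the condition quoted above; the failing case models a peer still running the old ephemeral-key handshake.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeID is a hypothetical 20-byte stand-in for ids.NodeID.
type NodeID [20]byte

// SecurityProfile is a hypothetical stand-in modelling only what this sketch needs.
type SecurityProfile struct {
	RequirePQHandshake bool
}

// profileRequiresPQHandshake is named after the PR's condition; body is a placeholder.
func profileRequiresPQHandshake(sp *SecurityProfile) bool {
	return sp.RequirePQHandshake
}

type peer struct {
	id NodeID // initially the TLS-cert NodeID from the transport upgrade
}

// settleIdentity (hypothetical) runs the PQ handshake only on strict-PQ chains.
func (p *peer) settleIdentity(sp *SecurityProfile, handshake func() (NodeID, error)) error {
	if sp == nil || !profileRequiresPQHandshake(sp) {
		return nil // classical / permissive: p.id stays the TLS-cert NodeID
	}
	verified, err := handshake()
	if err != nil {
		return err // strict-PQ: no fallback to the TLS identity
	}
	p.id = verified
	return nil
}

func main() {
	tlsID, pqID := NodeID{0x71}, NodeID{0xA1}
	ok := func() (NodeID, error) { return pqID, nil }
	fail := func() (NodeID, error) { return NodeID{}, errors.New("binding check failed") }

	for _, c := range []struct {
		name string
		sp   *SecurityProfile
		hs   func() (NodeID, error)
	}{
		{"classical (nil profile)", nil, ok},
		{"permissive", &SecurityProfile{RequirePQHandshake: false}, ok},
		{"strict-PQ", &SecurityProfile{RequirePQHandshake: true}, ok},
		{"strict-PQ, old ephemeral-key peer", &SecurityProfile{RequirePQHandshake: true}, fail},
	} {
		p := &peer{id: tlsID}
		err := p.settleIdentity(c.sp, c.hs)
		fmt.Printf("%-34s adopted PQ id: %-5v err: %v\n", c.name, p.id == pqID, err)
	}
}
```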
Adds white-box tests for the bind / adopt / reject paths.
Summary
On a strict-PQ chain a peer's consensus identity is its ML-DSA-65 NodeID (StakingConfig.DeriveNodeID), but the network layer kept every peer on the TLS-cert NodeID derived during the transport upgrade (peer/upgrader.go → ids.NodeIDFromCert). The validator set is keyed by the ML-DSA NodeID, so every peer was classified as a non-validator → the P-chain saw zero connected validators → consensus never formed → no block was ever produced (the built-in EVM/C-Chain stays at height 0; RPC serves reads but eth_blockNumber never advances).
This was observed on the Liquidity strict-PQ devnet: 3 validators, all healthy, all BLS-correct, peer mesh formed, but P-chain height stuck at 0.
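The failure mode is a plain key mismatch, which a toy lookup makes visible. In this sketch validatorSet and connectedWeight are hypothetical stand-ins for the real validator-set bookkeeping: stake is keyed by ML-DSA NodeIDs, so querying it with TLS-cert NodeIDs finds no validators at all.

```go
package main

import "fmt"

// NodeID is a hypothetical 20-byte stand-in for ids.NodeID.
type NodeID [20]byte

// validatorSet is a hypothetical stand-in: stake weight keyed by ML-DSA NodeID.
type validatorSet map[NodeID]uint64

// connectedWeight counts connected peers the set recognizes and sums their weight.
func connectedWeight(vs validatorSet, connected []NodeID) (n int, weight uint64) {
	for _, id := range connected {
		if w, ok := vs[id]; ok {
			n++
			weight += w
		}
	}
	return n, weight
}

func main() {
	mldsa := []NodeID{{0xA1}, {0xA2}, {0xA3}} // consensus (ML-DSA) identities
	tls := []NodeID{{0x71}, {0x72}, {0x73}}   // transport (TLS-cert) identities
	vs := validatorSet{}
	for _, id := range mldsa {
		vs[id] = 100
	}
	n, w := connectedWeight(vs, tls)
	fmt.Printf("peers keyed by TLS NodeID:    %d validators, weight %d\n", n, w)
	n, w = connectedWeight(vs, mldsa)
	fmt.Printf("peers keyed by ML-DSA NodeID: %d validators, weight %d\n", n, w)
}
```

With three validators of weight 100 each, the TLS-keyed lookup finds 0 validators while the ML-DSA-keyed lookup finds all 3 (weight 300).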
Root cause — two coupled defects
1. The PQ handshake signed with an ephemeral key, not the staking key.
network.NewNetwork built the handshake identity with peer.NewLocalIdentity(MyNodeID), which generates a fresh ML-DSA keypair per process (see its doc comment). The handshake signature therefore proved possession of a throwaway key with no relationship to the staking key that MyNodeID derives from. The wire carried the right NodeID, but nothing bound it to a key the validator set knows. (Corollary: the handshake never actually authenticated the validator identity; a peer could assert any NodeID.)
2. The verified peer NodeID was discarded.
peer.runPQHandshakeIfRequired used HandshakeResult.AEADKey but dropped HandshakeResult.PeerNodeID, leaving p.id on the transport TLS-cert NodeID. Consensus then looked up that TLS NodeID in an ML-DSA-keyed validator set and found nothing.
Fix
- Thread the node's persistent staking ML-DSA keypair (StakingConfig.StakingMLDSA{,Pub}) onto network.Config (mirrored in node.Node.initNetworking) and build the handshake LocalIdentity from it via the new peer.NewLocalIdentityFromStakingKey. The handshake now signs with the same key that derives MyNodeID.
- peer.adoptVerifiedPQIdentity (new): after a successful handshake, re-derive the NodeID from the peer's presented ML-DSA key under the node-identity domain (ids.Empty, the exact domain DeriveNodeID uses for MyNodeID) and require it to equal the presented NodeID, then adopt that ML-DSA NodeID as p.id.
This both fixes block production and closes an identity-impersonation gap: a peer can no longer claim a NodeID it cannot derive from the key it proved possession of.
Because peer.Start runs the handshake synchronously, before the message-pump goroutines start and before network.upgrade adds the peer to connectingPeers/connectedPeers, p.id is already the ML-DSA NodeID by the time any peer-set bookkeeping keys by it, so there is no re-keying race.
Blast radius
Entirely inside the strict-PQ path (SecurityProfile != nil && profileRequiresPQHandshake). Classical / permissive chains skip the PQ handshake and are unaffected: p.id stays the TLS-cert NodeID exactly as before. No wire-format change (same INIT/RESP frames); only which key signs changes, plus an added local verification.
Rollout / review notes
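Coordinated upgrade for strict-PQ networks: the binding check rejects the old ephemeral-key handshake, so all nodes must run it together. Needs a devnet soak (check that getCurrentValidators lists all NodeIDs with weight and that EVM eth_blockNumber increments) before any production rollout. Drafted for chain-team review; do not blind-merge or deploy to a live network.
Tests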
go build ./... clean; go vet clean; go test ./network/peer/... green. Adds network/peer/pq_identity_adopt_test.go covering: bound identity → adopted; unbound/forged NodeID → rejected, identity untouched; nil/empty result → rejected.
Follow-ups (not in this PR)
- MyNodeID self-check and AllowConnection in network.upgrade still evaluate the TLS-cert NodeID (best-effort pre-filters; authoritative gating is post-handshake on p.id). Worth migrating to the ML-DSA NodeID for completeness.
- peersLock in network.upgrade; a slow peer serializes connection establishment. Pre-existing; orthogonal to this fix.