Skip to content

messages: Fix msg_headers deser to consume trailing tx-count varint#321

Open
djdarcy wants to merge 1 commit into
petertodd:masterfrom
djdarcy:fix-msg-headers-trailing-varint
Open

messages: Fix msg_headers deser to consume trailing tx-count varint#321
djdarcy wants to merge 1 commit into
petertodd:masterfrom
djdarcy:fix-msg-headers-trailing-varint

Conversation

@djdarcy
Copy link
Copy Markdown

@djdarcy djdarcy commented May 15, 2026

Hey @petertodd there is a small bug in msg_headers.msg_deser (see: #320). Basically the P2P headers message wire format puts each entry inside the standard CBlock framing: 80-byte block header + transaction-count varint. The varint is always 0 (the whole point of a headers message is blocks-without-transactions) but it is still part of the on-wire bytes. The current deserializer calls VectorSerializer.stream_deserialize(CBlockHeader, f), which reads exactly 80 bytes per header and skips the trailing varint. Meaning after header[0] the stream is one byte ahead of where the deserializer thinks it is, and header[1] onwards is misaligned by one byte.

The bug doesn't show up in self-round-trip tests because msg_ser also skipped the trailing byte, so serialize-then-deserialize was symmetric. It only surfaces against on-wire data from a real peer:

import socket
from bitcoin.messages import (
    MsgSerializable, msg_version, msg_verack,
    msg_getheaders, msg_headers, msg_ping, msg_pong,
)

sock = socket.create_connection(('seed.bitcoin.sipa.be', 8333), timeout=10)
f = sock.makefile('rb')

# Standard version/verack handshake.
sock.sendall(msg_version().to_bytes())
for _ in range(10):
    m = MsgSerializable.stream_deserialize(f)
    if isinstance(m, msg_version):
        sock.sendall(msg_verack().to_bytes())
    elif isinstance(m, msg_verack):
        break

# Ask for headers starting from genesis.
gh = msg_getheaders()
gh.locator.vHave = [b'\x00' * 32]
sock.sendall(gh.to_bytes())

while True:
    m = MsgSerializable.stream_deserialize(f)
    if isinstance(m, msg_ping):
        pong = msg_pong(); pong.nonce = m.nonce
        sock.sendall(pong.to_bytes())
        continue
    if isinstance(m, msg_headers):
        hs = m.headers
        print("header[0] hash:", hs[0].GetHash()[::-1].hex())
        print("header[1] prev:", hs[1].hashPrevBlock[::-1].hex())
        print("chain OK?     ", hs[1].hashPrevBlock == hs[0].GetHash())
        break

Observed output against any mainnet peer:

header[0] hash: 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
header[1] prev: 000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb604800
chain OK?      False

Note header[1].hashPrevBlock is header[0].GetHash() shifted left by one byte.

This PR consumes the trailing varint in msg_deser and writes one back out in msg_ser, keeping the symmetric self-round-trip property intact. Note: the existing Test_msg_headers.test_serialization still passes unchanged.

A small wire-format regression test (builds a body from mainnet block 0 + block 1 and checks chain continuity, exact byte consumption, and round-trip identity) is available as a gist: https://gist.github.com/djdarcy/dc8843b6183f65165621c8fc09fcedc4. I held it back from this PR's diff to keep the change minimal -- happy to drop it into bitcoin/tests/test_messages.py if you'd like in-tree coverage.

FWIW the bug is also why opentimestamps-client currently carries a workaround that intercepts the headers command at the framing layer and parses the body manually (see opentimestamps/opentimestamps-client#164). Once this is added / released, that workaround can come out.

Closes #320.

Tested on Python 3.12 + Win11; all 20 tests in bitcoin/tests/test_messages.py still pass.

The P2P 'headers' message wire format is a varint N followed by N
entries, where each entry is an 80-byte CBlockHeader plus the standard
CBlock framing's transaction-count varint. The varint is always 0 (the
whole point of a 'headers' message is blocks-without-transactions) but
it is still part of the on-wire bytes, so a deserializer that reads
only the 80-byte header gets misaligned for header[1] onwards.

The bug doesn't show up in self-round-trip tests because msg_ser also
skipped the trailing byte, so serialize-then-deserialize was symmetric.
It only surfaces against on-wire data from a real peer, where header[1]
.hashPrevBlock ends up shifted one byte left of header[0].GetHash().

This fix consumes the trailing varint in msg_deser and writes one back
out in msg_ser, keeping the symmetric round-trip property intact while
matching the actual on-wire format. The existing
Test_msg_headers.test_serialization round-trip test continues to pass.

A regression test (genesis + block 1 chain continuity, exact byte
consumption, round-trip identity against a wire-formatted body) is
available as a gist for review and can be added to
bitcoin/tests/test_messages.py if desired:
https://gist.github.com/djdarcy/dc8843b6183f65165621c8fc09fcedc4

Closes petertodd#320.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

msg_headers deserializer misaligns against real peers (skips trailing tx-count varint)

1 participant