BIPs 455–457: SwiftSync Specification #2152
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,283 @@ | ||
| ``` | ||
| BIP: ? | ||
| Layer: Peer Services | ||
| Title: Peer sharing of block spent coins | ||
| Authors: Robert Netzke <rob@2140.dev>, Ruben Somsen <ruben@2140.dev> | ||
| Deputies: Edil Medeiros <edil@vinteum.org> | ||
| Status: Draft | ||
| Type: Specification | ||
| Assigned: ? | ||
| License: BSD-3-Clause | ||
|
|
||
| Discussion: https://groups.google.com/g/bitcoindev/c/FpSWUxItXQs/m/pnfjP6rFCgAJ | ||
| ``` | ||
|
|
||
| ## Abstract | ||
|
|
||
| Inputs of a Bitcoin block are referenced by the outpoint data structure. This commonly poses a limitation during initial | ||
| block download (IBD), such that a client must process blocks sequentially to validate the chain history. The SwiftSync | ||
| protocol allows blocks to be evaluated in arbitrary order; however, this requires additional data that must be served over | ||
| the peer-to-peer network. This document describes how to share this data between peers. | ||
|
|
||
| ## Motivation | ||
|
|
||
| A common approach to IBD is to process blocks sequentially to ensure the existence of input data when validating a | ||
| block. Metadata corresponding to an input, such as the amount, must be present in a local cache to validate the block, | ||
| hence sequential validation is a natural choice. This is a result of the height, coinbase flag, input script, and amount | ||
| of the block inputs being omitted from the data committed to by proof of work in the current block, and, thus, this data | ||
| cannot be trusted if received over the wire naively. Using the SwiftSync protocol, a client is able to verify the | ||
| correctness of this data, even if served by a potentially untrusted party. This allows a significant improvement in IBD | ||
| performance, as block validation may be done in parallel. | ||
|
|
||
| ## Specification | ||
|
|
||
|
|
||
| In Bitcoin Core, to roll back the chain state in the event of a block reorganization, the height, coinbase flag, script | ||
| and amount metadata for each spent transaction output of a block are stored in a data structure known colloquially as | ||
| "undo data". This terminology stems from its use to "undo" the effect of a block by repopulating the UTXO set with the | ||
| coins that were spent by the reorganized block. To remain general in language, this data will be referred to as "spent | ||
| coins." | ||
|
|
||
| Bitcoin Core full archival nodes store spent coins for all blocks. This is useful in the context of SwiftSync, as no | ||
| additional index must be created or maintained to serve this data to peers. There are, however, some discrepancies | ||
| between how this data is serialized on disk in Bitcoin Core and how this proposal seeks to serialize this data over the | ||
| peer-to-peer protocol, which are detailed in the rationale section. | ||
|
|
||
| This section defines how to request and serve block spent coins over the peer-to-peer protocol, as well as signaling | ||
| support of this feature to peers. | ||
|
|
||
| ### Definitions | ||
|
|
||
| - `[]byte`: arbitrary sequence of bytes with no fixed length | ||
| - `<N bytes>`: byte vector of size N, where N is specified inline. N is fixed length and known at compile time (e.g. | ||
| \<32 bytes>) | ||
| - `vector<Foo>`: vector of arbitrary length of elements of type Foo | ||
| - `CompactSize`: variable-length encoding of unsigned integers used in peer-to-peer messages, as defined in the Function Appendix | ||
| section | ||
| - `CompressAmount`: compression function for integer amounts, as defined in the Function Appendix section | ||
|
|
||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | ||
| "OPTIONAL" in this document are to be interpreted as described in RFC 2119. | ||
|
|
||
| ### Data structures | ||
|
|
||
| #### Height and Coinbase Flag Code | ||
|
|
||
| When validating a block, a client must confirm that spent coinbase outputs are mature, which depends on the height at which the coin was created. | ||
| The height and coinbase flag are encoded together as a 32 bit integer[^1]. To encode the height and flag, binary left shift the | ||
| height by one bit, treat the coinbase flag as a bit, and insert it into the newly opened bit position. To decode the height, | ||
| binary right shift the code by one bit. To decode the coinbase flag, mask the least significant bit of the code and interpret | ||
| the bit as a boolean. | ||
|
|
||
| Take an 8-bit example of a height with binary encoding `0010 0111`. To encode a coinbase output at this height, one | ||
| begins with a left shift: `0100 1110`, and places the coinbase flag in the least significant bit: `0100 1111`. | ||
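The shift-and-mask steps above can be sketched in Python. This is a minimal illustration rather than part of the specification; the function names `encode_height_code` and `decode_height_code` are hypothetical.

```python
def encode_height_code(height: int, is_coinbase: bool) -> int:
    """Pack a height and a coinbase flag into one 32-bit code."""
    # One bit is reserved for the flag, so the height must fit in 31 bits.
    if not 0 <= height < (1 << 31):
        raise ValueError("height must fit in 31 bits")
    return (height << 1) | int(is_coinbase)


def decode_height_code(code: int) -> tuple[int, bool]:
    """Recover (height, is_coinbase) from a code."""
    return code >> 1, bool(code & 1)


# The 8-bit example from the text: height 0010 0111 with the coinbase flag set.
assert encode_height_code(0b0010_0111, True) == 0b0100_1111
assert decode_height_code(0b0100_1111) == (0b0010_0111, True)
```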
|
|
||
| #### Reconstructable Script Format | ||
|
|
||
| When validating historical data, common script types may be represented more concisely than the usual encoding. Bare | ||
| scripts and future output types are not compressed; however, this format is extensible. The `Expansion` column is the | ||
| usual representation of the `Script` column and the `Format` column shows the compressed form. Scripts are serialized in | ||
| this format by concatenating the `Prefix` and `Format` fields specified below. | ||
|
|
||
| | Prefix | Script | Format | Expansion | | ||
| | :----- | :------- | :------------------------------------ | :----------------------------------------------------------- | | ||
| | `0x00` | Unknown  | `CompactSize(Len([]byte)) + []byte`   | `[]byte`                                                     | | ||
| | `0x01` | `P2PKH` | `<20 bytes>` | `OP_DUP OP_HASH160 20 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG` | | ||
| | `0x02` | `P2PK` | `<32-byte public key (0x02 parity)>` | `33 0x02 <32 byte public key> OP_CHECKSIG` | | ||
| | `0x03` | `P2PK` | `<32-byte public key (0x03 parity)>` | `33 0x03 <32 byte public key> OP_CHECKSIG` | | ||
| | `0x04` | `P2PK` | `<64 byte public key>` | `65 0x04 <64 byte public key> OP_CHECKSIG` | | ||
|
Comment on lines
+84
to
+86
Member
Given that they are pretty uncommon these days, and it only saves two bytes, have you considered grouping the P2PK outputs with the other bare scripts and just encoding these per the |
||
| | `0x05` | `P2SH` | `<20 bytes>` | `OP_HASH160 20 <20 bytes> OP_EQUAL` | | ||
| | `0x06` | `P2WSH` | `<32 bytes>` | `OP_0 32 <32 bytes>` | | ||
| | `0x07` | `P2WPKH` | `<20 bytes>` | `OP_0 20 <20 bytes>` | | ||
| | `0x08` | `P2TR` | `<32-byte X-only public key>` | `OP_1 32 <32 bytes>` | | ||
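As a rough sketch of how a serving node might map a full script to the table above, the following Python function pattern-matches on the standard templates and falls back to prefix `0x00` for everything else. The name `compress_script` is hypothetical, and the sketch does not check that P2PK keys are valid curve points.

```python
def encode_compactsize(n: int) -> bytes:
    """CompactSize encoding, as in the Function Appendix."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def compress_script(script: bytes) -> bytes:
    """Serialize a script as Prefix + Format from the table (illustrative only)."""
    n = len(script)
    # P2PKH: OP_DUP OP_HASH160 20 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if n == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return b"\x01" + script[3:23]
    # P2SH: OP_HASH160 20 <20 bytes> OP_EQUAL
    if n == 23 and script[:2] == b"\xa9\x14" and script[22] == 0x87:
        return b"\x05" + script[2:22]
    # P2WSH: OP_0 32 <32 bytes>
    if n == 34 and script[:2] == b"\x00\x20":
        return b"\x06" + script[2:]
    # P2WPKH: OP_0 20 <20 bytes>
    if n == 22 and script[:2] == b"\x00\x14":
        return b"\x07" + script[2:]
    # P2TR: OP_1 32 <32 bytes>
    if n == 34 and script[:2] == b"\x51\x20":
        return b"\x08" + script[2:]
    # Compressed P2PK: the key's parity byte (0x02 or 0x03) doubles as the prefix.
    if n == 35 and script[0] == 33 and script[1] in (0x02, 0x03) and script[34] == 0xac:
        return script[1:34]
    # Uncompressed P2PK: the key's leading 0x04 byte doubles as the prefix.
    if n == 67 and script[0] == 65 and script[1] == 0x04 and script[66] == 0xac:
        return script[1:66]
    # Anything else is sent verbatim with a length prefix.
    return b"\x00" + encode_compactsize(n) + script
```

For example, the single-byte script `6a` (`OP_RETURN`) falls through to the last branch and serializes as `00016a`.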
|
|
||
| #### Amount Format | ||
|
|
||
| The 64 bit unsigned integers representing amounts are compressed by first using the `CompressAmount` function defined | ||
| below, and serializing the result with `CompactSize`. | ||
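The two-step amount encoding can be sketched as follows. `compress_amount` and `encode_compactsize` mirror the functions in the Function Appendix; the wrapper name `serialize_amount` is hypothetical and only illustrates the order of operations.

```python
def compress_amount(n: int) -> int:
    """Amount compression, as in the Function Appendix."""
    if n == 0:
        return 0
    e = 0
    while n % 10 == 0 and e < 9:
        n //= 10
        e += 1
    if e < 9:
        d = n % 10
        n //= 10
        return 1 + (n * 9 + d - 1) * 10 + e
    return 1 + (n - 1) * 10 + 9


def encode_compactsize(n: int) -> bytes:
    """CompactSize encoding, as in the Function Appendix."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_amount(satoshis: int) -> bytes:
    """Compress first, then length-encode the compressed integer."""
    return encode_compactsize(compress_amount(satoshis))


# A 50 BTC amount compresses to 0x32 and fits in a single byte on the wire.
assert serialize_amount(5_000_000_000).hex() == "32"
# 21 million BTC compresses to 0x1406f40 and takes five bytes.
assert serialize_amount(2_100_000_000_000_000).hex() == "fe406f4001"
```

Round amounts benefit the most from this scheme, since the compression strips trailing decimal zeros before the integer is length-encoded.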
|
|
||
| #### Coin | ||
|
|
||
| | Field | Type | Serialization | Description | | ||
| | :----------------------- | :---------------------------- | :---------------------------------- | :------------------------------------------------ | | ||
| | Input index | 32-bit unsigned integer | Little endian | The index in the block inputs, coinbase excluded. | | ||
| | Height and coinbase flag | Height + Coinbase Flag Code | Defined above | — | | ||
| | Script | Reconstructable script format | Defined above | — | | ||
| | Amount | 64-bit unsigned integer | `CompressAmount` then `CompactSize` | Satoshi-denominated value. | | ||
|
|
||
| ### Messages | ||
|
|
||
| #### MSG_GET_SPENT_COINS | ||
|
Contributor
Is the idea that a peer would issue this request for every block in the chain? If we assume mainnet at height, and a 150 ms round trip time, then a peer would spend nearly 80 hours just downloading this undo data. You may want to consider a batched variant, similar to the way messages like
Member
Author
We've found that bandwidth throughput is the limiting factor when downloading blocks in parallel. Not all spent coins have to be downloaded if a client keeps a cache, as this document describes. In the batched variant, the cache is not possible and the bandwidth requirement increases significantly. |
||
|
|
||
| `MSG_GET_SPENT_COINS` defines a request for the inputs of a block. | ||
|
|
||
| Define `cmdString` as `getbspent`. Define BIP-324 message type as ???. | ||
|
|
||
| | Field | Type | Description | | ||
| | :---------- | :---------------------- | :------------------------------------------------------------------- | | ||
| | `blockhash` | `<32 bytes>` | Hash of the block for which inputs are requested. | | ||
| | `cutoff` | 32-bit unsigned integer | If greater than zero, include only coins created before this height. | | ||
|
|
||
|
|
||
| Rationale of the `cutoff` field is detailed in the rationale section below. | ||
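A request payload could be built and parsed along these lines. This is a sketch under two assumptions that the table does not state: that `cutoff` is serialized little endian, like the input index of a coin, and that `blockhash` is sent in the same byte order as in other peer-to-peer messages. The function names are hypothetical.

```python
import struct


def serialize_getbspent(blockhash: bytes, cutoff: int = 0) -> bytes:
    """Build a `getbspent` payload: 32-byte block hash, then a 32-bit cutoff."""
    if len(blockhash) != 32:
        raise ValueError("blockhash must be exactly 32 bytes")
    # Assumption: little endian, matching other fixed-width P2P integers.
    return blockhash + struct.pack("<I", cutoff)


def parse_getbspent(payload: bytes) -> tuple[bytes, int]:
    """Split a `getbspent` payload into (blockhash, cutoff)."""
    if len(payload) != 36:
        raise ValueError("getbspent payload must be exactly 36 bytes")
    (cutoff,) = struct.unpack("<I", payload[32:])
    return payload[:32], cutoff
```

A `cutoff` of zero leaves the response unfiltered, so it is used as the default here.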
|
|
||
| #### MSG_SPENT_COINS | ||
|
|
||
| `MSG_SPENT_COINS` defines the data structure for inputs of a block. | ||
|
Contributor
Probably not really y'all's intended use case, but if you optionally make it possible to include merkle proofs for the set of coins, then this message can be used to obtain a proof that an output was spent in a given block.
Contributor
It would actually also be useful for BIP 157+158 peers, as the final version that shipped includes the script spent (instead of the outpoint), which means that if you're using the filters to find a block where a given script has been spent, you need to make some assumptions about what the prev script is for a given transaction.
Member
Author
The most recent response on this mailing list post mentions commitment to the UTXO set as part of the block header. There are additional ways to do this outside of a soft fork as well, i.e. utreexo proofs. For now I think it best to leave this unspecified in this version of the message while the community shares ideas, but I do think this is interesting. |
||
|
|
||
| Define `cmdString` as `bspent`. Define BIP-324 message type as ???. | ||
|
|
||
| | Field | Type | Description | | ||
| | :---------- | :------------------------------- | :------------------------------------------------ | | ||
| | `blockhash` | `<32 bytes>` | Block hash these coins are spent from. | | ||
| | `len` | `CompactSize(Len(vector<Coin>))` | Length of the coins vector. | | ||
| | `coins` | `vector<Coin>` | Coins spent, after filtering on request `cutoff`. | | ||
|
|
||
| A client supporting the `bspent` message MUST include the coins created _before_ the `cutoff` height given in the `getbspent` request. A | ||
| client receiving a `bspent` message with un-requested or missing coins MUST disconnect from the serving peer. | ||
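The filtering rule can be sketched on both sides of the connection. The `Coin` class below is a hypothetical in-memory stand-in for the wire structure, and the helper names are illustrative. Detecting missing coins additionally requires the block's inputs and the client's cache, so it is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coin:
    input_index: int
    height_code: int  # (height << 1) | coinbase flag
    script: bytes
    amount: int


def creation_height(coin: Coin) -> int:
    """Drop the coinbase bit to recover the height the coin was created at."""
    return coin.height_code >> 1


def filter_by_cutoff(coins: list[Coin], cutoff: int) -> list[Coin]:
    """Serving side: keep only coins created before `cutoff`; zero disables the filter."""
    if cutoff == 0:
        return list(coins)
    return [c for c in coins if creation_height(c) < cutoff]


def has_unrequested_coins(coins: list[Coin], cutoff: int) -> bool:
    """Receiving side: True if the response contains a coin the request excluded."""
    return cutoff > 0 and any(creation_height(c) >= cutoff for c in coins)
```

A receiving client would disconnect from the peer when `has_unrequested_coins` returns true for the `cutoff` it sent.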
|
|
||
| ## Signaling | ||
|
|
||
| Support for serving historical block spent coins is advertised by a feature message, introduced by | ||
| [BIP-434](https://github.com/bitcoin/bips/blob/master/bip-0434.md). | ||
|
|
||
| | featureid | featuredata | | ||
| | :------------------ | :---------- | | ||
| | `blockspentcoinsv1` | `0x00` | | ||
|
|
||
|
|
||
| A client advertising this feature SHOULD respond to `getbspent` messages, subject to rate-limiting and bandwidth | ||
| limiting. | ||
|
|
||
| ## Rationale | ||
|
|
||
| The lifetimes of coins on the Bitcoin blockchain, measured as the interval between creation and spending height, show | ||
| empirically that the majority of coins are spent within 100 blocks. In fact, approximately 41 percent of coins | ||
| are spent within 10 blocks at the time of writing[^2]. Clients may leverage this to reduce the bandwidth required to | ||
| fetch spent coins by using an in-memory cache. For example, a client may store coins that were created in a 5 block | ||
| window, and request only coins that are older than this height via the `cutoff` filter. This results in a significant | ||
| bandwidth reduction at the cost of a cache that can be set dynamically by the client depending on available memory. | ||
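One way a client might derive the `cutoff` for each request is sketched below. It assumes the client validates a run of consecutive blocks starting at `start_height`, and that the cache window counts the block being validated as one of its `window` blocks. Both the function name and these assumptions are illustrative, not part of the specification.

```python
def cutoff_for(height: int, start_height: int, window: int) -> int:
    """Return the `cutoff` to send when requesting spent coins for block `height`.

    Coins created at or above the returned height are assumed to already be in
    the local cache, so only coins created before it are requested.
    """
    if height < start_height or window < 1:
        raise ValueError("height precedes start_height or window is empty")
    # The cache cannot reach back past the first block this run processed.
    return max(start_height, height - window + 1)


# A run starting at height 1001 with a 5 block window.
assert cutoff_for(1002, 1001, 5) == 1001
assert cutoff_for(1005, 1001, 5) == 1001
assert cutoff_for(1006, 1001, 5) == 1002
```

Because the result depends only on heights, the requests for upcoming blocks can be issued before earlier blocks have finished validating.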
|
Comment on lines
+150
to
+155
This cutoff cache optimization seems to nudge implementors back to sequentially processing blocks with the added burden of requesting extra data over the wire. Also with the current messages I still need to get the data for the block (even if there's only one unspent cache miss?), right? At Is this understanding correct?
Contributor
Caching requires sequential processing, but you can have multiple sequential threads in parallel.
You're going to have to request the undo data regardless for non-assumevalid SwiftSync - it is not related to caching.
Caching does not prevent the need for requesting undo data. You can safely assume pretty much every block has cache misses. No cache misses is equivalent to having the full UTXO set (and impossible with multiple sequential threads), which defeats the point.
I have no strong opinion on batching, but round-trip latency won't add up sequentially if requests are sent out in parallel. Concrete example: Let's say you're starting another sequential thread from block height 1001 and you intend to cache the last 5 blocks worth of outputs. For the first block you'd request the full undo data. For block 1002 until 1005 you'd request everything created until block height 1000. From height 1006 onwards your 5-block window starts to shift so you'd request everything created until block height 1001, and so on. All this data can be requested in parallel. As long as your caching strategy is not based on what you witnessed during the previous block, at no point do you have to wait for one block to finish processing before requesting the data for upcoming blocks.
Comment on lines
+154
to
+155
Member
Ah cool. I was missing this context above when
Member
Author
I added a short note when introducing the request message that the |
||
|
|
||
| Beyond the use of a dynamic coin height filter, there are additional reasons to not simply read the spent coins from | ||
| disk and send them over the wire. Legacy fields (`nVersion`) are set to `0x00` when writing and reading this data to | ||
| maintain compatibility of disk format with old clients. Furthermore, using the amount compression specified above, an | ||
| 11 GB reduction in bandwidth is achieved. `CompactSize`, which is commonly used in P2P messages to describe collection | ||
| lengths, was selected over `VARINT`, which is used internally within Bitcoin Core to represent variable length integers. | ||
| The application of `VARINT` as opposed to `CompactSize` offers a further reduction of 4 GB; however, the `VARINT` | ||
| primitive is currently a Bitcoin Core implementation detail and has not been included in a P2P message. Reusing existing | ||
| network primitives results in the majority of savings, so this specification opts to lower implementation burden for | ||
| clients. With respect to reconstructable script, utilizing this format results in a savings of around 12 GB. The scheme | ||
| is lossless, and may be upgraded by appending script variants. For reference, the naive encoding of block spent coins | ||
| is 118 GB as of block 930,000[^2][^3][^4]. | ||
|
|
||
| ## Function Appendix | ||
|
|
||
| Bitcoin Core utilizes a technique to remove trailing zeros from the representation of amounts. This technique offers a | ||
| significant size reduction in amount serialization. These functions are duplicated from the | ||
| [test framework](https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/compressor.py). | ||
|
|
||
| ### Compress Amount | ||
|
|
||
| ```python | ||
| def compress_amount(n): | ||
| if n == 0: | ||
| return 0 | ||
| e = 0 | ||
| while ((n % 10) == 0) and (e < 9): | ||
| n //= 10 | ||
| e += 1 | ||
| if e < 9: | ||
| d = n % 10 | ||
| assert (d >= 1 and d <= 9) | ||
| n //= 10 | ||
| return 1 + (n*9 + d - 1)*10 + e | ||
| else: | ||
| return 1 + (n - 1)*10 + 9 | ||
| ``` | ||
|
|
||
| ### Decompress Amount | ||
|
|
||
| ```python | ||
| def decompress_amount(x): | ||
| if x == 0: | ||
| return 0 | ||
| x -= 1 | ||
| e = x % 10 | ||
| x //= 10 | ||
| n = 0 | ||
| if e < 9: | ||
| d = (x % 9) + 1 | ||
| x //= 9 | ||
| n = x * 10 + d | ||
| else: | ||
| n = x + 1 | ||
| while e > 0: | ||
| n *= 10 | ||
| e -= 1 | ||
| return n | ||
| ``` | ||
|
|
||
| `CompactSize` is commonly used to represent the size of collections in peer-to-peer messages. | ||
|
|
||
| ### Encode Compact Size | ||
|
|
||
| ```python | ||
| def encode_compactsize(n): | ||
| if n < 0xfd: | ||
| return bytes([n]) | ||
| elif n <= 0xffff: | ||
| return b"\xfd" + n.to_bytes(2, "little") | ||
| elif n <= 0xffffffff: | ||
| return b"\xfe" + n.to_bytes(4, "little") | ||
| else: | ||
| return b"\xff" + n.to_bytes(8, "little") | ||
| ``` | ||
|
|
||
| ### Decode Compact Size | ||
|
|
||
| ```python | ||
| def decode_compactsize(b): | ||
| prefix = b[0] | ||
| if prefix < 0xfd: | ||
| return prefix | ||
| elif prefix == 0xfd: | ||
| return int.from_bytes(b[1:3], "little") | ||
| elif prefix == 0xfe: | ||
| return int.from_bytes(b[1:5], "little") | ||
| else: | ||
| return int.from_bytes(b[1:9], "little") | ||
| ``` | ||
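When parsing a message such as `bspent`, a reader also needs to know how many bytes the `CompactSize` occupied so it can continue with the next field. The following variant is a sketch of one way to do that; the name `read_compactsize` and its offset-based interface are hypothetical and not part of the specification.

```python
def read_compactsize(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a CompactSize at `offset` and return (value, next_offset)."""
    prefix = buf[offset]
    if prefix < 0xfd:
        # Values below 0xfd are stored directly in the prefix byte.
        return prefix, offset + 1
    # The prefix selects the width of the little endian integer that follows.
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[prefix]
    end = offset + 1 + width
    if end > len(buf):
        raise ValueError("truncated CompactSize")
    return int.from_bytes(buf[offset + 1:end], "little"), end


# A one-byte length followed by a payload byte: the payload starts at offset 1.
assert read_compactsize(b"\x01\x6a") == (1, 1)
# A five-byte encoding of 21,000,000.
assert read_compactsize(bytes.fromhex("fe406f4001")) == (21_000_000, 5)
```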
|
|
||
| ## Compatibility | ||
|
|
||
| Clients seeking to perform fully-validating SwiftSync require peers that serve undo data. Serving data requires no | ||
| additional index and may be enabled via advertising the `feature` message. | ||
|
|
||
| ## Reference Implementation and Test Vectors | ||
|
|
||
| ### Reference Implementation | ||
|
|
||
| - [Bitcoin Core](https://github.com/rustaceanrob/bitcoin/tree/bip-block-undo) | ||
|
|
||
| ### Test Vectors | ||
|
|
||
| - [Reconstructable script](test_vectors/block_undo/reconstructable_script.json) | ||
| - [Compressed Amount](test_vectors/block_undo/compressed_amount.json) | ||
|
|
||
| In order: | ||
| `P2PKH, P2SH, P2TR, P2WPKH, P2WSH, P2PK (odd), P2PK (even), P2PK (uncompressed), OP_RETURN (unspendable/unknown)` | ||
|
|
||
| ## Copyright | ||
|
|
||
| This BIP is licensed under the 3-clause BSD license. | ||
|
|
||
| [^1]: When representing objects in memory, programming languages will align the bytes of the fields of an object. A | ||
| boolean is commonly padded to 4 or 8 bytes, but only requires a bit. Further, if the coinbase flag was a separate field | ||
| as represented in the message, it would require at least one byte. Losing one bit of precision in the block height still | ||
| allows for valid encodings of heights up to 2,147,483,647. | ||
| [^2]: Relevant statistics may be generated via binaries in | ||
| the [`swiftsync-research`](https://github.com/rustaceanrob/swiftsync-research) repository. | ||
| [^3]: Reconstructable | ||
| scripts are borrowed from [UTREEXO](https://github.com/bitcoin/bips/pull/1923), which in turn borrows them from Cory | ||
| Fields' UHS proposal. | ||
| [^4]: Astute readers may notice uncompressed public keys may be compressed before they are sent | ||
| and decompressed by the receiving client. Although this would slightly reduce bandwidth, it would increase the | ||
| complexity of client code, as a `secp256k1` context would be required to decode the message, which is not currently a | ||
| requirement. As of height 936,212 the number of uncompressed public keys spent in blocks is 853,515. This represents a | ||
| very modest savings in bandwidth, around 30 MB. As such, this technique is omitted for implementation simplicity. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| [ | ||
| [0, "0x0"], | ||
| [1, "0x1"], | ||
| [1000000, "0x7"], | ||
| [100000000, "0x9"], | ||
| [5000000000, "0x32"], | ||
| [2100000000000000, "0x1406f40"] | ||
| ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| [ | ||
| ["76a9142365e46227cc171083ea275f45ea8646c61d1fbb88ac", "012365e46227cc171083ea275f45ea8646c61d1fbb"], | ||
| ["a914b472a266d0bd89c13706a4132ccfb16f7c3b9fcb87", "05b472a266d0bd89c13706a4132ccfb16f7c3b9fcb"], | ||
| ["5120720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705", "08720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705"], | ||
| ["00146262b97a514ea54d12f51e0a4fe4c09fb74ff7bd", "076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd"], | ||
| ["00200000000000000000000000000000000000000000000000000000000000000000", "060000000000000000000000000000000000000000000000000000000000000000"], | ||
| ["210334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fbac", "0334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fb"], | ||
| ["210299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11edeac", "0299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11ede"], | ||
| ["410441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59eac", "0441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59e"], | ||
| ["6a", "00016a"] | ||
| ] |