This repository was archived by the owner on Oct 3, 2023. It is now read-only.

Commit 8f12846

Merge pull request #17 from sebright/clarify-binary-encoding
Add more details to the general binary encoding specification (closes #11).
2 parents 81f67e4 + 9e0ca2f commit 8f12846

1 file changed

Lines changed: 30 additions & 16 deletions

encodings/BinaryEncoding.md

@@ -1,5 +1,9 @@
 # BINARY FORMAT
 
+The binary format can be used to encode different data types, each with different fields. This
+document first describes the general format and then applies it to specific data types,
+including Trace Context and Tag Context.
+
 ## General Format
 Each encoding will have a 1 byte version followed by the version format encoding:
 
@@ -10,37 +14,47 @@ This will allow us to, in 1 deprecation cycle to completely switch to a new form
 ## Version Format (version_id = 0)
 The version format for the version_id = 0 is based on ideas from proto encoding. The main
 requirements are to allow adding and removing fields in less than 1 deprecation cycle. It
-contains a list of repeated fields:
+contains a list of fields:
 
 `<field><field>...`
 
 ### Field
+Each field is a 1-byte field ID paired with a field value, where the format of the field value is
+determined by both the field ID and the data type. For example, field 0 in `Trace Context` may
+have a completely different format than field 0 in `Tag Context` or field 1 in `Trace Context`.
+
 Each field that we send on the wire will have the following format:
 
 `<field_id><field_format>`
 
 * `field_id` is a single byte.
 
-* `field_format` must be defined for each metadata field separately, that means that for field_id
-= 0 in trace context the field_value may have a completely different representation than the
-field_id = 0 in the server-stats metadata.
+* `field_format` must be defined for each field separately.
+
+The specification for a data type's format must also specify whether each field is optional or
+repeated. For example, `Trace-id` in `Trace Context` is optional, and `String tag` in `Tag Context`
+is repeated. The specification for a data type's format MAY define a default value for any
+optional field, which must be used when the field is missing.
 
-Each field is optional and MAY have defined a default value that can be used (if implementation
-needs one) when the field is missing. Fields can be repeated, e.g. StringTag in the tagging example.
+The specification for a data type can define versions within a version of the format, called data
+type version, where each data type version adds new fields. The data type version can be useful
+for describing what fields an implementation supports, but it is not included in the
+serialized data.
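
To illustrate the field description added in this hunk, here is a minimal Python sketch of how an implementation might keep one field table per data type, so that the same field ID can map to different formats. The `FieldSpec` class, the `FIELD_SPECS` table, the `encode_field` helper, and the field names and fixed sizes are all assumptions made for illustration; the spec text in this diff does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field of one data type."""
    name: str
    size: int                      # fixed value size in bytes (assumption of this sketch)
    data_type_version: int         # not serialized; only documents when the field was added
    repeated: bool = False         # False means the field is optional
    default: Optional[bytes] = None


# Hypothetical tables: the format depends on BOTH the data type and the field ID.
FIELD_SPECS = {
    "trace_context": {
        0: FieldSpec("trace_id", 16, 0),
        1: FieldSpec("span_id", 8, 0),
    },
    "tag_context": {
        0: FieldSpec("string_tag", 4, 0, repeated=True),
    },
}


def encode_field(data_type: str, field_id: int, value: bytes) -> bytes:
    """Encode one field as <field_id><field_format> (1-byte ID, then the value)."""
    spec = FIELD_SPECS[data_type][field_id]
    if len(value) != spec.size:
        raise ValueError(f"{spec.name} expects {spec.size} bytes, got {len(value)}")
    return bytes([field_id]) + value
```

Keying the table by data type first keeps field 0 of one data type independent of field 0 of another, which matches the example given in the added text.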
 
 ### Serialization Rules
-Because each field has its own format that is not generically defined we are forced to always add
-new field ids at the end. The serialization MUST ensure that fields are serialized in version
-order (i.e. fields from version (i) must precede fields from version (i+1)). This ordering
-allows old decoders to ignore any new fields even if they do not know the format for that field.
-Systems that receive extra fields that they cannot decode MAY pass them on when possible (by
-passing-through the whole opaque tail of bytes starting with the field id that the current
-binary does not understand).
+Fields MUST be serialized in data type version order (i.e. all fields from version (i) of a data
+type must precede all fields from version (i+1)). That is because each field has its own format,
+and old implementations may not be able to determine where newer field values end. This ordering
+allows old decoders to ignore any new fields when they do not know the format for those fields.
+Fields within a data type version can be serialized in any order, and fields with the same field
+ID do not need to be serialized consecutively.
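
The ordering rule in the added lines could be sketched in Python as below. This is an illustration only: the `FIELD_DATA_TYPE_VERSION` mapping and the `serialize` helper are hypothetical, and the sketch assumes each field value is already encoded as bytes.

```python
VERSION_ID = 0  # the 1-byte version that precedes the version format encoding

# Hypothetical mapping from field ID to the data type version that introduced it.
FIELD_DATA_TYPE_VERSION = {0: 0, 1: 0, 2: 1}


def serialize(fields):
    """Serialize (field_id, value_bytes) pairs in data type version order.

    sorted() is stable, so fields that share a data type version keep the
    caller's order, which the rule allows to be arbitrary.
    """
    ordered = sorted(fields, key=lambda f: FIELD_DATA_TYPE_VERSION[f[0]])
    out = bytearray([VERSION_ID])
    for field_id, value in ordered:
        out.append(field_id)  # 1-byte field ID
        out += value          # field value in its own format
    return bytes(out)
```

Sorting only by data type version, rather than by field ID, reflects the statement that fields within one data type version may appear in any order.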
 
 ### Deserialization Rules
-Because all the fields will be decoded in the same order as they were defined/added, the
-deserialization will simply read the encoded input until the end of the input (if no new fields
-were received) or until the first unknown field_id.
+Because all the fields will be decoded in data type version order, the deserialization will
+simply read the encoded input until the end of the input or until the first unknown field_id.
+Implementations MAY pass on any fields that they cannot decode, when possible (by passing-through
+the whole opaque tail of bytes starting with the first field id that the current binary does not
+understand).
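
A decoder following the added deserialization text might look roughly like the Python sketch below. It assumes, purely for illustration, that every known field has a fixed size listed in a hypothetical `KNOWN_FIELD_SIZES` table; the `deserialize` helper and its return shape are likewise not part of the spec.

```python
# Hypothetical table of the field IDs this decoder understands and their value sizes.
KNOWN_FIELD_SIZES = {0: 16, 1: 8}


def deserialize(data: bytes):
    """Return (known_fields, opaque_tail) for a version_id = 0 encoding.

    Reading stops at the end of the input or at the first unknown field ID.
    The tail starts at that unknown field ID and can be passed on unchanged.
    """
    if not data or data[0] != 0:
        raise ValueError("unsupported or missing version_id")
    fields = []
    pos = 1
    while pos < len(data):
        field_id = data[pos]
        size = KNOWN_FIELD_SIZES.get(field_id)
        if size is None:
            break  # unknown field: its length cannot be determined, so stop here
        end = pos + 1 + size
        if end > len(data):
            raise ValueError("truncated field value")
        fields.append((field_id, data[pos + 1:end]))
        pos = end
    return fields, data[pos:]
```

Returning the undecoded tail alongside the parsed fields lets a caller re-append it when propagating the data, which is the pass-through behavior the text marks as MAY.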
 
 ### How can we add new fields?
 If we follow the rules that we always append the new ids at the end of the buffer we can add up
