11# BINARY FORMAT
22
3+ The binary format can be used to encode different data types, each with different fields. This
4+ document first describes the general format and then applies it to specific data types,
5+ including Trace Context and Tag Context.
6+
37## General Format
48Each encoding will have a 1 byte version followed by the version format encoding:
59
@@ -10,37 +14,47 @@ This will allow us to, in 1 deprecation cycle to completely switch to a new form
1014## Version Format (version_id = 0)
1115The version format for the version_id = 0 is based on ideas from proto encoding. The main
1216requirements are to allow adding and removing fields in less than 1 deprecation cycle. It
13- contains a list of repeated fields:
17+ contains a list of fields:
1418
1519` <field><field>... `
1620
1721### Field
22+ Each field is a 1-byte field ID paired with a field value, where the format of the field value is
23+ determined by both the field ID and the data type. For example, field 0 in ` Trace Context ` may
24+ have a completely different format than field 0 in ` Tag Context ` or field 1 in ` Trace Context ` .
25+
1826Each field that we send on the wire will have the following format:
1927
2028` <field_id><field_format> `
2129
2230* ` field_id ` is a single byte.
2331
24- * ` field_format ` must be defined for each metadata field separately, that means that for field_id
25- = 0 in trace context the field_value may have a completely different representation than the
26- field_id = 0 in the server-stats metadata.
32+ * ` field_format ` must be defined for each field separately.
33+
34+ The specification for a data type's format must also specify whether each field is single or
35+ repeated. For example, ` Trace-id ` in ` Trace Context ` is single, and ` String tag ` in ` Tag Context `
36+ is repeated. Every single field is optional. The specification for a data type's format MAY define
37+ a default value for any single field, which must be used when the field is missing.
2738
28- Each field is optional and MAY have defined a default value that can be used (if implementation
29- needs one) when the field is missing. Fields can be repeated, e.g. StringTag in the tagging example.
39+ The specification for a data type can define versions within a version of the format, called data
40+ type versions, where each data type version adds new fields. The data type version can be useful
41+ for describing what fields an implementation supports, but it is not included in the
42+ serialized data.
3043
3144### Serialization Rules
32- Because each field has its own format that is not generically defined we are forced to always add
33- new field ids at the end. The serialization MUST ensure that fields are serialized in version
34- order (i.e. fields from version (i) must precede fields from version (i+1)). This ordering
35- allows old decoders to ignore any new fields even if they do not know the format for that field.
36- Systems that receive extra fields that they cannot decode MAY pass them on when possible (by
37- passing-through the whole opaque tail of bytes starting with the field id that the current
38- binary does not understand).
45+ Fields MUST be serialized in data type version order (i.e. all fields from version (i) of a data
46+ type must precede all fields from version (i+1)). That is because each field has its own format,
47+ and old implementations may not be able to determine where newer field values end. This ordering
48+ allows old decoders to ignore any new fields when they do not know the format for those fields.
49+ Fields within a data type version can be serialized in any order, and fields with the same field
50+ ID do not need to be serialized consecutively.
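
The ordering rule can be illustrated with a minimal sketch. It assumes each field value has already been encoded according to its own format, and that the caller knows which data type version introduced each field. The names `Field`, `data_type_version`, and `serialize` are hypothetical and are not part of this specification.

```python
from typing import Iterable, NamedTuple

FORMAT_VERSION_ID = 0  # the leading 1-byte version described under "General Format"


class Field(NamedTuple):
    # Hypothetical record type used only for this illustration.
    data_type_version: int  # data type version that introduced the field (not serialized)
    field_id: int           # 1-byte field ID
    value: bytes            # value already encoded in the field's own format


def serialize(fields: Iterable[Field]) -> bytes:
    """Emit <version_id><field_id><value>... in data type version order."""
    out = bytearray([FORMAT_VERSION_ID])
    # sorted() is stable, so fields within one data type version keep the
    # caller's order; the rules allow any order within a version.
    for field in sorted(fields, key=lambda f: f.data_type_version):
        if not 0 <= field.field_id <= 0xFF:
            raise ValueError("field_id must fit in a single byte")
        out.append(field.field_id)
        out += field.value
    return bytes(out)
```

In this sketch two fields with the same field ID may be separated by a field with a different ID, as long as all three belong to the same data type version.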
3951
4052### Deserialization Rules
41- Because all the fields will be decoded in the same order as they were defined/added, the
42- deserialization will simply read the encoded input until the end of the input (if no new fields
43- were received) or until the first unknown field_id.
53+ Because all the fields will be decoded in data type version order, the deserialization will
54+ simply read the encoded input until the end of the input or until the first unknown field_id.
55+ Implementations MAY pass on any fields that they cannot decode, when possible (by passing-through
56+ the whole opaque tail of bytes starting with the first field id that the current binary does not
57+ understand).
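
A minimal decoder sketch follows, under a simplifying assumption: every known field has a fixed-length value, looked up in a table. The table contents and the names `KNOWN_FIELD_LENGTHS` and `deserialize` are hypothetical; a real data type defines its own field formats, which need not be fixed length.

```python
from typing import Dict, List, Tuple

# Hypothetical table for illustration: field_id -> value length in bytes.
KNOWN_FIELD_LENGTHS: Dict[int, int] = {0: 16, 1: 8}


def deserialize(
    data: bytes, known: Dict[int, int] = KNOWN_FIELD_LENGTHS
) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Return (decoded fields, opaque tail starting at the first unknown field_id)."""
    if not data:
        raise ValueError("empty input")
    if data[0] != 0:
        raise ValueError("unsupported version_id")
    fields: List[Tuple[int, bytes]] = []
    pos = 1
    while pos < len(data):
        field_id = data[pos]
        length = known.get(field_id)
        if length is None:
            # First unknown field_id: its format is unknown, so nothing
            # after this point can be parsed reliably.
            break
        end = pos + 1 + length
        if end > len(data):
            raise ValueError("truncated field value")
        fields.append((field_id, data[pos + 1:end]))
        pos = end
    # The tail is empty when every field was understood; otherwise it can be
    # passed through unchanged to the next hop.
    return fields, data[pos:]
```

Returning the undecoded tail separately lets an implementation append it verbatim when it re-serializes, which is one way to pass unknown fields on.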
4458
4559### How can we add new fields?
4660If we follow the rules that we always append the new ids at the end of the buffer we can add up