Commit cd002f6

Add textproto (text format) encoding and decoding (#6)
The protobuf text format is the human-readable debug representation — useful for config files, golden-file tests, and logging. This implements the full format with 883/883 conformance suite coverage.

Runtime (feature = "text", zero deps, no_std + alloc)
- buffa::text module: TextFormat trait, TextEncoder, TextDecoder, encode_to_string / decode_from_str conveniences
- Tokenizer handles all grammar quirks: both { } and < > delimiters, adjacent string concatenation, #-comments, [pkg.ext] bracket names, hex/octal/float number forms, nan/inf, C++-style bool literals
- Decoder returns Cow<'a, str> for strings (zero-copy when no escapes)
- \u escapes are BMP-only (surrogates rejected per spec; use \U for non-BMP)
- Heuristic nested-message detection for unknown LD fields (speculative wire-format parse, C++ text_format.cc:2926 / Java TextFormat.java:87)

TypeRegistry — unified type/extension registry
- JsonRegistry → TypeRegistry with feature-split entry types: JsonAnyEntry, TextAnyEntry, JsonExtEntry, TextExtEntry each under their own cfg gate
- json and text features are now independently enableable (previously json pulled in text because unified entry structs referenced TextEncoder)
- Codegen emits one register_types() per file covering whichever formats were enabled — lines inside are emitted per generate_json/generate_text, no #[cfg] in generated code
- Any expansion ([type.googleapis.com/...] { ... }) and extension brackets ([pkg.ext] { ... }) both consult TypeRegistry
- Deprecated aliases: JsonRegistry, set_json_registry, AnyTypeEntry, ExtensionRegistryEntry

Codegen
- generate_text(bool) config option and buffa-build builder method
- impl TextFormat emission in impl_text.rs, skip-gated for Any (hand-written in buffa-types) and MessageSet
- Group-field naming: GroupLikeType { ... } (message type name) and groupliketype { ... } (lowercase/field name) both accepted on parse; type name emitted on encode
- Separate __FOO_JSON_ANY / __FOO_TEXT_ANY consts per codegen flag
- emit_register_fn config (default true) — gen_wkt_types sets false since seven include!'d WKT files would collide on register_types

Conformance
- Text suite: 0 → 883 passing (was entirely skipped), zero expected failures
- Binary+JSON unchanged at 5549/5529/2797
- known_failures_text.txt added (empty — passes the full suite)
- print_unknown_fields request field parsed and threaded to TextEncoder

Docs & examples
- guide.md "Text format (textproto)" section
- README/CHANGELOG/CONTRIBUTING/migration-from-protobuf.md updated
- addressbook example gains a dump command (encode_to_string_pretty)
- Taskfile: build-examples, example-addressbook, example-envelope, example-logging tasks (USER_WORKING_DIR for file-path preservation)
1 parent 52e40a3 commit cd002f6
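The `Cow<'a, str>` return type and the BMP-only `\u` rule described in the commit message can be illustrated with a minimal sketch. The `unescape` and `read_code_point` helpers below are hypothetical and are not buffa's decoder; they handle only a few escapes and assume the surrounding quotes have already been stripped.

```rust
use std::borrow::Cow;
use std::str::Chars;

/// Hypothetical helper (not buffa's API): unescape the body of a textproto
/// string literal whose quotes have already been removed.
pub fn unescape(body: &str) -> Result<Cow<'_, str>, String> {
    // Fast path: no backslash means no escapes, so borrow the input (zero-copy).
    if !body.contains('\\') {
        return Ok(Cow::Borrowed(body));
    }
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            Some('\'') => out.push('\''),
            // \u takes exactly 4 hex digits, so it can only name BMP code points.
            Some('u') => out.push(read_code_point(&mut chars, 4)?),
            // \U takes 8 hex digits and can reach code points beyond the BMP.
            Some('U') => out.push(read_code_point(&mut chars, 8)?),
            other => return Err(format!("unsupported escape: {:?}", other)),
        }
    }
    Ok(Cow::Owned(out))
}

fn read_code_point(chars: &mut Chars<'_>, digits: usize) -> Result<char, String> {
    let hex: String = chars.by_ref().take(digits).collect();
    if hex.len() != digits || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("expected {} hex digits, got {:?}", digits, hex));
    }
    let value = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
    // char::from_u32 returns None for surrogates (U+D800..=U+DFFF) and for
    // values above U+10FFFF, so a lone surrogate half is rejected here.
    char::from_u32(value).ok_or_else(|| format!("invalid code point U+{:04X}", value))
}
```

Returning `Cow::Borrowed` on the fast path means a caller that only reads the string never allocates; an allocation happens only when an escape forces the text to change.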

58 files changed: +8443 -1071 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -17,3 +17,6 @@ fuzz/corpus/
 
 # Local tooling extracted from the tools image (task install-protoc).
 /.local/
+
+# Local editor settings
+/.claude/settings.local.json

CHANGELOG.md

Lines changed: 21 additions & 5 deletions
@@ -23,8 +23,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 ### Deprecated
 
 - **`set_any_registry`, `set_extension_registry`** — use
-  `buffa::json_registry::set_json_registry` instead, which installs both halves
+  `buffa::type_registry::set_type_registry` instead, which installs all maps
   in one call. The deprecated functions still work.
+- **`AnyTypeEntry` → `JsonAnyEntry`, `ExtensionRegistryEntry` → `JsonExtEntry`.**
+  Type aliases for one release cycle. The text-format fields have moved to
+  separate `TextAnyEntry` / `TextExtEntry` structs in `type_registry`.
 
 ### Added
 
@@ -35,16 +38,29 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
   proto2 `[default = ...]` on extension declarations, and MessageSet wire
   format behind `CodeGenConfig::allow_message_set`. See the
   [Extensions section of the user guide](docs/guide.md#extensions-custom-options).
-- **`JsonRegistry`** — unified JSON registry covering both `Any` type entries
-  and extension entries. Codegen emits `register_json(&mut JsonRegistry)` per
-  file; call once per generated file, then `set_json_registry(reg)`.
+- **`TypeRegistry`** — unified registry covering `Any` type entries and
+  extension entries for both JSON and text formats. Codegen emits
+  `register_types(&mut TypeRegistry)` per file; call once per generated file,
+  then `set_type_registry(reg)`. JSON entries (`JsonAnyEntry`, `JsonExtEntry`)
+  and text entries (`TextAnyEntry`, `TextExtEntry`) live in feature-split
+  maps so `json` and `text` are independently enableable.
 - **`JsonParseOptions::strict_extension_keys`** — error on unregistered `"[...]"`
   JSON keys (default: silently drop, matching pre-0.3 behavior for all unknown
   keys).
 - **Editions `features.message_encoding = DELIMITED`** — fully supported in
   codegen, previously parsed but ignored. Message fields with this feature use
   the group wire format (StartGroup/EndGroup) instead of length-prefixed.
-- **Conformance:** `TestAllTypesEdition2023` enabled; 5539 → 5549 passing (std).
+- **Text format (`textproto`)** — the `buffa::text` module provides
+  `TextFormat` trait, `TextEncoder`, `TextDecoder`, and `encode_to_string` /
+  `decode_from_str` conveniences. Enable with `features = ["text"]`
+  (zero-dependency, `no_std`-compatible) and `Config::generate_text(true)`.
+  Covers `Any` expansion (`[type.googleapis.com/...] { ... }`), extension
+  brackets (`[pkg.ext] { ... }`), and group/DELIMITED naming. `Any` expansion
+  and extension brackets consult the text maps in `TypeRegistry` — the `json`
+  and `text` features are independently enableable. Passes the full
+  text-format conformance suite (883/883).
+- **Conformance:** `TestAllTypesEdition2023` enabled; binary+JSON 5539 → 5549
+  passing (std). Text format suite 0 → 883 passing (was entirely skipped).
 
 ## [0.2.0] - 2026-03-16
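The feature-split registry described in this changelog entry can be sketched roughly as follows. This is an illustration under stated assumptions, not buffa's implementation: the entry fields (`type_url`, `to_json`, `to_text`), the `register_*_any` and lookup methods, and the `demo.Ping` type are all hypothetical. In a real crate each entry type and its map would sit behind its own `#[cfg(feature = "...")]` gate; the gates are omitted here so the sketch compiles standalone.

```rust
use std::collections::HashMap;

// Hypothetical entry shapes; the real structs may carry different fields.
pub struct JsonAnyEntry {
    pub type_url: &'static str,
    pub to_json: fn(&[u8]) -> String,
}

pub struct TextAnyEntry {
    pub type_url: &'static str,
    pub to_text: fn(&[u8]) -> String,
}

/// One registry, separate maps per format, so neither entry type needs to
/// reference the other format's encoder.
#[derive(Default)]
pub struct TypeRegistry {
    json_any: HashMap<&'static str, JsonAnyEntry>,
    text_any: HashMap<&'static str, TextAnyEntry>,
}

impl TypeRegistry {
    pub fn register_json_any(&mut self, entry: JsonAnyEntry) {
        self.json_any.insert(entry.type_url, entry);
    }
    pub fn register_text_any(&mut self, entry: TextAnyEntry) {
        self.text_any.insert(entry.type_url, entry);
    }
    pub fn json_any(&self, type_url: &str) -> Option<&JsonAnyEntry> {
        self.json_any.get(type_url)
    }
    pub fn text_any(&self, type_url: &str) -> Option<&TextAnyEntry> {
        self.text_any.get(type_url)
    }
}

// Stand-in for a generated encoder of a hypothetical `demo.Ping` message.
fn ping_to_text(bytes: &[u8]) -> String {
    format!("payload_len: {}", bytes.len())
}

/// Stand-in for the per-file function that codegen emits. Only the text line
/// is present, as if the file had been generated with text enabled and JSON
/// disabled.
pub fn register_types(reg: &mut TypeRegistry) {
    reg.register_text_any(TextAnyEntry {
        type_url: "type.googleapis.com/demo.Ping",
        to_text: ping_to_text,
    });
}
```

Keeping the maps separate is what lets one format be compiled out without touching the other: a lookup for a format that was never registered simply returns `None`.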

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -50,13 +50,13 @@ task conformance # now uses the locally-built image
 (std, no_std, via-view), each producing two suites:
 
 1. Binary + JSON suite — expects thousands of successes (~5500 std, ~5500 no_std, ~2800 via-view — view mode skips JSON)
-2. Text format suite — always `0 successes, 883 skipped` (text format is not supported)
+2. Text format suite — 883 successes for std and no_std (the full suite); via-view shows `0 successes, 883 skipped` (views have no `TextFormat` — textproto goes through the owned type via `to_owned_message()`)
 
-So a healthy run shows **6 `CONFORMANCE SUITE PASSED` lines**. The `883 skipped` in the text format suites is expected and correct.
+So a healthy run shows **6 `CONFORMANCE SUITE PASSED` lines**.
 
 The Dockerfile builds **two binaries**: one with default features (std) and one with `--no-default-features` (no_std). The via-view run reuses the std binary with `BUFFA_VIA_VIEW=1` set, routing binary input through `decode_view → to_owned_message → encode` to verify owned/view decoder parity.
 
-**Expected failures** are listed in `conformance/known_failures.txt` (std), `conformance/known_failures_nostd.txt` (no_std), and `conformance/known_failures_view.txt` (via-view). When a previously-failing test starts passing, remove it from the relevant file; when a new test is expected to fail, add it.
+**Expected failures** are listed in `conformance/known_failures.txt` (std binary+JSON), `conformance/known_failures_nostd.txt` (no_std binary+JSON), `conformance/known_failures_view.txt` (via-view), and `conformance/known_failures_text.txt` (text format — shared between std and no_std; currently empty). The text list is passed via `--text_format_failure_list` since the runner validates each suite's list independently. When a previously-failing test starts passing, remove it from the relevant file; when a new test is expected to fail, add it.
 
 **Capturing output**: To save per-run logs for analysis, mount a directory and set `CONFORMANCE_OUT`:
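The maintenance rule for the expected-failure lists (remove entries that now pass, add entries that newly fail) amounts to two set differences. The sketch below is a hypothetical helper, not part of the repository's tooling, and it assumes a list format of one test name per line with blank lines and `#` comments ignored. The test names in the usage are made up.

```rust
use std::collections::BTreeSet;

/// Parse a failure list. Assumed format: one test name per line; blank lines
/// and anything after `#` are ignored.
pub fn parse_failure_list(text: &str) -> BTreeSet<String> {
    text.lines()
        .map(|line| line.split('#').next().unwrap_or("").trim())
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Compare the checked-in list with the failures observed in a run.
/// Returns `(stale, missing)`: `stale` entries now pass and should be removed
/// from the file; `missing` entries fail but are not yet listed.
pub fn reconcile(
    known: &BTreeSet<String>,
    actual: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
    let stale = known.difference(actual).cloned().collect();
    let missing = actual.difference(known).cloned().collect();
    (stale, missing)
}
```

With an empty list such as `known_failures_text.txt`, `stale` is always empty and any observed failure shows up in `missing`.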

README.md

Lines changed: 3 additions & 4 deletions
@@ -4,7 +4,7 @@ A pure-Rust Protocol Buffers implementation with first-class [protobuf editions]
 
 ## Why buffa?
 
-The Rust ecosystem lacks an actively maintained, pure-Rust library that supports [protobuf editions](https://protobuf.dev/editions/overview/). Buffa fills that gap with a ground-up design that treats editions as the core abstraction. It passes all current binary and JSON protobuf serialization conformance tests.
+The Rust ecosystem lacks an actively maintained, pure-Rust library that supports [protobuf editions](https://protobuf.dev/editions/overview/). Buffa fills that gap with a ground-up design that treats editions as the core abstraction. It passes the full protobuf conformance suite — binary, JSON, and text — with zero expected failures.
 
 ## Features
 
@@ -24,19 +24,18 @@ The Rust ecosystem lacks an actively maintained, pure-Rust library that supports
 
 ## Wire formats
 
-buffa supports **binary** and **JSON** protobuf encodings:
+buffa supports **binary**, **JSON**, and **text** protobuf encodings:
 
 - **Binary wire format** -- full support for all scalar types, nested messages, repeated/packed fields, maps, oneofs, groups, and unknown fields.
 
 - **Proto3 JSON** -- canonical protobuf JSON mapping via optional `serde` integration. Includes well-known type serialization (Timestamp as RFC 3339, Duration as `"1.5s"`, int64/uint64 as quoted strings, bytes as base64, etc.).
 
-**Text format (`textproto`) is not supported** and is not planned.
+- **Text format (`textproto`)** -- the human-readable debug format. Covers `Any` expansion (`[type.googleapis.com/...] { ... }`), extension bracket syntax (`[pkg.ext] { ... }`), and group/DELIMITED fields. `no_std`-compatible.
 
 ## Unsupported features
 
 These are intentionally out of scope:
 
-- **Text format (`textproto`)** — not planned. Binary and JSON are the wire formats that matter for RPC and storage.
 - **Runtime reflection** (`DynamicMessage`, descriptor-driven introspection) — not planned for 0.1. Buffa is a codegen-first library; if you need schema-agnostic processing, consider preserving unknown fields or using `Any`.
 - **Proto2 optional-field getter methods** — `[default = X]` on `optional` fields does not generate `fn field_name(&self) -> T` unwrap-to-default accessors. Custom defaults are applied only to `required` fields via `impl Default`. Optional fields are `Option<T>`; use pattern matching or `.unwrap_or(X)`.
 - **Scoped `JsonParseOptions` in `no_std`** — serde's `Deserialize` trait has no context parameter, so runtime options must be passed through ambient state. In `std` builds, [`with_json_parse_options`] provides per-closure, per-thread scoping via a thread-local. In `no_std` builds, [`set_global_json_parse_options`] provides process-wide set-once configuration via a global atomic. The two APIs are mutually exclusive. The `no_std` global supports singular-enum accept-with-default but not repeated/map container filtering (which requires scoped strict-mode override).
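The two bracket forms the README mentions for text format, `[type.googleapis.com/...]` for `Any` expansion and `[pkg.ext]` for extensions, differ in whether the bracketed name contains a `/`. A minimal sketch of that distinction follows; `BracketName` and `classify_bracket` are hypothetical names and not part of buffa, and the sketch assumes the whole bracketed name arrives as one token.

```rust
/// Hypothetical classification of a bracketed textproto field name.
#[derive(Debug, PartialEq)]
pub enum BracketName<'a> {
    /// `[prefix/full.type.Name]`: an expanded `Any` keyed by type URL.
    AnyTypeUrl { prefix: &'a str, type_name: &'a str },
    /// `[pkg.ext]`: an extension field named by its fully qualified name.
    Extension(&'a str),
}

/// Classify a token such as `[pkg.ext]`. Returns `None` if the token is not
/// a well-formed bracketed name.
pub fn classify_bracket(token: &str) -> Option<BracketName<'_>> {
    let inner = token.strip_prefix('[')?.strip_suffix(']')?.trim();
    if inner.is_empty() {
        return None;
    }
    // A type URL is split at its last '/': everything before it is the
    // prefix, everything after it is the message type name.
    match inner.rsplit_once('/') {
        Some((prefix, type_name)) if !prefix.is_empty() && !type_name.is_empty() => {
            Some(BracketName::AnyTypeUrl { prefix, type_name })
        }
        Some(_) => None,
        None => Some(BracketName::Extension(inner)),
    }
}
```

In either case the resolved name would then be looked up in a registry before the `{ ... }` body can be parsed, since the parser needs to know which message type the body belongs to.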

Taskfile.yml

Lines changed: 45 additions & 0 deletions
@@ -457,6 +457,51 @@ tasks:
       - rm -f src/gen/context.v1.context.rs src/gen/log.v1.log.rs src/gen/mod.rs
       - PATH="{{.ROOT_DIR}}/target/release:$PATH" buf generate
 
+  # The examples are independent cargo projects (own Cargo.toml, own target/),
+  # not workspace members — they declare path deps on the workspace crates to
+  # mirror a downstream consumer's setup.
+
+  build-examples:
+    desc: Build all example binaries.
+    cmds:
+      - cargo build --manifest-path examples/addressbook/Cargo.toml
+      - cargo build --manifest-path examples/envelope/Cargo.toml
+      - cargo build --manifest-path examples/logging/Cargo.toml
+
+  example-envelope:
+    desc: >-
+      Run the extensions demo — binary + JSON roundtrip of custom options,
+      [default = ...] values, extendee-mismatch panic. Self-contained, no args.
+    dir: examples/envelope
+    cmds:
+      - cargo run
+
+  # `dir: {{.USER_WORKING_DIR}}` so file-path arguments resolve relative to
+  # where `task` was invoked, not the Taskfile's directory. Task defaults to
+  # running commands from the Taskfile location, which would make
+  # `task example-addressbook -- dump book.pb` look for ./book.pb in the
+  # repo root instead of the user's cwd.
+
+  example-addressbook:
+    desc: >-
+      Run the addressbook CLI. Pass subcommand + args after `--`:
+      `task example-addressbook -- add book.pb` /
+      `task example-addressbook -- list book.pb` /
+      `task example-addressbook -- dump book.pb` (textproto).
+    dir: '{{.USER_WORKING_DIR}}'
+    cmds:
+      - cargo run --manifest-path {{.ROOT_DIR}}/examples/addressbook/Cargo.toml -- {{.CLI_ARGS}}
+
+  example-logging:
+    desc: >-
+      Run the structured-logging CLI. Pass subcommand + file after `--`:
+      `task example-logging -- write log.pb` /
+      `task example-logging -- read log.pb` /
+      `task example-logging -- filter log.pb WARN`.
+    dir: '{{.USER_WORKING_DIR}}'
+    cmds:
+      - cargo run --manifest-path {{.ROOT_DIR}}/examples/logging/Cargo.toml -- {{.CLI_ARGS}}
+
   build-plugin:
     desc: Build the protoc plugins in release mode.
     cmds:

buffa-build/src/lib.rs

Lines changed: 11 additions & 0 deletions
@@ -114,6 +114,17 @@ impl Config {
         self
     }
 
+    /// Enable or disable `impl buffa::text::TextFormat` on generated message
+    /// structs (default: false).
+    ///
+    /// When enabled, the downstream crate must enable the `buffa/text`
+    /// feature for the runtime textproto encoder/decoder.
+    #[must_use]
+    pub fn generate_text(mut self, enabled: bool) -> Self {
+        self.codegen_config.generate_text = enabled;
+        self
+    }
+
     /// Enable or disable `#[derive(arbitrary::Arbitrary)]` on generated
     /// types (default: false).
     ///
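The builder method added in this hunk follows the consuming-builder pattern: it takes `self` by value, sets one flag, and returns `self` so calls can be chained. The standalone mirror below shows the pattern only. The two structs are simplified stand-ins rather than the real `buffa-build` types, and both the `generate_json` builder method and the `codegen_config()` accessor are assumptions added for illustration.

```rust
/// Simplified stand-in for the codegen configuration (hypothetical subset).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CodeGenConfig {
    pub generate_json: bool,
    pub generate_text: bool,
}

/// Simplified stand-in for the build-script `Config` builder.
#[derive(Debug, Default)]
pub struct Config {
    codegen_config: CodeGenConfig,
}

impl Config {
    /// Assumed to exist by analogy with `generate_text`; illustrative only.
    #[must_use]
    pub fn generate_json(mut self, enabled: bool) -> Self {
        self.codegen_config.generate_json = enabled;
        self
    }

    /// Mirrors the method in the diff: toggle text-format impl generation.
    #[must_use]
    pub fn generate_text(mut self, enabled: bool) -> Self {
        self.codegen_config.generate_text = enabled;
        self
    }

    /// Accessor added for this sketch so the result can be inspected.
    pub fn codegen_config(&self) -> &CodeGenConfig {
        &self.codegen_config
    }
}
```

`#[must_use]` matters for this pattern: because each method consumes and returns the builder, writing `config.generate_text(true);` as a bare statement would discard the updated value, and the attribute turns that mistake into a compiler warning.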

buffa-codegen/src/bin/gen_wkt_types.rs

Lines changed: 13 additions & 0 deletions
@@ -74,11 +74,24 @@ fn main() {
     // hand-written in the *_ext.rs modules (Timestamp → RFC3339,
     // Duration → "3.000001s", Any → type-URL dispatch, etc.).
     // None of the WKTs use derive-serde.
+    //
+    // generate_text = true Textproto has no special WKT treatment
+    // (unlike JSON), so the generated field-by-field impls are
+    // correct. `buffa/text` is zero-dep — enabled unconditionally
+    // in buffa-types so no feature-gate wrapping is needed.
+    //
+    // emit_register_fn = false All seven WKT files are `include!`d into
+    // one namespace — seven `register_types` fns would collide. WKTs
+    // register via the hand-written `register_wkt_types` in
+    // `any_ext.rs` anyway. Per-message `__*_TEXT_ANY` consts are
+    // still emitted (harmless `#[doc(hidden)] pub`).
     let mut config = buffa_codegen::CodeGenConfig::default();
     config.generate_views = true;
     config.preserve_unknown_fields = true;
     config.generate_arbitrary = true;
     config.generate_json = false;
+    config.generate_text = true;
+    config.emit_register_fn = false;
 
     let files_to_generate: Vec<String> = WKT_PROTOS.iter().map(|s| s.to_string()).collect();
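The interaction between `generate_json`, `generate_text`, and `emit_register_fn` can be sketched as a small source emitter. This is a rough model rather than buffa-codegen's implementation: `EmitOptions` and `render_register_fn` are hypothetical, the `reg.register_*_any(...)` calls in the emitted text are invented for illustration, and returning nothing when no format is enabled is an assumption. Only the `__FOO_JSON_ANY` / `__FOO_TEXT_ANY` const naming and the `register_types` function name come from the commit.

```rust
/// Hypothetical subset of the codegen flags discussed above.
#[derive(Debug, Clone, Copy)]
pub struct EmitOptions {
    pub generate_json: bool,
    pub generate_text: bool,
    pub emit_register_fn: bool,
}

/// Render the per-file registration function as source text, or `None` when
/// it is suppressed. `message_consts` holds upper-case message names such as
/// "FOO".
pub fn render_register_fn(message_consts: &[&str], opts: &EmitOptions) -> Option<String> {
    // Suppressed explicitly (e.g. several generated files share one
    // namespace), or nothing to register. The second condition is an
    // assumption of this sketch.
    if !opts.emit_register_fn || !(opts.generate_json || opts.generate_text) {
        return None;
    }
    let mut src = String::from("pub fn register_types(reg: &mut TypeRegistry) {\n");
    for name in message_consts {
        // The format choice is made here, at generation time, so the emitted
        // code needs no conditional-compilation attributes.
        if opts.generate_json {
            src.push_str(&format!("    reg.register_json_any(__{}_JSON_ANY);\n", name));
        }
        if opts.generate_text {
            src.push_str(&format!("    reg.register_text_any(__{}_TEXT_ANY);\n", name));
        }
    }
    src.push_str("}\n");
    Some(src)
}
```

Deciding per format at generation time is what lets the output stay free of feature gates: a file generated with only text enabled never mentions a JSON const, so it compiles regardless of whether the JSON feature is on.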
