test(trace-normalization): add microbenchmarks for tag/metric/truncate normalization #2124
yannham wants to merge 6 commits into
📚 Documentation Check Results 📦
Clippy Allow Annotation Report
Comparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This Report
This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.
🔒 Cargo Deny Results 📦
🎉 All green! 🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 302189a
ffb711c to f8e4abb
Artifact Size Benchmark Report
aarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
…e normalization

Extend the existing criterion bench to cover the per-char UTF-8 state machines that run on every ingested span but were previously unmeasured: `normalize_tag` (ASCII / mixed-unicode / over-length), `normalize_metric_name`, `truncate_utf8` (UTF-8 boundary walk-back), and `normalize_span_start_duration` (quantifying the `SystemTime` read on the pre-year-2000 path).

Add a `bench-internals` feature, mirroring `libdd-sampling`, to expose the otherwise-private `normalize_metric_name`/`truncate_utf8` without changing the shipped public API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
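The UTF-8 boundary walk-back that the `truncate_utf8` bench exercises can be sketched with the standard library's `str::is_char_boundary`. `truncate_utf8_sketch` is an illustrative stand-in, not the crate's implementation, which may scan or return differently.

```rust
/// Illustrative stand-in for UTF-8-safe truncation: returns the longest prefix
/// of `s` that is at most `limit` bytes long and does not split a codepoint.
fn truncate_utf8_sketch(s: &str, limit: usize) -> &str {
    if s.len() <= limit {
        // Already short enough: nothing to cut.
        return s;
    }
    let mut end = limit;
    // If `limit` lands inside a multi-byte codepoint, step back to the
    // previous char boundary. Index 0 is always a boundary, so this terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For valid UTF-8 the loop runs at most three times, since a codepoint has at most three continuation bytes.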
f8e4abb to bd85696
Convert the bench from a single-call `b.iter` to the batched `iter_batched_ref` + 1000-element inner loop used by the other benches in this file. The previous form set `throughput(Elements(1000))` and `SamplingMode::Flat` but measured one call per iteration, so the throughput number was meaningless and the ns-scale "clean" path was swamped by timer overhead.

The batch is rebuilt in untimed setup because the function mutates its inputs in place: on the year-2000 path the first call rewrites `start` to a recent timestamp, which would make a second call on the same value skip the clock branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
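Why the batch has to be rebuilt can be shown with a std-only sketch. `normalize_start_sketch` is a hypothetical, simplified stand-in for `normalize_span_start_duration`: it models only the pre-year-2000 check (clamp a negative duration, then back-date a stale `start` from the system clock), and the real signature and rules may differ. The bool return value is added purely so the sketch can count clock reads.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// 2000-01-01T00:00:00Z in nanoseconds since the Unix epoch.
const YEAR_2000_NANOS: i64 = 946_684_800_000_000_000;

/// Hypothetical, simplified stand-in for `normalize_span_start_duration`.
/// Mutates its inputs in place and returns `true` when it read the clock.
fn normalize_start_sketch(start: &mut i64, duration: &mut i64) -> bool {
    if *duration < 0 {
        *duration = 0;
    }
    if *start < YEAR_2000_NANOS {
        // Slow path: read the wall clock and rewrite `start` to a recent value.
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before the Unix epoch")
            .as_nanos() as i64;
        *start = now - *duration;
        return true;
    }
    false
}

/// Runs one pass over a batch of (start, duration) pairs and returns how many
/// calls took the clock branch.
fn clock_reads(batch: &mut [(i64, i64)]) -> usize {
    let mut hits = 0;
    for (start, duration) in batch.iter_mut() {
        if normalize_start_sketch(start, duration) {
            hits += 1;
        }
    }
    hits
}
```

On a fresh batch of 1000 pre-2000 inputs every call reads the clock, but a second pass over the same, now-mutated batch reads it zero times. In criterion, building the vector in the setup closure of `iter_batched_ref`, which is not timed, gives every measured pass fresh inputs.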
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7834e6f902
…rk script

The `normalization_utils` bench target uses `required-features = ["bench-internals"]`, so Cargo silently skips it unless that feature is explicitly activated. Add `libdd-trace-normalization/bench-internals` to the `--features` list in `run_benchmarks_ci.sh` so the new (and existing) normalization benchmarks appear in CI results.
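One common way to expose private helpers only behind an opt-in feature is a `#[cfg(feature = ...)]`-gated module that re-exports them. The sketch below uses hypothetical names (`normalize_utils`, `bench_internals`, `char_count`) and is not necessarily how this crate or `libdd-sampling` lays it out.

```rust
// Private module: its items are `pub`, but they are unreachable from outside
// the crate unless something public re-exports them.
mod normalize_utils {
    /// Hypothetical internal helper standing in for a benched function.
    pub fn char_count(s: &str) -> usize {
        s.chars().count()
    }
}

/// Compiled only with `--features bench-internals`; default builds leave the
/// public API unchanged.
#[cfg(feature = "bench-internals")]
pub mod bench_internals {
    pub use crate::normalize_utils::char_count;
}
```

Paired with `required-features = ["bench-internals"]` on the `[[bench]]` entry, a default build compiles neither the re-export module nor the bench, which is why the feature has to be passed explicitly in CI.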
What does this PR do?
Extends the existing criterion bench (`libdd-trace-normalization/benches/normalization_utils.rs`) to cover the normalization functions that were previously unmeasured, all of which run on every ingested span:

- `normalize_tag`: the heaviest function (per-codepoint UTF-8 scan plus a char-class state machine). Benched on ASCII fast-path, mixed/illegal-char, unicode (codepoint slow path), and over-length (>`MAX_TAG_LEN`) inputs.
- `normalize_metric_name`: similar complexity, with one-byte lookahead. Clean, separator-collapsing, and over-length cases.
- `truncate_utf8`: over-length ASCII, plus a multi-byte (3-byte) input where the limit lands mid-codepoint and forces the boundary walk-back.
- `normalize_span_start_duration`: `clean` vs `needs-clock` cases, to quantify the `SystemTime` read on the pre-year-2000 path.

A `bench-internals` cargo feature is added (mirroring `libdd-sampling`) to expose the otherwise-private benched functions without changing the shipped public API. The `[[bench]]` target now requires this feature.

Motivation
Normalization runs on every span, but the existing bench skipped the expensive per-char UTF-8 state machines. These are the functions most likely to show up as a per-span tax, so they are worth tracking.
Additional Notes
`iter_batched_ref` over 1000 owned copies, `Throughput(Elements)`, `black_box` on inputs).
`bench-internals`.

How to test the change?
🤖 (Partly) Generated with Claude Code