[feat] Restructure the trace/telemetry interface by semantic role#4910
Conversation
The pre-existing `trace` block on the `/run` request mixed four different roles under one bucket: per-call propagation context (traceparent/baggage), exporter config (endpoint), exporter credential (authorization), and capture policy (captureContent). Split it by role into `context.propagation` (W3C trace-context) and `telemetry` (capture policy + OTLP exporter with the credential under the exporter's standard `authorization` header). Behavior is identical; this is a shape change to the wire contract. Updated in lockstep: protocol.ts, wire.py, wire_models.py, dtos.py, both golden fixtures, both wire-contract tests, the DTO unit test, and the two live OTLP readers (engines/sandbox_agent.ts, engines/sandbox_agent/pi-assets.ts). The internal tracer (tracing/otel.ts) and its `captureContent` field are unchanged. Also addresses a CodeRabbit follow-up on #4892: `run_context()` now captures the workflow identity and the trace identity as independent failure domains, so a workflow-extraction failure no longer drops an otherwise-valid `runContext.trace`. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughSummary by CodeRabbit
WalkthroughThe Changestrace → context + telemetry wire contract split
run_context() independent failure domains
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🤖 The AI agent says: This is the deferred "Fork A" from #4892 (the trace/telemetry restructure you greenlit with "yes please create another pr to target this"), plus the open CodeRabbit follow-up on that PR. What I need reviewed (a design + contract check, not a UX check):
Deliberately deferred (flagged, not done): the stale Gates green locally: SDK agents 441, service agent 45 (+3 new), TS typecheck + 309 vitest. Not merging. |
|
@coderabbitai review |
✅ Action performedReview finished.
|
Context
The
/runrequest carries atraceblock on every agent run. It mixed four different roles under one bucket — the classic interface-design failure thedesign-interfacesskill is built around:Four roles, one feature label. This is the deferred "Fork A" from #4892 — the proposed shape was posted on that thread and Mahmoud greenlit it ("yes please create another pr to target this"). This PR is shape-only: behavior is identical.
What changed
Split the block by semantic role into
context(per-call protocol context) andtelemetry(operator-owned config + policy + credential):Both keys come from one service-side capture (
trace_context()); both arenullwhen the run has no trace context (the prior singletrace: null). The keys are updated in lockstep across the wire and both OTLP readers:protocol.ts—TraceContextreplaced byPropagation/RequestContext/Telemetry/OtlpExporter;AgentRunRequest.trace→context+telemetry.dtos.py(TraceContext.context_to_wire()/telemetry_to_wire()),utils/wire.py(request_to_wire),wire_models.py(WirePropagation/WireRequestContext/WireCapture/WireOtlpExporter/WireExporters/WireTelemetry).engines/sandbox_agent.ts(the tracer init) andengines/sandbox_agent/pi-assets.ts(the Pi extension OTLP env).The internal tracer (
tracing/otel.ts) and itscaptureContentfield are not touched — that is downstream of the readers and not part of the wire.CodeRabbit follow-up (from #4892)
run_context()inservices/oss/src/agent/tracing.pytreated workflow and trace capture as one failure domain — a failure reading the workflow references would drop an otherwise-validrunContext.trace. They are now captured as independent failure domains, so a trace-only run still shipsrunContext.trace. Addedtest_tracing.pycovering both directions.Interface reference (old → new)
trace.*)traceparent,baggagecontext.propagation.{traceparent,baggage}captureContenttelemetry.capture.content.enabledendpointtelemetry.exporters.otlp.endpointauthorizationtelemetry.exporters.otlp.headers.authorizationdesign-interfaces rationale
Each field now lives under a parent that answers the same question it does: standard W3C names (
traceparent/baggage) kept verbatim undercontext.propagation; the credential nested under the exporter'sheaders(under the thing it authenticates) rather than free-floating; the capture policy expressed as an allowlist-styleenabledflag, not a sibling of the data;exportersplural so a second exporter can be added without reshaping.context(telemetry propagation, read by the runner's OTLP exporter) is intentionally kept distinct fromrunContext(the run's own resource identity, read by toolcall.contextbinding) — different roles, different consumers.Scope / risk
captureContent-default-on semantics (!== false) and the OTLP env wiring are preserved exactly. Standalone runs still serializecontext: null, telemetry: null.TraceContextobject throughSessionConfig→Backend.create_session→request_to_wire; only the serialization (the wire) is role-split. This keeps the change to the contract, not the call graph.projects/runner-interface/README.mdshows the old shape but is already stale (it still has the removedbackendfield and an old golden filename); a DTO rename (TraceContextis now slightly misnamed) is left as optional cleanup. Flagged, not done, to keep the diff a clean shape change.How to QA
Expected (verified locally): SDK agents 441 passed; service agent 45 passed (3 new in
test_tracing.py); TS typecheck clean, 309 vitest passed (incl.wire-contract.test.ts,sandbox-agent-pi-assets.test.ts,tool-direct.test.ts). The contract tests pin both sides of the new shape and that the legacytracekey is gone; the pi-assets test confirms telemetry still flows to the OTLP env (TRACEPARENT,OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,OTEL_EXPORTER_OTLP_HEADERS).Do not merge.
https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc