fix(agent): land skill/error trace attrs under recognized ag.* namespaces (F-036)#4857
Conversation
…aces (F-036)
The Wave-1 tracing fix stamped skill and error attributes under ag.agent.skills.*
and ag.error.*, which Agenta's OTel ingest strict-whitelists against AgAttributes
and demotes to ag.unsupported.* (queryable but mis-namespaced). Move them to the
recognized free-form buckets ag.meta.skills.{loaded,count} and
ag.exception.{message,provider} so they land first-class with no api schema change.
Runner-only (services/agent/src/tracing/otel.ts); updates the otel-skills-error
unit tests and adds a namespace guard asserting no ag.agent.*/ag.error.*/
ag.unsupported.* keys.
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Needs review — code review of a runner-only tracing namespace fix (F-036). What changed and what to look at:
Decision needed from you: confirm the "use recognized buckets, don't register new ones" call (D-015), and confirm that punting the builtins-in-loaded gap to the server-side platform-skill seeding workstream (D-016, out of Not a UX/functional check — this is observability-only and there is no |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughTracing attributes for loaded skills and run errors were renamed to new namespaces in agent OTel code, and the matching unit tests were updated to assert the revised keys. ChangesTracing namespace updates
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Context
The Wave-1 agent tracing fix (PR #4855, F-029/F-030) added skill and error attributes to the agent span, but it stamped them as
ag.agent.skills.loaded/ag.agent.skills.countandag.error.message/ag.error.provider. Live QA (F-036) found these land underag.unsupported.*in the trace viewer (e.g.ag.unsupported.agent.skills.loaded,ag.unsupported.error.message).Root cause: Agenta's OTel ingest (
api/oss/src/core/tracing/utils/attributes.pyinitialize_ag_attributes) strict-whitelists top-levelag.*keys against theAgAttributespydantic model and relocates any unrecognized key into anag.unsupported.*bucket.agentanderrorare not fields on that model, so both attribute groups were demoted. The data was correct and queryable, just mis-namespaced.This change moves the attributes to existing recognized free-form
ag.*buckets, so they land first-class with no api/SDK schema change:ag.agent.skills.{loaded,count}->ag.meta.skills.{loaded,count}(ag.metais the recognized home for run/request metadata, the same place model/system/response info lands)ag.error.{message,provider}->ag.exception.{message,provider}(ag.exceptionis the recognized error bucket;messagemirrors the OTel exception event)Both
ag.metaandag.exceptionpass ingest validation untouched (they are free-formMeta/Datadicts), so no api-side schema change is needed. Registering net-newag.agent.*/ag.error.*namespaces would mean coordinated edits to the SDK pydantic model + the api validator + tests + back-compat — non-trivial for a pure-naming polish where the info is already present, so it was deliberately not done.Scope / risk
services/agent/src/tracing/otel.ts(three attribute-name changes across the Pi extension tracer and the sandbox-agent ACP tracer) plus its unit test./runwire change (the golden wire contract is untouched; skill names ride an internal env var, errors are span-side only).ag.unsupported.agent.skills.*orag.unsupported.error.*would need to read the new keys. No such query exists in the repo (grep of code + interface inventory is clean; only QA scratch docs referenced the old names, now updated).Not fixed here (reported, out of scope): builtins-in-loaded
F-036 also noted the forced
_agenta.*platform skill did not appear in theloadedlist on api_agentarun (only the author skill). This is NOT a runner trace bug: the runner faithfully stamps exactly the materializedrequest.skillsit receives. The platform default skill (_agenta.agenta-getting-started) is@ag.embed-ed only into the DEFAULT agent config template (services/oss/src/agent/schemas.py), so a custom config drops it and it never reaches the wire. The runner has no independent forced-skill installation and treatspi_agentalikepi_corefor skills, so it cannot stamp a skill it never received. Force-injecting the platform skill for everypi_agentarun regardless of config is a server-side concern (the platform-skill seeding "separate workstream" noted inharnesses.py/agenta_builtins.py) and lives outside this lane'sotel.tsboundary. Recorded as decision D-016.How to QA
Prerequisites: Node 24 on PATH; from
services/agent.uses recognized ag.* namespaces, never ag.agent.* / ag.error.* (F-036)test asserts the agent span carries noag.agent.*,ag.error.*, orag.unsupported.*keys, and thatag.meta.skills.loaded+ag.exception.messageare present.pi_agentaskill cell and an error cell, fetch the trace viaGET /api/preview/tracing/traces/{trace_id}, and grep span attributes — the loaded skills now appear underag.meta.skills.*and the error underag.exception.*, not underag.unsupported.*.Edge cases covered by tests: no skills -> attributes omitted;
recordErrorprovider falls back to the init model provider; local-Pi self-instrument path stamps Pi's own span (skills) and emits a standaloneagent_errorspan (error).https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc