refactor(telemetry): migrate stream_with_chunking telemetry to plugin hooks#1361

Open
ajbozarth wants to merge 3 commits into
generative-computing:mainfrom
ajbozarth:feat/1290-telemetry-stream-plugin-hooks

Conversation

@ajbozarth

Contributor

Pull Request

Issue

Fixes #1290

Description

Migrates stream_with_chunking telemetry to plugin hooks as part of the Phase 2 migration: moves the orchestration span and its streaming metrics off direct mellea.telemetry calls and onto plugin hooks. With this change, no async path in mellea/stdlib/ calls the old trace_application / set_span_* API, so the deprecated helpers can finally be removed.

Span and metrics on hooks. New streaming hook types in mellea/plugins/types.py (with payload dataclasses in mellea/plugins/hooks/streaming.py) carry the data previously stamped inline (chunk index, requirement counts, pass/fail, exceptions), keyed by a streaming_id so pre/post hooks pair cleanly across the _run_async_in_thread tasks. A new StreamingTracingPlugin subscribes to these hooks and emits the orchestration span via internal helpers in mellea/telemetry/tracing.py; the existing metrics plugins gain streaming subscriptions that replace the inline record_* calls. Metrics are recorded fire-and-forget so they never block the stream. Spans, events, and metric names/attributes match main, so there is no observability regression, and the span still nests correctly under the action span when called via a session.
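For reviewers, the pairing idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: `StartPayload`, `EndPayload`, `SpanRecord`, `PairingPlugin`, and every attribute name are hypothetical stand-ins, and a plain record replaces the real OTel span so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class StartPayload:  # hypothetical, not the real mellea payload dataclass
    streaming_id: str
    requirement_count: int


@dataclass(frozen=True)
class EndPayload:  # hypothetical
    streaming_id: str
    chunk_count: int
    passed: bool
    error: Optional[BaseException] = None


@dataclass
class SpanRecord:  # stand-in for an OTel span
    name: str
    attributes: dict = field(default_factory=dict)


class PairingPlugin:
    """Pairs start/end hook calls by streaming_id, whatever order they arrive in."""

    def __init__(self) -> None:
        self._open: dict[str, SpanRecord] = {}
        self.finished: list[SpanRecord] = []

    def on_start(self, payload: StartPayload) -> None:
        # Open a record and remember it under the stream's id.
        self._open[payload.streaming_id] = SpanRecord(
            "stream_with_chunking",
            {"requirement_count": payload.requirement_count},
        )

    def on_end(self, payload: EndPayload) -> None:
        # Look the record up by id; an unmatched end is ignored.
        record = self._open.pop(payload.streaming_id, None)
        if record is None:
            return
        record.attributes["chunk_count"] = payload.chunk_count
        record.attributes["passed"] = payload.passed
        if payload.error is not None:
            record.attributes["error_type"] = type(payload.error).__name__
        self.finished.append(record)


# Two interleaved streams that end in the opposite order they started.
plugin = PairingPlugin()
plugin.on_start(StartPayload("a", requirement_count=2))
plugin.on_start(StartPayload("b", requirement_count=0))
plugin.on_end(EndPayload("b", chunk_count=5, passed=False, error=ValueError("boom")))
plugin.on_end(EndPayload("a", chunk_count=3, passed=True))
```

Because the id travels in the payload, the plugin does not need to rely on ambient state to match a post hook to its pre hook, which is what matters when the two fire from different tasks.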

Cleanup. With streaming migrated, the four deprecated public tracing helpers (trace_application, set_span_attribute, set_span_error, set_span_status_error) are removed; the internal setter logic moves to _tracing_setters.py.

Bug fix surfaced during migration. The backend chat span attaches in the caller task but finishes in the orchestration task that drains the MOT, so its OTel context detach crossed asyncio tasks and failed. That left the generation span ambient, and validation chat spans nested under the generation span instead of beside it as siblings. The fix adds orchestration-task hooks that re-attach the stream_with_chunking span as ambient context for the drain/validate loop, and _safe_detach now skips cross-task detaches within a reattach scope. See 6c3de9e5 for details.
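The underlying failure can be reproduced with the standard library alone, since a `contextvars` token can only be reset in the `Context` that created it and each asyncio task runs in its own copy. The sketch below is a simplified illustration, not the `_safe_detach` in this PR: `attach` and `safe_detach` are hypothetical helpers, and the reattach-scope bookkeeping and logging are omitted.

```python
import asyncio
import contextvars

_current = contextvars.ContextVar("current_span", default=None)


def attach(span_name):
    """Set the ambient span and remember which task owns the token."""
    token = _current.set(span_name)
    return token, asyncio.current_task()


def safe_detach(token, owner_task):
    """Reset only from the owning task; skip (return False) otherwise."""
    if asyncio.current_task() is not owner_task:
        return False
    _current.reset(token)
    return True


async def main():
    token, owner = attach("chat")  # attached in the caller task

    async def drain():
        # A raw reset from another task fails: the token belongs to a
        # different Context than the one this task is running in.
        try:
            _current.reset(token)
            raw_failed = False
        except ValueError:
            raw_failed = True
        # The guarded version notices the task mismatch and skips.
        return raw_failed, safe_detach(token, owner)

    raw_failed, detached_in_other_task = await asyncio.create_task(drain())
    detached_in_owner = safe_detach(token, owner)  # same task: succeeds
    return raw_failed, detached_in_other_task, detached_in_owner, _current.get()


result = asyncio.run(main())
```

Here the raw cross-task reset raises, the guarded call from the second task is skipped, and the detach from the owning task succeeds and restores the previous ambient value.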

Follow-up bug fix. functional.py now catches BaseException (not just Exception) in the component error path, so CancelledError/KeyboardInterrupt fire COMPONENT_POST_ERROR (closing the action span) before propagating. Caught while writing this PR; the handler re-raises, so nothing is swallowed.
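A minimal sketch of why the wider catch matters, assuming a hypothetical `run_component` wrapper and an `events` list standing in for the real hook dispatch: `asyncio.CancelledError` derives from `BaseException`, so an `except Exception` handler would never see it and the error hook would not fire.

```python
import asyncio

events = []  # stand-in for hook invocations


async def run_component(action):
    """Hypothetical wrapper: fire pre, then post or post_error, around action()."""
    events.append("pre")
    try:
        result = await action()
    except BaseException as exc:  # includes CancelledError and KeyboardInterrupt
        events.append(f"post_error:{type(exc).__name__}")
        raise  # re-raise so nothing is swallowed
    events.append("post")
    return result


async def main():
    async def slow():
        await asyncio.sleep(10)

    task = asyncio.create_task(run_component(slow))
    await asyncio.sleep(0)  # let the task start and suspend inside slow()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled_propagated")


asyncio.run(main())
```

The error event is recorded before the cancellation reaches the caller, which corresponds to COMPONENT_POST_ERROR closing the action span before the exception propagates.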

Docs. docs/docs/observability/tracing.md documents the stream_with_chunking span (attributes + lifecycle events) and shows the two chat spans as siblings under it.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

… hooks (generative-computing#1290)

Move the stream_with_chunking orchestration span, its span events, and its
metrics off inline telemetry calls onto the STREAMING_START/EVENT/END plugin
hooks, completing the Phase 2 tracing migration. STREAMING_START fires in the
caller task and STREAMING_END from acomplete(), giving the streaming span
same-task attach/detach and correct stream_with_chunking > chat nesting.

Removes the four deprecated public tracing helpers (trace_application,
set_span_attribute, set_span_error, set_span_status_error).

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
…ask detach

The backend `chat` span opened by stream_with_chunking attaches in the caller
task but finishes in the orchestration task that drains the MOT. Detaching its
OTel context token from that second task failed (logged by OTel as "Failed to
detach context"), and the failed detach left the generation span ambient, so a
subsequent validation `chat` span nested under generation instead of being its
sibling under stream_with_chunking.

Add STREAMING_ORCHESTRATION_START/END hooks fired on the orchestration task.
The tracing plugin uses them to re-attach the stream_with_chunking span as that
task's ambient context for the drain/validate loop, so mid-stream spans parent
correctly. _safe_detach skips a detach that would cross asyncio tasks: within a
reattach scope the skip is expected and logged at debug; otherwise the detach
runs so OTel surfaces its own error, preceded by a warning naming the mismatch.

Add unit coverage for the reattach helpers, the task-identity classification,
and both log paths; an integration test asserting both chat spans are siblings
under stream_with_chunking and carry no streaming events; and an e2e streaming
test. Document the stream_with_chunking span and its hierarchy.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth ajbozarth requested a review from a team as a code owner June 30, 2026 22:18
@github-actions github-actions Bot added the enhancement New feature or request label Jun 30, 2026
@ajbozarth ajbozarth self-assigned this Jun 30, 2026
@ajbozarth ajbozarth requested a review from jakelorocco June 30, 2026 22:23
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

feat(telemetry): migrate stream_with_chunking orchestration span to plugin hooks