Skip to content

feat(v17): Implement tracing channels#4670

Open
logaretm wants to merge 11 commits intographql:17.x.xfrom
logaretm:tracing-channel-support
Open

feat(v17): Implement tracing channels#4670
logaretm wants to merge 11 commits intographql:17.x.xfrom
logaretm:tracing-channel-support

Conversation

@logaretm
Copy link
Copy Markdown

@logaretm logaretm commented Apr 18, 2026

This PR implements tracing channel support as discussed in #4629

The approach ensures that GraphQL has no dependency on the node:diagnostic_channels module in anyway and relies on the consumer of the library to provide it via an enableDiagnosticsChannel API.

Most of the changes are tests, the actual source changes are mostly tame and introduces wrappers around pre-existing code paths.

API

GraphQL will vendor a minimal version of the tracing channel types, ideally i would've liked it to not have any of the deep channel specifics like runStores but it was required to maintain the sync-or-async behavior we have today. The alternative is to normalize to async for all calls which isn't ideal and probably will have a perf implication.

The implemented channels are:

Channel Context fields
graphql:parse source
graphql:validate schema, document
graphql:execute schema, document, variableValues, operationName (lazy), operationType (lazy)
graphql:execute schema, operation, variableValues, operationName, operationType
graphql:subscribe schema, document, variableValues, operationName (lazy), operationType (lazy)
graphql:resolve fieldName, parentType, fieldType, args, isTrivialResolver, fieldPath (lazy)

Usage

The APM provider or the user can do:

import dc from 'node:diagnostic_channels';
import { enableDiagnosticsChannel } from 'graphql';

enableDiagnosticsChannel(dc);

// Then they can subscribe to any of the 5 tracing channels that are available
// usually an APM would subscribe to all of them
dc.tracingChannel('graphql:resolve').subscribe({
  start: () => {
    // start a span or log or whatever....
  },
  end: ({ result }) => {
    // if result is present then it is a sync resolve, otherwise they should end the span in asyncEnd
  },
  asyncEnd: ({ result }) => {
    // end the span
  },
  error: ({ error }) => {
    // report resolve errors
  }
});

I tested this locally in a dashboard and it produces waterfall structure like so with basic Sentry integration:

CleanShot 2026-04-18 at 12 30 49@2x

You can view that trace here.

closes #4629

logaretm added 10 commits April 17, 2026 12:39
Introduces src/diagnostics.ts which exposes a single public function,
enableDiagnosticsChannel(dc), that APMs use to register a
node:diagnostics_channel-compatible module with graphql-js. Channel
names (graphql:parse, graphql:validate, graphql:execute, graphql:resolve,
graphql:subscribe) are owned by graphql-js so multiple APM subscribers
converge on the same cached TracingChannel instances.

Structural MinimalChannel / MinimalTracingChannel / MinimalDiagnosticsChannel
types describe the subset of the Node API graphql-js needs; no dependency
on @types/node is introduced, and no runtime-specific import is added to
the core. Subsequent commits wire emission sites into parse, validate,
execute, subscribe, and the resolver path.
Wraps parse() with the graphql:parse channel when any subscribers are
attached. The published context carries the Source (or string) handed to
parse() so APM tools can record the GraphQL source on their span.

Adds two shared helpers on src/diagnostics.ts used by every subsequent
emission site:

  maybeTraceSync(name, ctxFactory, fn)
  maybeTracePromise(name, ctxFactory, fn)

They share a shouldTrace gate that uses hasSubscribers !== false so
runtimes without an aggregated hasSubscribers getter (notably Node 18,
see nodejs/node#54470) still publish; Node's traceSync/tracePromise gate
each sub-channel internally. Context construction is lazy through the
factory so the no-subscriber path allocates nothing beyond the closure.
…gnostics_channel

Adds an integrationTests/diagnostics/ project that installs the built
graphql package, registers node:diagnostics_channel, and asserts that
subscribers to graphql:parse receive the expected start/end/error
lifecycle events. Complements the unit tests (which use a FakeDc) with
coverage against a real TracingChannel.

The in-tree eslint rules forbid node: imports and flag TracingChannel as
experimental in src/; integrationTests/* overrides already allow the
node: import, and a file-scope disable handles the experimental warning.
The integration project pins engines to >=22 to match where graphql-js
v17 is actively tested.

This file grows as later channels (validate, execute, subscribe,
resolve) are wired up.
Wraps the public validate() entry point with maybeTraceSync. The context
carries both the schema being validated against and the parsed document,
so APM tools can associate the span with a concrete GraphQL operation.

validate() is synchronous so traceSync is appropriate; errors thrown by
assertValidSchema or the visitor rethrow propagate through the channel's
error sub-channel.

Also extracts the in-memory TracingChannel fake used by parser-diagnostics
into src/__testUtils__/fakeDiagnosticsChannel.ts so subsequent emission
sites (validate, execute, subscribe, resolve) can reuse it without
duplicating the lifecycle simulation.
Wraps the public execute / experimentalExecuteIncrementally /
executeIgnoringIncremental / executeSubscriptionEvent entry points with
maybeTraceMixed so subscribers see a single graphql:execute span per
top-level operation invocation, including every per-event execution of
a subscription stream.

The context exposes the document, schema, variableValues, operationName,
and operationType. operationName and operationType are lazy getters that
only resolve the operation AST (getOperationAST) if a subscriber reads
them, keeping the gate cheap for APMs that do not need them. The
ValidatedExecutionArgs variant used by executeSubscriptionEvent uses the
already-resolved operation from validation and therefore needs no lazy
lookup; it carries the resolved operation in place of the (unavailable)
document.

Adds a shared maybeTraceMixed helper to src/diagnostics.ts for
PromiseOrValue-returning functions. It delegates start / end / error to
Node's traceSync (which also runs fn inside end.runStores for
AsyncLocalStorage propagation) and adds the asyncStart / asyncEnd /
error-on-rejection emission when the return value is a promise.
… shape

maybeTraceMixed now publishes asyncStart immediately once it knows the
operation is in-flight asynchronously and asyncEnd in a .finally once the
promise settles, matching the shape subscribers expect from a
tracePromise-style channel (asyncStart brackets the async tail rather
than being paired with asyncEnd in a single microtask).

Also brings the in-memory FakeTracingChannel in line with Node's actual
traceSync behavior: Node sets ctx.result before publishing end, letting
subscribers check isPromise(ctx.result) inside their end handler to
decide whether asyncEnd will follow or the span is complete. The fake
now does the same so unit tests aren't looser than real-Node behavior.
Wraps the public subscribe() entry point with maybeTraceMixed using the
same ExecutionArgs-based context factory as graphql:execute. The
subscription setup path may return sync (when the subscribe resolver is
synchronous) or async (when the resolver returns a promise resolving to
an AsyncIterable), so maybeTraceMixed's sync / promise branching fits
exactly.

Per-subscription-event executions continue to publish on graphql:execute
via executeSubscriptionEvent, which was wired in the previous commit.
The graphql:subscribe context therefore owns the document reference and
covers the setup span; subscribers that need to correlate per-event
execute spans to their parent subscription use AsyncLocalStorage context
propagation provided by Channel.runStores.
Wraps the single resolver invocation site in Executor.executeField with
maybeTraceMixed so each field resolver call emits start/end (and the
async tail when the resolver returns a promise, or error when it
throws). The emission is inside the engine — no field.resolve mutation,
no schema walking, nothing stacks across multiple APM subscribers.

The published context captures the field-level metadata APM tools use
to name and attribute per-field spans:

  - fieldName        from info.fieldName
  - parentType       from info.parentType.name
  - fieldType        stringified info.returnType
  - args             resolver arguments, by reference
  - isTrivialResolver (field.resolve === undefined), a hint for
    subscribers that want to skip default property-access resolvers
  - fieldPath        lazy getter, serializes info.path only on first
    read since it is O(depth) work and many subscribers depth-filter
    without reading it

graphql:resolve is the noisiest channel (one event per field per
operation); all emission sites are gated on hasSubscribers before any
context is constructed, so it remains zero-cost when no APM is loaded.
maybeTraceMixed previously delegated start / end to traceSync and then
attached its own .then / .finally outside that scope. That worked for
event ordering but broke AsyncLocalStorage propagation: a subscriber
that called channel.start.bindStore(als, ...) could read that store in
its start and end handlers but not in asyncStart or asyncEnd, because
those fired in a .then attached outside the start.runStores scope.

Promise continuations inherit the AsyncLocalStorage context of the
frame that attaches them, so the fix is to attach .then inside the
start.runStores block. This mirrors Node's own TracingChannel.tracePromise
structure exactly, just with an additional sync branch that short-
circuits to [start, end] when fn returns synchronously.

Also updates the FakeChannel helper to match: its runStores now
publishes the context on entry (the behavior Node's Channel.runStores
has), so the fake traceSync / tracePromise implementations match real
Node's event counts without the old end.runStores workaround.

Adds an integration test that binds a store on graphql:execute's start
sub-channel and asserts every lifecycle handler sees it when a resolver
returns a promise.
maybeTraceMixed was firing asyncStart and asyncEnd back-to-back inside
the .then callback. That left no meaningful window between the two
events: for APM subscribers, 'async work begins' and 'async work
completes' fired in the same microtask, making the pair useless for
measuring the async duration.

Correct shape: asyncStart publishes synchronously as soon as we know
the operation is in-flight asynchronously (right after end), and
asyncEnd publishes in a .finally after settlement. Both the .then and
.finally are attached inside start.runStores, so AsyncLocalStorage
context is preserved end-to-end via async_hooks.

Event order now:
  sync path:   [start, end]
  async path:  [start, end, asyncStart, ..., asyncEnd]
  async error: [start, end, asyncStart, error, asyncEnd]

Mirrors the same fix in the FakeTracingChannel helper.
Copilot AI review requested due to automatic review settings April 18, 2026 04:22
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 18, 2026

@logaretm is attempting to deploy a commit to the The GraphQL Foundation Team on Vercel.

A member of the Team first needs to authorize it.

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Apr 18, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces opt-in diagnostics tracing channels for GraphQL core operations while keeping the codebase runtime-agnostic (no direct node:diagnostics_channel dependency) by requiring consumers to register a compatible module via enableDiagnosticsChannel().

Changes:

  • Added src/diagnostics.ts with minimal duck-typed interfaces and maybeTrace* helpers, plus public exports from src/index.ts.
  • Wrapped parse, validate, execute/subscribe, and field resolution to emit graphql:* tracing channel lifecycles with lazy context fields where appropriate.
  • Added unit tests (via a fake diagnostics channel) and a Node integration test project to validate real node:diagnostics_channel behavior.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/diagnostics.ts New core diagnostics/tracing channel registration + trace helper implementations.
src/index.ts Exposes enableDiagnosticsChannel and related public types from the package entrypoint.
src/language/parser.ts Wraps parse() with tracing emission.
src/validation/validate.ts Wraps validate() with tracing emission.
src/execution/execute.ts Wraps execute()/subscribe() and related execution entrypoints with tracing + lazy operation metadata.
src/execution/Executor.ts Wraps field resolver invocation to emit graphql:resolve lifecycle events with lazy fieldPath.
src/testUtils/fakeDiagnosticsChannel.ts Adds fake tracing channel implementation to support deterministic unit tests.
src/tests/diagnostics-test.ts Verifies registration behavior and channel identity semantics.
src/language/tests/parser-diagnostics-test.ts Validates graphql:parse lifecycle emission.
src/validation/tests/validate-diagnostics-test.ts Validates graphql:validate lifecycle emission including error paths.
src/execution/tests/execute-diagnostics-test.ts Validates graphql:execute lifecycle emission for sync/async/error paths.
src/execution/tests/subscribe-diagnostics-test.ts Validates graphql:subscribe emission for sync vs async subscription setup.
src/execution/tests/resolve-diagnostics-test.ts Validates graphql:resolve emission for sync/async/error and lazy fieldPath.
resources/integration-test.ts Adds the new diagnostics integration test project to the integration suite.
integrationTests/diagnostics/test.js Real Node integration tests for channel lifecycles and ALS propagation.
integrationTests/diagnostics/package.json Adds diagnostics integration test project metadata and node engine constraint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/diagnostics.ts Outdated
Comment thread src/__testUtils__/fakeDiagnosticsChannel.ts Outdated
"private": true,
"type": "module",
"engines": {
"node": ">=22.0.0"
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test project sets engines.node to >=22, but the main package supports Node 20+. Unless the test relies on Node 22-only diagnostics_channel APIs, consider lowering this to >=20 so the integration test better matches the supported Node range and doesn’t mask compatibility regressions.

Suggested change
"node": ">=22.0.0"
"node": ">=20.0.0"

Copilot uses AI. Check for mistakes.
@logaretm logaretm force-pushed the tracing-channel-support branch from 5333b30 to faf327b Compare April 18, 2026 04:30
Comment thread src/diagnostics.ts
* - `graphql:subscribe`
* - `graphql:resolve`
*
* Calling this repeatedly is safe: subsequent calls replace the stored
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a potential failure mode if this is for some reason called by different APMs with different values?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it is the actual diagnostic channel module, it will work fine. The failure mode is if one APM provides an implementation and a different one provides another. The diagnostic channel runtime implementations today are idempotent AFAIK so it doesn't matter as long everyone ends up on the same channels.

I think this is a highly specialized case, I initially considered either throwing to force only implementation over the other OR silently NOOP but I that favors whoever comes first which isn't ideal.

WDYT, should this be tweaked?

looping @Qard here as he might have some thoughts here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants