feat(v17): Implement tracing channels#4670
Conversation
Introduces src/diagnostics.ts which exposes a single public function, enableDiagnosticsChannel(dc), that APMs use to register a node:diagnostics_channel-compatible module with graphql-js. Channel names (graphql:parse, graphql:validate, graphql:execute, graphql:resolve, graphql:subscribe) are owned by graphql-js so multiple APM subscribers converge on the same cached TracingChannel instances. Structural MinimalChannel / MinimalTracingChannel / MinimalDiagnosticsChannel types describe the subset of the Node API graphql-js needs; no dependency on @types/node is introduced, and no runtime-specific import is added to the core. Subsequent commits wire emission sites into parse, validate, execute, subscribe, and the resolver path.
Wraps parse() with the graphql:parse channel when any subscribers are attached. The published context carries the Source (or string) handed to parse() so APM tools can record the GraphQL source on their span. Adds two shared helpers on src/diagnostics.ts used by every subsequent emission site: maybeTraceSync(name, ctxFactory, fn) maybeTracePromise(name, ctxFactory, fn) They share a shouldTrace gate that uses hasSubscribers !== false so runtimes without an aggregated hasSubscribers getter (notably Node 18, see nodejs/node#54470) still publish; Node's traceSync/tracePromise gate each sub-channel internally. Context construction is lazy through the factory so the no-subscriber path allocates nothing beyond the closure.
…gnostics_channel Adds an integrationTests/diagnostics/ project that installs the built graphql package, registers node:diagnostics_channel, and asserts that subscribers to graphql:parse receive the expected start/end/error lifecycle events. Complements the unit tests (which use a FakeDc) with coverage against a real TracingChannel. The in-tree eslint rules forbid node: imports and flag TracingChannel as experimental in src/; integrationTests/* overrides already allow the node: import, and a file-scope disable handles the experimental warning. The integration project pins engines to >=22 to match where graphql-js v17 is actively tested. This file grows as later channels (validate, execute, subscribe, resolve) are wired up.
Wraps the public validate() entry point with maybeTraceSync. The context carries both the schema being validated against and the parsed document, so APM tools can associate the span with a concrete GraphQL operation. validate() is synchronous so traceSync is appropriate; errors thrown by assertValidSchema or the visitor rethrow propagate through the channel's error sub-channel. Also extracts the in-memory TracingChannel fake used by parser-diagnostics into src/__testUtils__/fakeDiagnosticsChannel.ts so subsequent emission sites (validate, execute, subscribe, resolve) can reuse it without duplicating the lifecycle simulation.
Wraps the public execute / experimentalExecuteIncrementally / executeIgnoringIncremental / executeSubscriptionEvent entry points with maybeTraceMixed so subscribers see a single graphql:execute span per top-level operation invocation, including every per-event execution of a subscription stream. The context exposes the document, schema, variableValues, operationName, and operationType. operationName and operationType are lazy getters that only resolve the operation AST (getOperationAST) if a subscriber reads them, keeping the gate cheap for APMs that do not need them. The ValidatedExecutionArgs variant used by executeSubscriptionEvent uses the already-resolved operation from validation and therefore needs no lazy lookup; it carries the resolved operation in place of the (unavailable) document. Adds a shared maybeTraceMixed helper to src/diagnostics.ts for PromiseOrValue-returning functions. It delegates start / end / error to Node's traceSync (which also runs fn inside end.runStores for AsyncLocalStorage propagation) and adds the asyncStart / asyncEnd / error-on-rejection emission when the return value is a promise.
… shape maybeTraceMixed now publishes asyncStart immediately once it knows the operation is in-flight asynchronously and asyncEnd in a .finally once the promise settles, matching the shape subscribers expect from a tracePromise-style channel (asyncStart brackets the async tail rather than being paired with asyncEnd in a single microtask). Also brings the in-memory FakeTracingChannel in line with Node's actual traceSync behavior: Node sets ctx.result before publishing end, letting subscribers check isPromise(ctx.result) inside their end handler to decide whether asyncEnd will follow or the span is complete. The fake now does the same so unit tests aren't looser than real-Node behavior.
Wraps the public subscribe() entry point with maybeTraceMixed using the same ExecutionArgs-based context factory as graphql:execute. The subscription setup path may return sync (when the subscribe resolver is synchronous) or async (when the resolver returns a promise resolving to an AsyncIterable), so maybeTraceMixed's sync / promise branching fits exactly. Per-subscription-event executions continue to publish on graphql:execute via executeSubscriptionEvent, which was wired in the previous commit. The graphql:subscribe context therefore owns the document reference and covers the setup span; subscribers that need to correlate per-event execute spans to their parent subscription use AsyncLocalStorage context propagation provided by Channel.runStores.
Wraps the single resolver invocation site in Executor.executeField with
maybeTraceMixed so each field resolver call emits start/end (and the
async tail when the resolver returns a promise, or error when it
throws). The emission is inside the engine — no field.resolve mutation,
no schema walking, nothing stacks across multiple APM subscribers.
The published context captures the field-level metadata APM tools use
to name and attribute per-field spans:
- fieldName from info.fieldName
- parentType from info.parentType.name
- fieldType stringified info.returnType
- args resolver arguments, by reference
- isTrivialResolver (field.resolve === undefined), a hint for
subscribers that want to skip default property-access resolvers
- fieldPath lazy getter, serializes info.path only on first
read since it is O(depth) work and many subscribers depth-filter
without reading it
graphql:resolve is the noisiest channel (one event per field per
operation); all emission sites are gated on hasSubscribers before any
context is constructed, so it remains zero-cost when no APM is loaded.
maybeTraceMixed previously delegated start / end to traceSync and then attached its own .then / .finally outside that scope. That worked for event ordering but broke AsyncLocalStorage propagation: a subscriber that called channel.start.bindStore(als, ...) could read that store in its start and end handlers but not in asyncStart or asyncEnd, because those fired in a .then attached outside the start.runStores scope. Promise continuations inherit the AsyncLocalStorage context of the frame that attaches them, so the fix is to attach .then inside the start.runStores block. This mirrors Node's own TracingChannel.tracePromise structure exactly, just with an additional sync branch that short- circuits to [start, end] when fn returns synchronously. Also updates the FakeChannel helper to match: its runStores now publishes the context on entry (the behavior Node's Channel.runStores has), so the fake traceSync / tracePromise implementations match real Node's event counts without the old end.runStores workaround. Adds an integration test that binds a store on graphql:execute's start sub-channel and asserts every lifecycle handler sees it when a resolver returns a promise.
maybeTraceMixed was firing asyncStart and asyncEnd back-to-back inside the .then callback. That left no meaningful window between the two events: for APM subscribers, 'async work begins' and 'async work completes' fired in the same microtask, making the pair useless for measuring the async duration. Correct shape: asyncStart publishes synchronously as soon as we know the operation is in-flight asynchronously (right after end), and asyncEnd publishes in a .finally after settlement. Both the .then and .finally are attached inside start.runStores, so AsyncLocalStorage context is preserved end-to-end via async_hooks. Event order now: sync path: [start, end] async path: [start, end, asyncStart, ..., asyncEnd] async error: [start, end, asyncStart, error, asyncEnd] Mirrors the same fix in the FakeTracingChannel helper.
|
@logaretm is attempting to deploy a commit to the The GraphQL Foundation Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Pull request overview
This PR introduces opt-in diagnostics tracing channels for GraphQL core operations while keeping the codebase runtime-agnostic (no direct node:diagnostics_channel dependency) by requiring consumers to register a compatible module via enableDiagnosticsChannel().
Changes:
- Added
src/diagnostics.tswith minimal duck-typed interfaces andmaybeTrace*helpers, plus public exports fromsrc/index.ts. - Wrapped
parse,validate,execute/subscribe, and field resolution to emitgraphql:*tracing channel lifecycles with lazy context fields where appropriate. - Added unit tests (via a fake diagnostics channel) and a Node integration test project to validate real
node:diagnostics_channelbehavior.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/diagnostics.ts | New core diagnostics/tracing channel registration + trace helper implementations. |
| src/index.ts | Exposes enableDiagnosticsChannel and related public types from the package entrypoint. |
| src/language/parser.ts | Wraps parse() with tracing emission. |
| src/validation/validate.ts | Wraps validate() with tracing emission. |
| src/execution/execute.ts | Wraps execute()/subscribe() and related execution entrypoints with tracing + lazy operation metadata. |
| src/execution/Executor.ts | Wraps field resolver invocation to emit graphql:resolve lifecycle events with lazy fieldPath. |
| src/testUtils/fakeDiagnosticsChannel.ts | Adds fake tracing channel implementation to support deterministic unit tests. |
| src/tests/diagnostics-test.ts | Verifies registration behavior and channel identity semantics. |
| src/language/tests/parser-diagnostics-test.ts | Validates graphql:parse lifecycle emission. |
| src/validation/tests/validate-diagnostics-test.ts | Validates graphql:validate lifecycle emission including error paths. |
| src/execution/tests/execute-diagnostics-test.ts | Validates graphql:execute lifecycle emission for sync/async/error paths. |
| src/execution/tests/subscribe-diagnostics-test.ts | Validates graphql:subscribe emission for sync vs async subscription setup. |
| src/execution/tests/resolve-diagnostics-test.ts | Validates graphql:resolve emission for sync/async/error and lazy fieldPath. |
| resources/integration-test.ts | Adds the new diagnostics integration test project to the integration suite. |
| integrationTests/diagnostics/test.js | Real Node integration tests for channel lifecycles and ALS propagation. |
| integrationTests/diagnostics/package.json | Adds diagnostics integration test project metadata and node engine constraint. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "private": true, | ||
| "type": "module", | ||
| "engines": { | ||
| "node": ">=22.0.0" |
There was a problem hiding this comment.
This integration test project sets engines.node to >=22, but the main package supports Node 20+. Unless the test relies on Node 22-only diagnostics_channel APIs, consider lowering this to >=20 so the integration test better matches the supported Node range and doesn’t mask compatibility regressions.
| "node": ">=22.0.0" | |
| "node": ">=20.0.0" |
5333b30 to
faf327b
Compare
| * - `graphql:subscribe` | ||
| * - `graphql:resolve` | ||
| * | ||
| * Calling this repeatedly is safe: subsequent calls replace the stored |
There was a problem hiding this comment.
is a potential failure mode if this is for some reason called by different APMs with different values?
There was a problem hiding this comment.
As long as it is the actual diagnostic channel module, it will work fine. The failure mode is if one APM provides an implementation and a different one provides another. The diagnostic channel runtime implementations today are idempotent AFAIK so it doesn't matter as long everyone ends up on the same channels.
I think this is a highly specialized case, I initially considered either throwing to force only implementation over the other OR silently NOOP but I that favors whoever comes first which isn't ideal.
WDYT, should this be tweaked?
looping @Qard here as he might have some thoughts here.
This PR implements tracing channel support as discussed in #4629
The approach ensures that GraphQL has no dependency on the
node:diagnostic_channelsmodule in anyway and relies on the consumer of the library to provide it via anenableDiagnosticsChannelAPI.Most of the changes are tests, the actual source changes are mostly tame and introduces wrappers around pre-existing code paths.
API
GraphQL will vendor a minimal version of the tracing channel types, ideally i would've liked it to not have any of the deep channel specifics like
runStoresbut it was required to maintain the sync-or-async behavior we have today. The alternative is to normalize to async for all calls which isn't ideal and probably will have a perf implication.The implemented channels are:
graphql:parsesourcegraphql:validateschema,documentgraphql:executeschema,document,variableValues,operationName(lazy),operationType(lazy)graphql:executeschema,operation,variableValues,operationName,operationTypegraphql:subscribeschema,document,variableValues,operationName(lazy),operationType(lazy)graphql:resolvefieldName,parentType,fieldType,args,isTrivialResolver,fieldPath(lazy)Usage
The APM provider or the user can do:
I tested this locally in a dashboard and it produces waterfall structure like so with basic Sentry integration:
You can view that trace here.
closes #4629