Add OpenTelemetry OTLP exporter with full SDK support #218
bad79e6 to e6d05d8
a82c5e2 to 82249fe
| ### 3. Instrument Caching | ||
| **Implementation**: Thread-safe two-level locking pattern |
I can't tell if this is internal to the stats library or external - if users need to know about this.
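For readers unfamiliar with the pattern named in the diff, here is a minimal sketch of two-level (double-checked) locking. The `instrumentCache` type, its fields, and the integer standing in for an instrument are all hypothetical; the PR's actual cache is not shown in this thread and may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// instrumentCache is a hypothetical stand-in for the handler's cache.
// An int id stands in for a real OpenTelemetry instrument.
type instrumentCache struct {
	mu    sync.RWMutex
	items map[string]int
	next  int
}

func newInstrumentCache() *instrumentCache {
	return &instrumentCache{items: make(map[string]int)}
}

// get returns the cached id for name, creating it on first use.
func (c *instrumentCache) get(name string) int {
	// Level 1: shared read lock for the common case (already cached).
	c.mu.RLock()
	v, ok := c.items[name]
	c.mu.RUnlock()
	if ok {
		return v
	}

	// Level 2: exclusive lock, then re-check, because another goroutine
	// may have created the entry between RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.items[name]; ok {
		return v
	}
	c.next++
	c.items[name] = c.next
	return c.next
}

func main() {
	c := newInstrumentCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("requests.count")
		}()
	}
	wg.Wait()
	// Eight concurrent callers still produce exactly one instrument.
	fmt.Println(c.get("requests.count"), len(c.items)) // 1 1
}
```

A cache like this can stay unexported, which would make it an internal detail that users do not need to know about.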
| ### Default: Cumulative Temporality | ||
| **Decision**: Use cumulative temporality for all metric instruments (Prometheus-compatible) |
I think we should rewrite this to be aimed more at people who might be curious. It's weird to talk about a "decision" without a discussion of the tradeoffs that led to it.

Or maybe it just needs to be presented in reverse order.
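As background for the tradeoff discussion requested here: a cumulative point carries the running total since the start, while a delta point carries only the change since the previous export. Prometheus counters are cumulative, which is the compatibility the diff line refers to. The sketch below is illustrative only and is not code from the PR; `toDelta` is a hypothetical helper.

```go
package main

import "fmt"

// toDelta converts cumulative counter readings (one per export) into
// per-interval deltas. The first delta is measured from zero.
func toDelta(cumulative []int64) []int64 {
	deltas := make([]int64, len(cumulative))
	var prev int64
	for i, v := range cumulative {
		deltas[i] = v - prev
		prev = v
	}
	return deltas
}

func main() {
	cumulative := []int64{5, 12, 12, 20}
	fmt.Println(toDelta(cumulative)) // [5 7 0 8]
}
```

If one cumulative export is lost, the next one still carries the full total. If one delta export is lost, that increment is gone. That robustness is one of the tradeoffs the section could spell out.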
| - ✅ Resource detection | ||
| - ✅ Production-ready | ||
| 2. **Handler** (Legacy): Custom OTLP implementation |
Do we have internal use of this?
If so, I'd like to expand on why people shouldn't use this.

Asked Claude to search for usage and it didn't find any. Adding a clear deprecation notice.
| // For gRPC: "localhost:4317" | ||
| // For HTTP: "http://localhost:4318" | ||
| // If empty, uses OTEL_EXPORTER_OTLP_ENDPOINT environment variable | ||
| Endpoint string |
Note that people frequently get tripped up between "Endpoint" and "EndpointURL". We should probably note the difference here and say explicitly that this is "Endpoint".
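To make the distinction concrete: in the upstream OpenTelemetry Go exporters, `WithEndpoint` takes a bare `host:port` and `WithEndpointURL` takes a full URL with a scheme. The sketch below uses only the standard library to tell the two shapes apart; `endpointKind` is a hypothetical helper, not something from this PR.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// endpointKind reports whether s looks like a full URL ("url"),
// a bare host:port ("hostport"), or neither ("invalid").
func endpointKind(s string) string {
	if u, err := url.Parse(s); err == nil &&
		(u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
		return "url"
	}
	if _, _, err := net.SplitHostPort(s); err == nil {
		return "hostport"
	}
	return "invalid"
}

func main() {
	for _, s := range []string{"localhost:4317", "http://localhost:4318", "localhost"} {
		fmt.Printf("%-24s %s\n", s, endpointKind(s))
	}
}
```

A check of this kind could back a doc comment that states which form the field expects and reject the other form early.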
| go 1.23.0 | ||
| go 1.24.0 | ||
| require ( |
Should this really have its own separate go.mod? I guess so the other callers don't pull in all of the otel dependencies?
| // EndpointURL: "http://localhost:4318", | ||
| // }) | ||
| // | ||
| // Status: Alpha. This Handler is still in heavy development phase. Do not use |
Let's remove this comment since this is now deprecated
| defer stats.Flush() | ||
| // Or use environment variables (simplest) | ||
| // export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 |
This example switches from gRPC to HTTP in addition to switching from env vars to in memory. That's fine, but let's be explicit that this is making two changes (env var AND protocol) instead of just one.
| ### Prometheus | ||
| The [github.com/segmentio/stats/v5/prometheus](https://godoc.org/github.com/segmentio/stats/v5/prometheus) package exposes an HTTP handler that serves metrics in Prometheus format. |
This is dumb, but for people who have never used Prometheus before, the pull model vs. push can be counterintuitive. Could we just add a sentence explaining that's how this works? "Note that with Prometheus, the metric server will poll your client for changes - metrics are not pushed from a client to the server."
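A small illustration of the pull model might help alongside that sentence. In the sketch below the application only exposes a handler and the scraper initiates every read. It uses the standard library rather than the `stats/v5/prometheus` package, and `metricsHandler`, `requestsTotal`, and `scrape` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter owned by the application.
var requestsTotal atomic.Int64

// metricsHandler serves the current value in Prometheus text format.
// The application never sends this anywhere; it only answers requests.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintf(w, "# TYPE requests_total counter\nrequests_total %d\n", requestsTotal.Load())
}

// scrape plays the role of the Prometheus server: it asks the
// application for its metrics by issuing a GET against the handler.
func scrape() string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler(rec, req)
	return rec.Body.String()
}

func main() {
	requestsTotal.Store(3)
	fmt.Print(scrape())
}
```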
| // If zero or not set, uses the SDK default (60 seconds) | ||
| ExportInterval time.Duration | ||
| // ExportTimeout specifies the timeout for exports |
I'd be more specific - "the maximum amount of time to wait for a request to the server to complete"
As written it could be confused with what ExportInterval does
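To show the difference between the two settings, here is a sketch in which the interval decides when an export starts and the timeout bounds how long a single export may take. This is not the SDK's implementation; `runExporter`, `exportOnce`, `unresponsiveCollector`, and `demoTimedOut` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// exportOnce bounds a single export attempt: the role ExportTimeout plays.
func exportOnce(parent context.Context, timeout time.Duration, send func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return send(ctx)
}

// runExporter starts one export per tick: the role ExportInterval plays.
func runExporter(interval, timeout time.Duration, n int, send func(context.Context) error) []error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	errs := make([]error, 0, n)
	for i := 0; i < n; i++ {
		<-ticker.C
		errs = append(errs, exportOnce(context.Background(), timeout, send))
	}
	return errs
}

// unresponsiveCollector never answers; it returns only when ctx expires.
func unresponsiveCollector(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demoTimedOut runs two exports against the unresponsive collector and
// reports, per export, whether it ended with a deadline error.
func demoTimedOut() []bool {
	errs := runExporter(20*time.Millisecond, 5*time.Millisecond, 2, unresponsiveCollector)
	out := make([]bool, len(errs))
	for i, err := range errs {
		out[i] = errors.Is(err, context.DeadlineExceeded)
	}
	return out
}

func main() {
	fmt.Println(demoTimedOut()) // [true true]
}
```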
| // HTTPOptions are additional options for HTTP protocol | ||
| // Only used when Protocol is ProtocolHTTPProtobuf | ||
| HTTPOptions []otlpmetrichttp.Option |
My first question with this and GRPCOptions is "what options exist?" Can we link to the docs?
| // Set defaults for histogram configuration | ||
| if config.ExponentialHistogram { | ||
| if config.ExponentialHistogramMaxSize == 0 { | ||
| config.ExponentialHistogramMaxSize = 160 |
let's pull into DefaultHistogramMaxSize const please
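One possible shape for that change, as a sketch. `DefaultHistogramMaxSize` is the name proposed in the comment. `DefaultHistogramMaxScale`, the `histogramConfig` type, and the `int` field types are assumptions made for illustration, since the real config struct is only partly visible in the diff.

```go
package main

import "fmt"

const (
	// DefaultHistogramMaxSize is the bucket limit applied when none is set.
	DefaultHistogramMaxSize = 160
	// DefaultHistogramMaxScale is a hypothetical companion constant for
	// the other literal in the same block of the diff.
	DefaultHistogramMaxScale = 20
)

// histogramConfig mirrors only the fields visible in the diff; the int
// field types are an assumption.
type histogramConfig struct {
	ExponentialHistogram         bool
	ExponentialHistogramMaxSize  int
	ExponentialHistogramMaxScale int
}

// applyHistogramDefaults fills zero-valued fields with the named defaults.
func applyHistogramDefaults(c histogramConfig) histogramConfig {
	if !c.ExponentialHistogram {
		return c
	}
	if c.ExponentialHistogramMaxSize == 0 {
		c.ExponentialHistogramMaxSize = DefaultHistogramMaxSize
	}
	if c.ExponentialHistogramMaxScale == 0 {
		c.ExponentialHistogramMaxScale = DefaultHistogramMaxScale
	}
	return c
}

func main() {
	c := applyHistogramDefaults(histogramConfig{ExponentialHistogram: true})
	fmt.Println(c.ExponentialHistogramMaxSize, c.ExponentialHistogramMaxScale) // 160 20
}
```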
| config.ExponentialHistogramMaxSize = 160 | ||
| } | ||
| if config.ExponentialHistogramMaxScale == 0 { | ||
| config.ExponentialHistogramMaxScale = 20 |
| res := config.Resource | ||
| if res == nil { | ||
| var err error | ||
| res, err = resource.New(ctx, |
let's put a timeout on this - we shouldn't hang forever because we couldn't get a resource
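Since `resource.New` already takes a context as its first argument (as the diff shows), one way to do this is to pass it a context with a deadline. The sketch below shows the shape of that change using a stand-in detector. `detectResource` and `detectOrFallback` are hypothetical, and whether every real detector honors cancellation is an assumption that would need checking.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// detectResource is a hypothetical stand-in for a slow detector. It
// returns early with the context's error if the deadline passes first.
func detectResource(ctx context.Context, delay time.Duration) (string, error) {
	select {
	case <-time.After(delay):
		return "host.name=example", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// detectOrFallback gives detection at most timeoutMS milliseconds and
// falls back to a minimal placeholder resource if that is exceeded.
func detectOrFallback(timeoutMS, delayMS int) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMS)*time.Millisecond)
	defer cancel()

	res, err := detectResource(ctx, time.Duration(delayMS)*time.Millisecond)
	if err != nil {
		return "fallback"
	}
	return res
}

func main() {
	fmt.Println(detectOrFallback(200, 1))  // host.name=example
	fmt.Println(detectOrFallback(5, 2000)) // fallback
}
```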
| } | ||
| default: | ||
| return nil, fmt.Errorf("unsupported protocol: %s", protocol) |
Suggested change:
| - return nil, fmt.Errorf("unsupported protocol: %s", protocol) |
| + return nil, fmt.Errorf("unsupported protocol: %q", protocol) |
"%q" will make clear 'you passed the empty string' vs. 'we forgot to include the variable in the error message'
| case stats.Counter: | ||
| counter, err := meter.Int64Counter(name) | ||
| if err != nil { | ||
| log.Printf("stats/otlp: failed to create counter %s: %v", name, err) |
same for all in this file
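To illustrate the `%q` suggestion from this thread, the sketch below formats the same error with both verbs. The helper names `unsupportedS` and `unsupportedQ` are hypothetical.

```go
package main

import "fmt"

// unsupportedS formats the value with %s, as in the original line.
func unsupportedS(protocol string) error {
	return fmt.Errorf("unsupported protocol: %s", protocol)
}

// unsupportedQ formats the value with %q, as in the suggested change.
func unsupportedQ(protocol string) error {
	return fmt.Errorf("unsupported protocol: %q", protocol)
}

func main() {
	fmt.Printf("[%v]\n", unsupportedS(""))       // [unsupported protocol: ]
	fmt.Printf("[%v]\n", unsupportedQ(""))       // [unsupported protocol: ""]
	fmt.Printf("[%v]\n", unsupportedQ("thrift")) // [unsupported protocol: "thrift"]
}
```

With `%s` an empty value leaves the message ending in a dangling colon. With `%q` the empty string is visible as `""`.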
136c900 to 3441b0b
Implement a production-ready OpenTelemetry Protocol (OTLP) exporter using the official OpenTelemetry SDK, supporting both gRPC and HTTP transports, and deprecate the legacy alpha handler in preparation for v6.

Features:
- gRPC and HTTP/Protobuf protocol support
- Counter, Gauge, and Histogram metric types
- Optional exponential histogram aggregation
- Configurable temporality (cumulative default, Prometheus-compatible)
- Tag to attribute conversion
- Thread-safe instrument caching

Implementation:
- Gauges use the native Float64Gauge instrument for instantaneous value recording
- Background context for recording to avoid cancellation issues
- Lock-free reads for instrument lookup in hot path
- A single Meter is created once and reused across recordings

Environment variables:
- The transport protocol is resolved from OTEL_EXPORTER_OTLP_PROTOCOL, with OTEL_EXPORTER_OTLP_METRICS_PROTOCOL taking precedence; an unrecognized value is rejected. An explicit SDKConfig.Protocol always wins. We resolve only this variable ourselves because the otlpmetricgrpc/otlpmetrichttp exporters do not read the protocol selector.
- The exporters read the remaining OTEL_EXPORTER_OTLP_* variables (endpoint, headers, timeout, compression, ...) themselves; programmatic overrides are available via WithEndpointURL, GRPCOptions, and HTTPOptions.
- Resource attributes come from OTEL_RESOURCE_ATTRIBUTES/OTEL_SERVICE_NAME plus host and process detection. Cloud and Kubernetes detection is opt-in via the contrib/detectors/* packages.

Config API:
- SDKConfig.EndpointURL takes a full URL with scheme (http:// or https://); WithEndpointURL is used to avoid a known gRPC bug with the http:// scheme
- ExportInterval and ExportTimeout fall back to SDK defaults (60s and 30s) when unset

Deprecations:
- Deprecate otlp.Handler (Alpha since 2022, minimal usage)
- Deprecate otlp.HTTPClient
- Deprecate otlp.NewHTTPClient()

All will be removed in v6.0.0. Migration path provided in deprecation notices with code examples.

Testing:
- Unit tests and benchmarks for instrument handling and value conversion
- Integration tests that export to an in-process gRPC OTLP collector and assert on the metrics received over the wire, covering protocol resolution precedence and the invalid-protocol error path

Documentation:
- Complete README with configuration examples
- Cloud resource detector usage guides
- Implementation notes explaining design decisions and temporality
- Example code for common use cases
- HISTORY.md release notes for v5.9.0

Performance:
- Preallocate tag/attribute slices to their exact length and assign by index instead of appending, in the OTLP handlers and in the core stats.M and tagFuncMap.namedTagFuncs helpers

Bumps version to 5.9.0. All tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Kevin Burke <kburke@twilio.com>
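The protocol resolution order described in the commit message can be sketched as follows. This is an illustration of the stated precedence, not the PR's code. `ProtocolHTTPProtobuf` appears in the diff, but `ProtocolGRPC`, `resolveProtocol`, `mapEnv`, and the gRPC default used when nothing is set are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// Protocol names an OTLP transport. The string values are the ones
// defined for OTEL_EXPORTER_OTLP_PROTOCOL.
type Protocol string

const (
	ProtocolGRPC         Protocol = "grpc"          // name assumed
	ProtocolHTTPProtobuf Protocol = "http/protobuf" // name from the diff
)

// resolveProtocol applies the precedence from the commit message:
// explicit config, then the metrics-specific variable, then the
// general variable. Unrecognized environment values are rejected.
func resolveProtocol(explicit Protocol, getenv func(string) string) (Protocol, error) {
	if explicit != "" {
		return explicit, nil
	}
	keys := []string{"OTEL_EXPORTER_OTLP_METRICS_PROTOCOL", "OTEL_EXPORTER_OTLP_PROTOCOL"}
	for _, key := range keys {
		v := getenv(key)
		if v == "" {
			continue
		}
		switch p := Protocol(v); p {
		case ProtocolGRPC, ProtocolHTTPProtobuf:
			return p, nil
		default:
			return "", fmt.Errorf("unsupported protocol in %s: %q", key, v)
		}
	}
	// Assumed default when nothing is configured.
	return ProtocolGRPC, nil
}

// mapEnv adapts a map to the getenv signature, which keeps the
// resolution logic testable without touching the process environment.
func mapEnv(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

func main() {
	p, err := resolveProtocol("", os.Getenv)
	fmt.Println(p, err)
}
```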
3441b0b to 7e7070d
Summary
Adds a production-ready OpenTelemetry Protocol (OTLP) exporter using the official OpenTelemetry SDK, with support for both gRPC and HTTP/Protobuf transports.
Features
✅ Dual Transport Support: gRPC and HTTP/Protobuf protocols
✅ Environment Variables: Full OTEL_* environment variable support
✅ Resource Detection: AWS (EC2, ECS, EKS, Lambda), GCP, Azure, K8s, host, process
✅ All Metric Types: Counter, Gauge, Histogram with proper semantics
✅ Tag Conversion: Automatic stats tags → OpenTelemetry attributes
✅ Production Ready: Thread-safe, tested, documented
Usage
Implementation Highlights
UpDownCounter with delta calculation to maintain absolute value semantics (workaround until stable OTel SDK adds Gauge)
Documentation
Testing
Changes
otlp/sdk_handler.go - Main OpenTelemetry SDK integration
otlp/sdk_handler_test.go - Comprehensive tests
otlp/example_test.go - Usage examples
otlp/README.md - Complete documentation
otlp/IMPLEMENTATION_NOTES.md - Design decisions
README.md - Added OpenTelemetry backend overview
HISTORY.md - Added v5.9.0 release notes
version/version.go - Bumped to 5.9.0
otlp/go.mod - Added OpenTelemetry SDK dependencies
Backward Compatibility
✅ Fully backward compatible - This is a new feature addition that doesn't change existing APIs. The legacy otlp.Handler remains available for existing users.
🤖 Generated with Claude Code