Skip to content

Conversation

Mojachieee
Copy link
Contributor

@Mojachieee Mojachieee commented Aug 7, 2025

fixes #7007

Copy link

codecov bot commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 90.55118% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.9%. Comparing base (eda888f) to head (980b74b).

Files with missing lines Patch % Lines
exporters/otlp/otlptrace/otlptracegrpc/client.go 88.2% 8 Missing and 4 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #7142    +/-   ##
======================================
  Coverage   82.9%   82.9%            
======================================
  Files        264     265     +1     
  Lines      24628   24754   +126     
======================================
+ Hits       20423   20535   +112     
- Misses      3822    3832    +10     
- Partials     383     387     +4     
Files with missing lines Coverage Δ
...rters/otlp/otlptrace/otlptracegrpc/internal/x/x.go 100.0% <100.0%> (ø)
exporters/otlp/otlptrace/otlptracegrpc/client.go 89.9% <88.2%> (-1.2%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@flc1125
Copy link
Member

flc1125 commented Aug 7, 2025

@flc1125
Copy link
Member

flc1125 commented Aug 7, 2025

The following indicators are missing some attributes (including unit tests):

https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md#metric-otelsdkexporterspaninflight

  • otel.sdk.exporter.span.inflight
    • server.address
    • server.port
  • otel.sdk.exporter.span.exported
    • server.address
    • server.port
  • otel.sdk.exporter.operation.duration
    • server.address
    • server.port
    • rpc.grpc.status_code

@flc1125 flc1125 requested a review from Copilot August 7, 2025 15:17
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds experimental self-observability metrics to the OTLP trace gRPC exporter. When enabled via the OTEL_GO_X_SELF_OBSERVABILITY environment variable, the exporter will emit metrics tracking span export operations including in-flight spans, exported spans count, and operation duration.

  • Introduces an experimental feature flag system for enabling self-observability
  • Adds metric instrumentation to track span export operations in the gRPC exporter
  • Updates module dependencies to include required metric packages

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
exporters/otlp/otlptrace/otlptracegrpc/internal/x/x.go Implements experimental feature flag system for self-observability
exporters/otlp/otlptrace/otlptracegrpc/internal/x/x_test.go Tests for the experimental feature flag functionality
exporters/otlp/otlptrace/otlptracegrpc/client.go Adds metric instrumentation to the gRPC client for tracking export operations
exporters/otlp/otlptrace/otlptracegrpc/client_test.go Comprehensive tests for self-observability metrics functionality
exporters/otlp/otlptrace/otlptracegrpc/go.mod Updates dependencies to include metric packages
CHANGELOG.md Documents the new experimental feature

@Mojachieee
Copy link
Contributor Author

The following indicators are missing some attributes (including unit tests):

https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md#metric-otelsdkexporterspaninflight

  • otel.sdk.exporter.span.inflight

    • server.address
    • server.port
  • otel.sdk.exporter.span.exported

    • server.address
    • server.port
  • otel.sdk.exporter.operation.duration

    • server.address
    • server.port
    • rpc.grpc.status_code

These are all added now

@Mojachieee Mojachieee requested a review from flc1125 August 8, 2025 12:26
@Mojachieee Mojachieee requested a review from flc1125 August 11, 2025 09:35
Copy link
Member

@flc1125 flc1125 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last remaining issue.

Also, please resolve the conflicts.

Finally: Thank you for your contribution.

Comment on lines +381 to +383
m := mp.Meter("go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
m := mp.Meter("go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL))
m := mp.Meter(
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc",
metric.WithInstrumentationVersion(sdk.Version()),
metric.WithSchemaURL(semconv.SchemaURL),
)

Comment on lines +95 to +96
c.initSelfObservability()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flattening initSelfObservability seems appropriate. This is the only call site and this function is scoped to setup a new client, which includes telemetry.


defer func() {
duration := time.Since(start)
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocated every call. A pool should be used to amortize the slice allocation.

Comment on lines +215 to +216
for _, ss := range ps.ScopeSpans {
spanCount += len(ss.Spans)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle nil values.

Suggested change
for _, ss := range ps.ScopeSpans {
spanCount += len(ss.Spans)
for _, ss := range ps.GetScopeSpans() {
spanCount += len(ss.GetSpans())

Comment on lines +223 to +236
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
durationAttrs = append(durationAttrs, c.selfObservabilityAttrs...)
durationAttrs = append(durationAttrs,
c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err))))

exportedAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+1)
exportedAttrs = append(exportedAttrs, c.selfObservabilityAttrs...)

if err != nil {
// Try to extract the underlying gRPC status error, if there is one
rootErr := err
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact allocations can be made here for the cost of a few more branches which is worth it.

Suggested change
durationAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+2)
durationAttrs = append(durationAttrs, c.selfObservabilityAttrs...)
durationAttrs = append(durationAttrs,
c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err))))
exportedAttrs := make([]attribute.KeyValue, 0, len(c.selfObservabilityAttrs)+1)
exportedAttrs = append(exportedAttrs, c.selfObservabilityAttrs...)
if err != nil {
// Try to extract the underlying gRPC status error, if there is one
rootErr := err
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
rootErr := err
// Extract the underlying gRPC status error, if there is one.
if s, ok := status.FromError(err); ok {
rootErr = s.Err()
}
n := len(c.selfObservabilityAttrs)
var durationAttrs, exportedAttrs []attribute.KeyValue
if rootErr != nil {
durationAttrs = make([]attribute.KeyValue, n, n+2)
exportedAttrs = make([]attribute.KeyValue, n, n+1)
} else {
durationAttrs = make([]attribute.KeyValue, n, n+1)
exportedAttrs = make([]attribute.KeyValue, n, n)
}
_ = copy(durationAttrs, c.selfObservabilityAttrs)
scAttr := c.operationDurationMetric.AttrRPCGRPCStatusCode(otelconv.RPCGRPCStatusCodeAttr(status.Code(err)))
durationAttrs = append(durationAttrs, scAttr)
_ = copy(exportedAttrs, c.selfObservabilityAttrs)
if err != nil {

Comment on lines +445 to +446
// nextExporterID returns a new unique ID for an exporter.
// the starting value is 0, and it increments by 1 for each call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// nextExporterID returns a new unique ID for an exporter.
// the starting value is 0, and it increments by 1 for each call.
// nextExporterID returns a monotonically increasing int64 starting at 0

DataPoints: []metricdata.HistogramDataPoint[float64]{
{
Attributes: attribute.NewSet(
semconv.OTelComponentName("otlp_grpc_span_exporter/1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on test execution order. It is brittle and will break when test are run in parallel or new cases are added. The generator needs to be reset per test case or this needs to not be evaluated as strictly.

Comment on lines +655 to +657
if tt.enabled {
t.Setenv("OTEL_GO_X_SELF_OBSERVABILITY", "true")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two test cases and this conditional splits them. They should be made into their own tests to just remove the complexity being added to accommodate everything here.

}

original := otel.GetMeterProvider()
defer otel.SetMeterProvider(original)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer otel.SetMeterProvider(original)
t.Cleanup(func() { otel.SetMeterProvider(original) })

@@ -286,3 +289,106 @@ func TestWithEndpointWithEnv(t *testing.T) {
})
}
}

func Test_getServerAttrs(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func Test_getServerAttrs(t *testing.T) {
func TestGetServerAttrs(t *testing.T) {

@pellared
Copy link
Member

pellared commented Sep 1, 2025

@Mojachieee PTAL #7272

@flc1125
Copy link
Member

flc1125 commented Sep 11, 2025

Hi, since this process involves specification adjustments and historical review records (which may contain invalid review suggestions), it’s impossible to tell which items need attention amid the large volume of information.

Could we create a new PR based on the current branch before preparing for the review, so that subsequent reviews can proceed more smoothly?

Thanks~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trace SDK observability - otlptracegrpc exporter metrics
4 participants