
Conversation

jackshirazi (Contributor)

closes #2149

@@ -159,7 +159,7 @@ public void checkProfilerWorking() {

   private static void doSleep() {
     try {
-      Thread.sleep(100);
+      Thread.sleep(300);
Contributor

so this has to be long enough for the inferred span code to observe it? WDYT if instead of sleeping you did something like

import static org.awaitility.Awaitility.await;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;

Span tx = tracer.spanBuilder("my-root").startSpan();
try (Scope scope = tx.makeCurrent()) {
  // wait until a span shows up while tx is still in progress
  await().until(() -> !AutoConfiguredDataCapture.getSpans().isEmpty());
} finally {
  tx.end();
}

could that result in a more reliable test?

Contributor Author

The inferred span would not be created before the span ends, because that's how it infers the timing. So that suggestion would time out.

The current sequence should be:

  1. the profiler is started (we waited for that, or the assertion would have failed)
  2. we start a 100ms span
  3. the profiler is sampling at 5ms intervals so more than enough traces to infer a span (we only need 2, and we'd expect around 20 here)
  4. the span finishes and we generate first the regular span, then after a short delay the inferred span
  5. the await is waiting for both spans

Hmm, actually, as I write this, it seems to me that the failure I am suggesting we fix would be a timeout failure. In fact, shouldn't this only fail with a timeout, with the "size of" assertion as a sub-reason for why it timed out?
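
To make that sequence concrete, here is a minimal sketch of the intended flow; the method name, span name, and exact wait condition are illustrative assumptions rather than the actual test code, and AutoConfiguredDataCapture is the same test helper used in the suggestion above:

import static org.awaitility.Awaitility.await;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

static void runProfiledSpan(Tracer tracer) throws InterruptedException {
  // 1-2: the profiler is already running; start a span and keep it open
  Span span = tracer.spanBuilder("do-sleep").startSpan();
  try (Scope scope = span.makeCurrent()) {
    // 3: at a 5ms sampling interval, ~300ms of work should yield plenty of stack traces
    Thread.sleep(300);
  } finally {
    // 4: the inferred span can only be created after the real span ends,
    //    because its timing is derived from the profiling data collected in between
    span.end();
  }
  // 5: wait until both the regular span and the inferred span have been captured
  await().until(() -> AutoConfiguredDataCapture.getSpans().size() >= 2);
}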

Contributor

The inferred span would not be created before the span ends, because that's how it infers the timing. So that suggestion would time out.

That is unfortunate. My experience is that short timeouts in tests can cause instability, and in CI there may be things running in parallel which make this worse. I would try a 1s sleep.

the profiler is sampling at 5ms intervals so more than enough traces to infer a span (we only need 2, and we'd expect around 20 here)

According to the output in https://scans.gradle.com/s/k77oi6bbrx2ew/tests/task/:inferred-spans:test/details/io.opentelemetry.contrib.inferredspans.InferredSpansAutoConfigTest/checkProfilerWorking()?page=eyJvdXRwdXQiOnsiMCI6MX19&top-execution=1 you got 0 stack traces:

FINE: Parsing JFR chunk at offset
Aug 20, 2025 8:51:13 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler processTraces
FINE: Processing 0 stack traces
Aug 20, 2025 8:51:13 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent handle
FINE: Handling event timestamp=3,215,835,057,958 root=true threadId=1 activation=true
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent startProfiling
FINE: Create call tree (ImmutableSpanContext{traceId=7dcd14b33b36f375ace29d344a0a3118, spanId=824f6b362b41c0af, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}(clock-anchor: 1755719857533603501)) for thread 1
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent handle
FINE: Handling event timestamp=3,216,133,279,166 root=true threadId=1 activation=false
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent stopProfiling
FINE: End call tree (ImmutableSpanContext{traceId=7dcd14b33b36f375ace29d344a0a3118, spanId=824f6b362b41c0af, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}(clock-anchor: 1755719857533603501)) for thread 1
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler$ActivationEvent stopProfiling
FINE: Created no spans for thread 1 (count=0)
Aug 20, 2025 8:51:14 PM io.opentelemetry.contrib.inferredspans.internal.SamplingProfiler processTraces
FINE: Processing traces took 41,626us

Contributor Author

Which implies that the profiler was suspended for the whole 100ms sleep. That's going to be rare but possible in a shared CI environment, and would explain why the error is rare. I'll bump the numbers even more, and apply the same change to the other failing test as well.
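
For reference, a minimal sketch of what bumping the numbers might look like in the doSleep helper from the diff above; the 1s value (taken from the suggestion in the previous comment) and the interrupt handling are assumptions, since only the Thread.sleep line appears in the diff:

private static void doSleep() {
  try {
    // a longer sleep gives the 5ms sampler headroom even if CI contention
    // suspends the profiler for a stretch of the span
    Thread.sleep(1000);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
}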


laurit commented Aug 27, 2025

Note that the failure mentioned in #2149 (comment) https://gradle.com/s/uevvy2bsaau5k is in a different test

@jackshirazi jackshirazi marked this pull request as draft August 28, 2025 11:58
Development

Successfully merging this pull request may close these issues.

Sporadic failing test in inferred-spans