Conversation

@trask (Member) commented Sep 23, 2025

Investigating implementation options for open-telemetry/opentelemetry-specification#4645

TL;DR: I couldn't conjure up an implementation that satisfies "eventually visible" and is faster than just using volatile (immediately visible).
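For context, here is a minimal sketch of what the compared variants might look like; the class names follow the benchmark configurations, but the actual PR code may differ:

```java
// Hypothetical sketches of the compared variants (not the actual PR code).

// Plain field: fastest reads, but an update may never become visible to a
// thread that has cached the value.
class NonVolatileBooleanState {
  private boolean enabled = true;

  boolean get() { return enabled; }

  void set(boolean value) { enabled = value; }
}

// Volatile field: every read goes to memory, updates are immediately visible.
class ImmediateBooleanState {
  private volatile boolean enabled = true;

  boolean get() { return enabled; }

  void set(boolean value) { enabled = value; }
}

// "Eventual" variant: serves reads from a plain cached copy and only re-reads
// the volatile source every REFRESH_INTERVAL accesses, tracked with a plain
// (non-volatile) counter.
class EventualBooleanState {
  private static final int REFRESH_INTERVAL = 100;

  private volatile boolean source = true;
  private boolean cached = true;
  private int accessCount;

  boolean get() {
    if (++accessCount >= REFRESH_INTERVAL) {
      accessCount = 0;
      cached = source; // volatile read refreshes the cached copy
    }
    return cached;
  }

  void set(boolean value) { source = value; }
}
```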

**Java 17 Results** (`BooleanStateBenchmark`)

| Benchmark | Implementation | Score | Units |
|---|---|---|---|
| read_singleThread | NonVolatileBooleanState | 1.205 ± 0.002 | ns/op |
| read_singleThread | ImmediateBooleanState | 74.078 ± 0.196 | ns/op |
| read_singleThread | EventualBooleanState | 272.405 ± 3.691 | ns/op |
| read_singleThread | VarHandleImmediateBooleanState | 80.017 ± 1.464 | ns/op |
| read_singleThread | VarHandleEventualBooleanState | 269.313 ± 0.210 | ns/op |
| read_twoThreads | NonVolatileBooleanState | 1.218 ± 0.014 | ns/op |
| read_twoThreads | ImmediateBooleanState | 73.995 ± 0.061 | ns/op |
| read_twoThreads | EventualBooleanState | 799.889 ± 2.235 | ns/op |
| read_twoThreads | VarHandleImmediateBooleanState | 79.455 ± 0.688 | ns/op |
| read_twoThreads | VarHandleEventualBooleanState | 864.939 ± 173.696 | ns/op |

**Java 24 Results** (`BooleanStateBenchmark`)

| Benchmark | Implementation | Score | Units |
|---|---|---|---|
| read_singleThread | NonVolatileBooleanState | 1.217 ± 0.024 | ns/op |
| read_singleThread | ImmediateBooleanState | 59.877 ± 0.072 | ns/op |
| read_singleThread | EventualBooleanState | 270.300 ± 1.612 | ns/op |
| read_singleThread | VarHandleImmediateBooleanState | 79.709 ± 0.160 | ns/op |
| read_singleThread | VarHandleEventualBooleanState | 269.310 ± 0.271 | ns/op |
| read_twoThreads | NonVolatileBooleanState | 1.213 ± 0.014 | ns/op |
| read_twoThreads | ImmediateBooleanState | 60.002 ± 0.057 | ns/op |
| read_twoThreads | EventualBooleanState | 826.375 ± 24.897 | ns/op |
| read_twoThreads | VarHandleImmediateBooleanState | 81.311 ± 2.690 | ns/op |
| read_twoThreads | VarHandleEventualBooleanState | 792.749 ± 19.175 | ns/op |
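The `VarHandle` rows presumably differ from the plain-field variants only in access mode. A hypothetical sketch of the volatile-mode variant (the actual benchmark code may differ; an "eventual" counterpart would presumably use a weaker mode such as `getOpaque`/`getAcquire` plus the counter trick above):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch, not the actual PR code.
class VarHandleImmediateBooleanState {
  private static final VarHandle ENABLED;

  static {
    try {
      ENABLED = MethodHandles.lookup()
          .findVarHandle(VarHandleImmediateBooleanState.class, "enabled", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  @SuppressWarnings("unused") // accessed only through the VarHandle
  private boolean enabled = true;

  boolean get() {
    return (boolean) ENABLED.getVolatile(this); // volatile-mode read
  }

  void set(boolean value) {
    ENABLED.setVolatile(this, value); // volatile-mode write
  }
}
```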

codecov bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.13%. Comparing base (1e763b2) to head (dd28501).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff            @@
##               main    #7700   +/-   ##
=========================================
  Coverage     90.12%   90.13%           
- Complexity     7187     7192    +5     
=========================================
  Files           814      814           
  Lines         21700    21713   +13     
  Branches       2123     2127    +4     
=========================================
+ Hits          19557    19570   +13     
  Misses         1477     1477           
  Partials        666      666           
```

☔ View full report in Codecov by Sentry.
@trask force-pushed the eventually-visible-benchmark branch 4 times, most recently from b2204c8 to 2b1b386 on September 24, 2025 19:23
@trask marked this pull request as ready for review on September 24, 2025 19:40
@trask requested a review from a team as a code owner on September 24, 2025 19:40
@trask force-pushed the eventually-visible-benchmark branch from 2b1b386 to 78c668d on September 24, 2025 19:43
@trask force-pushed the eventually-visible-benchmark branch from 78c668d to e5706e0 on September 24, 2025 19:44
@jkwatson (Contributor)

So, non-volatile is the fastest, but we know it may not satisfy "eventually visible", correct? Did your benchmark measure "time to visibility" across multiple threads, or just throughput with different implementations?

@trask (Member, Author) commented Sep 24, 2025

> Did your benchmark measure "time to visibility" across multiple threads

The "eventual visibility" implementations rely on a non-volatile access counter to be "eventual", though that's also what destroys the performance.

@jkwatson (Contributor)

> > Did your benchmark measure "time to visibility" across multiple threads
>
> The "eventual visibility" implementations rely on a non-volatile access counter to be "eventual", though that's also what destroys the performance.

I guess I'm asking what the "score" represents in the benchmarks... time to visibility?

@trask (Member, Author) commented Sep 24, 2025

> I guess I'm asking what the "score" represents in the benchmarks... time to visibility?

Ah, it's the number of nanoseconds to perform 100 boolean reads on the same thread.

This is where non-volatile shines, because the JIT compiler can optimize it down to a single memory read.
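In other words, the benchmark presumably has roughly this shape (a hypothetical sketch reusing the `NonVolatileBooleanState` class from above; the real benchmark code may differ):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical shape of the read benchmark, not the actual PR code.
@State(Scope.Benchmark)
public class BooleanStateReadBenchmark {
  private final NonVolatileBooleanState state = new NonVolatileBooleanState();

  @Benchmark
  public void read_singleThread(Blackhole bh) {
    // With a plain field, the JIT is free to read the flag once and feed the
    // same register value to all 100 iterations; a volatile field forces a
    // fresh read on every iteration.
    for (int i = 0; i < 100; i++) {
      bh.consume(state.get());
    }
  }
}
```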

@jack-berg (Member)

If I'm reading the results correctly, non-volatile is much, much faster than everything else. This is what I suspected based on research about the perf penalty of the volatile keyword, but didn't test myself.

I'm reluctant to add this penalty when only a small percentage of users will ever take advantage of the dynamism that requires it.

The penalty has to be paid on the hot path of metrics, logs, and traces. See my old blog post on metric systems for some ballpark figures on time to record measurements; I think the volatile keyword moves the needle in a meaningful way: https://opentelemetry.io/blog/2024/java-metric-systems-compared/#metrics-primer

@jack-berg (Member)

We could make it a setting, i.e. when you initialize the SDK, indicate whether you intend to use dynamic config. If so, we substitute an implementation that guarantees eventual consistency. If not, we use an implementation that is fast and doesn't waste perf checking for config changes that will never come.
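A minimal sketch of that idea; the interface, method, and setting names here are hypothetical, not actual SDK API, and it assumes the state classes sketched in the PR description implement a common interface:

```java
// Hypothetical factory selecting the state implementation at SDK build time.
interface BooleanState {
  boolean get();

  static BooleanState create(boolean dynamicConfigEnabled) {
    return dynamicConfigEnabled
        ? new ImmediateBooleanState()    // volatile: updates guaranteed visible
        : new NonVolatileBooleanState(); // plain field: fast, config fixed at startup
  }
}
```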

@pellared (Member)

> I'm reluctant to add this penalty when only a small percentage of users will ever take advantage of the dynamism that requires it.

Do you really believe the nanoseconds of overhead would be noticeable? And what is the sense of adding a feature when there is no guarantee that it will work?

I suggest checking the speedup factor of non-volatile vs. volatile in an end-to-end scenario (e.g. emitting a log record on a logger that is disabled).
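A minimal sketch of such an end-to-end check using the OpenTelemetry logs API; a real run would build the SDK twice (volatile vs. non-volatile flag) with the logger disabled, and the no-op instance below only stands in to keep the sketch self-contained:

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.logs.Logger;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical end-to-end benchmark, not actual PR code.
@State(Scope.Benchmark)
public class DisabledLoggerBenchmark {
  // Placeholder: substitute an SDK logger configured as disabled.
  private final Logger logger = OpenTelemetry.noop().getLogsBridge().get("benchmark");

  @Benchmark
  public void emitOnDisabledLogger() {
    logger.logRecordBuilder().setBody("hello").emit();
  }
}
```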

@laurit (Contributor) commented Sep 29, 2025

Keep in mind what JMH prints out:

> REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell.

First, I didn't try to run these or attempt to verify why the numbers are like they are. As far as I know, on x86, where these tests are run, volatile reads don't use a barrier. Based on that, I'd guess that the difference in read-only perf is not caused directly by volatile reads being slower, but rather by some sort of compiler optimization: probably in the non-volatile case the compiler reads the state once and keeps it in a register, while in the volatile case it keeps rereading the state. Or perhaps the compiler does something even more clever.

If that turns out to be so, then IMO this test would really be invalid: it doesn't realistically reflect the logging instrumentation reading a configuration flag, as reading that flag won't happen in a loop that could be unrolled and optimized like that. Inspecting the generated asm might give a better understanding of what is different between these two tests. I believe that adding Blackhole.consumeCPU to simulate real work could help with building a more realistic test. Note that on other architectures volatile reads do use a barrier.

https://shipilev.net/blog/2014/nanotrusting-nanotime/ states that:

> It chimes back to our observation that volatile write costs are dramatically amortized if we are not choking the system with them.
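A minimal sketch of the `Blackhole.consumeCPU` suggestion above: interleave simulated work so the flag read no longer sits in a tight loop the JIT can collapse. The token count (50) is an arbitrary placeholder, and `state` is as in the earlier sketches:

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical "more realistic" variant of the read benchmark.
@State(Scope.Benchmark)
public class RealisticReadBenchmark {
  private final NonVolatileBooleanState state = new NonVolatileBooleanState();

  @Benchmark
  public boolean readWithSimulatedWork() {
    Blackhole.consumeCPU(50); // burn a fixed amount of CPU between reads
    return state.get();
  }
}
```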

@jack-berg (Member) commented Sep 29, 2025

> Do you really believe the nanoseconds of overhead would be noticeable? And what is the sense of adding a feature when there is no guarantee that it will work?

Yes, especially for metrics. Perf arguments are brought up as reasons not to use otel.

As for the guaranteed-to-work argument:

  • This is an experimental feature, and I've maintained the position that if anyone actually observes that the updates are not occurring in the real world, we should adjust.
  • I offered a potential solution above which is guaranteed to work and doesn't sacrifice perf for the vast majority of users who won't use this feature.
