
Change metric.Producer to be an Option on Reader #4346

Merged
MrAlias merged 3 commits into open-telemetry:main from dashpole:prototype_metricreader_args on Aug 11, 2023

@dashpole
Collaborator

@dashpole dashpole commented Jul 20, 2023

Updates the MetricProducer implementation to comply with open-telemetry/opentelemetry-specification#3613

@codecov

codecov Bot commented Jul 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.8%. Comparing base (7b9fb7a) to head (d204195).
Report is 1497 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##            main   #4346     +/-   ##
=======================================
- Coverage   78.8%   78.8%   -0.1%     
=======================================
  Files        253     253             
  Lines      20644   20630     -14     
=======================================
- Hits       16281   16267     -14     
  Misses      4014    4014             
  Partials     349     349             
Files with missing lines Coverage Δ
sdk/metric/manual_reader.go 74.1% <100.0%> (-2.4%) ⬇️
sdk/metric/periodic_reader.go 85.0% <100.0%> (+0.1%) ⬆️
sdk/metric/reader.go 100.0% <100.0%> (ø)

... and 1 file with indirect coverage changes


@dashpole dashpole force-pushed the prototype_metricreader_args branch from 4cae38d to 1740d9c July 20, 2023 15:53
@dashpole
Collaborator Author

@MrAlias The register pattern came from your comment here: open-telemetry/opentelemetry-specification#2722 (comment)

@dashpole dashpole closed this Jul 20, 2023
@dashpole dashpole reopened this Aug 10, 2023
@dashpole dashpole force-pushed the prototype_metricreader_args branch 3 times, most recently from 3f19cae to 60473b7 August 10, 2023 19:38
@dashpole dashpole changed the title Prototype for metric.Producer as an argument to Reader Change metric.Producer to be an Option on Reader Aug 10, 2023
@dashpole dashpole marked this pull request as ready for review August 10, 2023 19:39
@pellared
Member

pellared commented Aug 11, 2023

I think there is a potential race condition (which existed before) when Shutdown is invoked during Collect.

The reader is initialized with some externalProducers.

  1. Goroutine 1, calling Collect, reaches https://github.com/dashpole/opentelemetry-go/blob/60473b75286b9b2e87d6021db1d9056565f577cf/sdk/metric/manual_reader.go#L136 (mr.externalProducers contains some elements).
  2. Goroutine 2 calls Shutdown and finishes -> mr.externalProducers is set to nil (inside a lock).
  3. Goroutine 1 continues and ranges over mr.externalProducers without any synchronization.

The same problem could occur if Collect is invoked manually on a PeriodicReader.

How to fix it? I am not sure 😉

My initial thought is to change mu sync.Mutex to mu sync.RWMutex and use mu.RLock in Collect. Shutdown would then clear the state only when no Collect is running. We could also use the ctx passed to Shutdown (which is currently unused) so that the caller controls how long it waits for Shutdown to complete. This would be a "graceful shutdown".

My second idea is to remove mu sync.Mutex and isShutdown bool, and replace externalProducers []Producer with externalProducers atomic.Pointer[[]Producer]. Then we would have a lock-free implementation. This would be a kind of "force shutdown". It would be more performant (consume fewer resources). The only drawback I can think of is that Shutdown is not as tightly synchronized: when Shutdown returns, there may still be some processing in progress. When metricProvider.Shutdown() returns, there may still be some manual Collect calls running. However, the code still running would be the code called by the user; the SDK's periodic reader collects would already be done thanks to https://github.com/dashpole/opentelemetry-go/blob/60473b75286b9b2e87d6021db1d9056565f577cf/sdk/metric/periodic_reader.go#L330. If the caller calls Collect in a goroutine, they should make sure it finishes. Personally, I would lean toward this solution. While in "business" software I would say that it is too complex, for us I think minimizing resource consumption is very important.

This comment is not blocking this PR as this PR does not introduce this problem. I just noticed it when reviewing.

@dashpole dashpole force-pushed the prototype_metricreader_args branch from 60473b7 to 56a5ef9 August 11, 2023 13:44
@dashpole
Collaborator Author

I'm surprised this isn't caught by our concurrency tests...

@dashpole
Collaborator Author

dashpole commented Aug 11, 2023

I "fixed" our concurrency tests to expose the race condition.

@dashpole dashpole force-pushed the prototype_metricreader_args branch from 7b2dbb7 to 6b9f2ed August 11, 2023 15:35
Comment thread sdk/metric/reader_test.go
@dashpole
Collaborator Author

I believe switching back to atomic.Value to hold producers fixed the race condition.

Contributor

@MrAlias MrAlias left a comment


🥇

@MrAlias MrAlias merged commit fe51391 into open-telemetry:main Aug 11, 2023
@pellared
Member

I believe switching back to atomic.Value to hold producers fixed the race condition.

👍 PS: I had not noticed that it was atomic.Value before 😬 Anyway, I am happy that I was able to find the issue 😄

@XSAM XSAM added this to the Old Untracked PRs milestone Nov 7, 2024
@MrAlias MrAlias added the area:metrics Part of OpenTelemetry Metrics label Jun 24, 2025

Labels

area:metrics Part of OpenTelemetry Metrics

Projects

No open projects


4 participants