Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: PeriodicReader::shutdown() deadlocks on current thread runtime. #2056

Open
inahga opened this issue Aug 26, 2024 · 7 comments
Open

[Bug]: PeriodicReader::shutdown() deadlocks on current thread runtime. #2056

inahga opened this issue Aug 26, 2024 · 7 comments
Assignees
Labels
A-metrics Area: issues related to metrics bug Something isn't working

Comments

@inahga
Copy link
Contributor

inahga commented Aug 26, 2024

What happened?

When using the SdkMeterProvider with a PeriodicReader on a tokio current_thread runtime, the program hangs when the SdkMeterProvider leaves scope.

Example reproducer: Take the registration_triggers_collection() test function in opentelemetry-sdk/src/metrics/periodic_reader.rs, and run it on a current_thread runtime:

    #[tokio::test]
    async fn registration_triggers_collection() {
        // Arrange
        let interval = std::time::Duration::from_millis(1);
        let exporter = InMemoryMetricsExporter::default();
        let reader = PeriodicReader::builder(exporter.clone(), runtime::Tokio)
            .with_interval(interval)
            .build();
        let (sender, receiver) = mpsc::channel();

        // Act
        let meter_provider = SdkMeterProvider::builder().with_reader(reader).build();
        let meter = meter_provider.meter("test");
        let _counter = meter
            .u64_observable_counter("testcounter")
            .with_callback(move |_| {
                sender.send(()).expect("channel should still be open");
            })
            .init();

        _ = meter_provider.force_flush();

        // Assert
        receiver
            .try_recv()
            .expect("message should be available in channel, indicating a collection occurred");
    }

The problematic code looks to be in PeriodicReader::shutdown(), where we send a shutdown signal and wait for receipt on a oneshot channel in a block_on statement. When executed in the context of a Drop, there is no runtime to advance the other side of the channel, so we hang indefinitely.

It's not a big deal, it can be worked around by using a multi-thread runtime, or by calling shutdown() ahead of time. It's just unexpected and easy to trip up on, since tokio tests default to a current thread runtime.

More than happy to fix it myself, but wanted to get maintainer input first. (It may be the right answer is to not use a current thread runtime?)

API Version

0.24.0

SDK Version

0.24.1

What Exporter(s) are you seeing the problem on?

N/A

Relevant log output

No response

@inahga inahga added bug Something isn't working triage:todo Needs to be traiged. labels Aug 26, 2024
@cijothomas
Copy link
Member

This could be some design bug in the metric aggregation. Related to BatchProcessing as well, as that is another place where OTel does background processing.

I don't have a good answer at the moment, but you can expect a solution soon. We just finished redesigning the metrics hot path (the aggregation path), and the next major item left is to revisit the periodicreader part.

We are exploring a redesign, where OTel will spin up its own background thread for doing metric collect, batch exporting(logs,spans), and not rely on async runtimes at all. Will be sharing some design ideas soon, would love to get your inputs there.

@cijothomas cijothomas self-assigned this Aug 29, 2024
@cijothomas cijothomas added A-metrics Area: issues related to metrics and removed triage:todo Needs to be traiged. labels Aug 29, 2024
@cijothomas
Copy link
Member

Could be the same: #868 (comment)

@philipglazman
Copy link

Running PeriodicReader with the actix-web runtime started with #[actix_web::main] also causes a deadlock. The tokio runtime started by actix-rt uses current thread.

@fraillt
Copy link
Contributor

fraillt commented Sep 20, 2024

The way I understand currently there are two options:

  1. use tokio multi-threaded executor (which is default for #[tokio::main] + Tokio runtime
  2. use any runtime, (or no runtime at all) + TokioCurrentThread runtime. (PeriodicReader::build make interval registration in same thread/task #2133 fix bug you for this to work)

@cijothomas
Copy link
Member

Added this to a test (obviously ignored to not make CI red)
#2147

@cijothomas
Copy link
Member

@inahga Could you review the approach in #2142 where we spin up a dedicated thread for metrics export in the background? I am trying to see if that'd be a good default than relying on async runtimes. (ability to bring own async runtimes like today is still planned to be a supported feature in the future)

@inahga
Copy link
Contributor Author

inahga commented Nov 14, 2024

Sorry for replying so late. The approach LGTM, it's probably how I would have done it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-metrics Area: issues related to metrics bug Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants
@cijothomas @fraillt @philipglazman @inahga and others