Race-condition on shutdown of current-thread runtime? #7056

thomaseizinger · 2024-12-30T10:22:33Z

Is your feature request related to a problem? Please describe.

In my application, I am spawning threads that use tokio's current-thread runtime. Only a single task is spawned onto this runtime via block_on. The remaining application is connected with this task via channels. When the application is shut down, there appears to be a race condition between cleaning up the runtime and polling the task that is spawned onto the runtime. In particular, my task ends up spamming "A Tokio 1.x context was found, but it is being shutdown." from polling the AsyncFd within that task.

This is the task that is running inside the runtime: https://github.com/firezone/firezone/blob/90bac881945ebe4f91f812672da2633cfe3c1079/rust/tun/src/unix.rs#L39-L107

What I don't understand is: How can the runtime shut down while the task is still being polled? I would expect that either the runtime shuts down and de-allocates the task or the runtime is still active and polls the task. Here is an example of where the thread is being spawned:

https://github.com/firezone/firezone/blob/90bac881945ebe4f91f812672da2633cfe3c1079/rust/connlib/clients/apple/src/tun.rs#L33-L53

Describe the solution you'd like

Runtime shutdown to be atomic, i.e. a task running within block_on to never be polled if the runtime is being shut down
A way of detecting runtime shutdown from within the task

Describe alternatives you've considered

The way I am solving this is by checking for the message of the io::Error (see firezone/firezone#7605). That isn't very clean and could break if that message every changes. It is additional API surface but perhaps a function could be exposed that matches an io::Error against it being the "shut down"-error? Or a function on Handle to detect whether the runtime is shutting down?

Additional context
~~Add any other context or screenshots about the feature request here.~~

The text was updated successfully, but these errors were encountered:

Darksonn · 2024-12-30T13:03:14Z

The way you shut down the runtime is by running the destructor of the Runtime. That cannot possibly happen with the send_recv_tun function since you're inside a call to block_on on the Runtime object. Are you sure that send_recv_tun is the function where this panic happens?

Are you perhaps using tokio::runtime::Handle::block_on anywhere?

thomaseizinger · 2024-12-30T13:15:46Z

Are you sure that send_recv_tun is the function where this panic happens?

There is no panic! Instead, the AsyncFd returns an io::Error with that error text.

thomaseizinger · 2024-12-30T13:22:03Z

Are you perhaps using tokio::runtime::Handle::block_on anywhere?

Not on that runtime. The linked async fn is the only thing that runs on that runtime.

Darksonn · 2024-12-30T13:24:40Z

Sorry, I meant the error not panic. But I still don't see how you are shutting down the runtime without first exiting from the block_on call. Where do you trigger shutdown?

Darksonn · 2024-12-30T13:29:58Z

Oh. I understand now.

The problem is that the AsyncFd is using the IO driver of the runtime it was created in. It stops working when that other runtime shuts down. But the future is executing under a different runtime that did not get shut down.

You should wait with creating the AsyncFd until you're inside the new runtime.

thomaseizinger · 2024-12-30T14:03:08Z

Oh. I understand now.

The problem is that the AsyncFd is using the IO driver of the runtime it was created in. It stops working when that other runtime shuts down. But the future is executing under a different runtime that did not get shut down.

You should wait with creating the AsyncFd until you're inside the new runtime.

Oh! I didn't realise that is what is happening. Thanks for pointing that out :)

thomaseizinger · 2024-12-30T14:05:00Z

This is a bit bizarre. Perhaps there should be some kind of warning about that? If each runtime where to carry a unique identifier, this could potentially be detected?

Darksonn · 2024-12-30T14:22:03Z

We could probably add some logic to detect this when creating the error message, so that we could emit a better error for this case.

Reading and writing to the TUN device within `connlib` happens in a separate thread. The task running within these threads is connected to the rest of `connlib` via channels. When the application shuts down, these threads also need to exit. Currently, we attempt to detect this from within the task when these channels close. It appears that there is a race condition here because we first attempt to read from the TUN device before reading from the channels. We treat read & write errors on the TUN device as non-fatal so we loop around and attempt to read from it again, causing an infinite-loop and log spam. To fix this, we swap the order in which we evaluate the two concurrent tasks: The first task to be polled is now the channel for outbound packets and only if that one is empty, we attempt to read new packets from the TUN device. This is also better from a backpressure point of view: We should attempt to flush out our local buffers of already processed packets before taking on "new work". As a defense-in-depth strategy, we also attempt to detect the particular error from the tokio runtime when it is being shut down and exit the task. Resolves: #7601. Related: tokio-rs/tokio#7056.

thomaseizinger added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Dec 30, 2024

thomaseizinger changed the title ~~Race-condition on shutdown of current-thread runtime~~ Race-condition on shutdown of current-thread runtime? Dec 30, 2024

thomaseizinger mentioned this issue Dec 30, 2024

fix(connlib): attempt to detect runtime shutdown within TUN task firezone/firezone#7605

Merged

Darksonn added the M-runtime Module: tokio/runtime label Dec 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race-condition on shutdown of current-thread runtime? #7056

Race-condition on shutdown of current-thread runtime? #7056

thomaseizinger commented Dec 30, 2024

Darksonn commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024 •

edited

Loading

Darksonn commented Dec 30, 2024

Darksonn commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024 •

edited

Loading

thomaseizinger commented Dec 30, 2024

Darksonn commented Dec 30, 2024

Race-condition on shutdown of current-thread runtime? #7056

Race-condition on shutdown of current-thread runtime? #7056

Comments

thomaseizinger commented Dec 30, 2024

Darksonn commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024 • edited Loading

Darksonn commented Dec 30, 2024

Darksonn commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024 • edited Loading

thomaseizinger commented Dec 30, 2024

Darksonn commented Dec 30, 2024

thomaseizinger commented Dec 30, 2024 •

edited

Loading

thomaseizinger commented Dec 30, 2024 •

edited

Loading