Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): upgrade tokio to 1.38.0 and simplify memory limiter notify #17528

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

wenym1
Copy link
Contributor

@wenym1 wenym1 commented Jul 2, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Previously due to the problem of tokio oneshot channel described in #6617, in the memory limiter, when we notify waiters, we will have to store the senders to notify and then notify senders after release the lock to avoid deadlock.

With tokio-rs/tokio#6558 gets merged and published in tokio 1.38.0, we don't need this workaround anymore. We can change to notify the sender within the lock, and on send error, we can release the quota without acquiring the lock to notify again. The following code snippet shows the changed logic.

fn may_notify_waiters(self: &Arc<Self>) {
    ...
    {
        // Previously have a vector to hold the tx before release the lock
        // let mut notify_waiters = vec![];

        let mut waiters = self.controller.lock();
        while let Some((_, quota)) = waiters.front() {
            ...
            let (tx, quota) = waiters.pop_front().unwrap();
            // Changed to release quota when returning with err
            if let Err(tracker) = tx.send(MemoryTracker::new(self.clone(), quota)) {
                tracker.release_quota_without_notify();
            }
        }
        drop(waiters);

        ...

        // Previously notify waiters after release the lock
        // for (tx, quota) in notify_waiters {
        //     let _ = tx.send(MemoryTracker::new(self.clone(), quota));
        // }
    }
}

The memory limiter is moved to a separate crate, so that we can write loom test on the memory limiter. A loom test test_memory_limiter_drop is added in this PR, which tests the scenario of future cancellation that can cause deadlock in previous tokio version. It's expected to fail in the previous tokio version 1.37.0, though it still gets passed. We may further investigate on why loom is not able to cover all the corner cases.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@wenym1 wenym1 requested a review from a team as a code owner July 2, 2024 07:05
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 5177 files.

Valid Invalid Ignored Fixed
2213 1 2963 0
Click to see the invalid file list
  • src/storage/hummock_memory_limiter/src/test.rs

}

fn may_notify_waiters(self: &Arc<Self>) {
// check `has_waiter` to avoid access lock every times drop `MemoryTracker`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only this method is changed compared to the original code in src/storage/hummock/utils.rs.

Changed logic is described as in the PR description.

wenym1 and others added 3 commits July 2, 2024 15:08
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should upgrade tokio in another PR first (and ideally run benchmarks/longevity tests first), because it might be one of the most critical dependency.

If you've read the CHANGELOG and are very confident it's a harmless trivial upgrade, that's also fine.

@BugenZhao
Copy link
Member

the problem of tokio oneshot channel described in #6617,

This doesn't seem to be the correct issue number?

@wenym1
Copy link
Contributor Author

wenym1 commented Jul 2, 2024

the problem of tokio oneshot channel described in #6617,

This doesn't seem to be the correct issue number?

In the bullet 9 of the Investigation section in the issue description, you mentioned a corner case that msg might be dropped in the oneshot sender side even if the send returns ok.

pub fn send(mut self, t: T) -> Result<(), T> {
    let inner = self.inner.take().unwrap();

    inner.value.with_mut(|ptr| unsafe {
        *ptr = Some(t);
    });

    if !inner.complete() {
        unsafe {
            return Err(inner.consume_value().unwrap());
        }
    }

-> receiver dropped here

    Ok(())
}

@BugenZhao
Copy link
Member

In the bullet 9 of the Investigation section in the issue description, you mentioned a corner case that msg might be dropped in the oneshot sender side even if the send returns ok.

Wow, have almost no impression of this. Still find it interesting when rereading it now. 😄

@wenym1
Copy link
Contributor Author

wenym1 commented Jul 2, 2024

Perhaps we should upgrade tokio in another PR first (and ideally run benchmarks/longevity tests first), because it might be one of the most critical dependency.

If you've read the CHANGELOG and are very confident it's a harmless trivial upgrade, that's also fine.

Make sense to me, I can have a separate PR for it. btw, I didn't see a place to specify the tokio version in any of the Cargo.toml, and the tokio version seems to be implicitly deduced from our set of dependencies.

Should we instead explicitly specify the tokio version somewhere in the Cargo.toml? Though it may have naming conflict with the current madsim-tokio if we directly name it tokio.

@xxchan
Copy link
Member

xxchan commented Jul 2, 2024

I didn't see a place to specify the tokio version in any of the Cargo.toml, and the tokio version seems to be implicitly deduced from our set of dependencies.

Yes, I also thought about this problem before. I think we do need to specify the version explicitly. (Perhaps we should use =version to avoid upgrade by mistake.)

It cannot be put in workspace Cargo.toml, because no crate directly use it and it won't have any effect. Perhaps we can specify it in src/cmd_all/Cargo.toml, or src/common/Cargo.toml (Actually any package is OK I guess...)

sth like

# why we need this: XXX
dummy-tokio-used-to-restrict-version = { version= "=1.38.0", package = "tokio" ...}

@wenym1
Copy link
Contributor Author

wenym1 commented Jul 2, 2024

I didn't see a place to specify the tokio version in any of the Cargo.toml, and the tokio version seems to be implicitly deduced from our set of dependencies.

Yes, I also thought about this problem before. I think we do need to specify the version explicitly. (Perhaps we should use =version to avoid upgrade by mistake.)

It cannot be put in workspace Cargo.toml, because no crate directly use it and it won't have any effect. Perhaps we can specify it in src/cmd_all/Cargo.toml, or src/common/Cargo.toml (Actually any package is OK I guess...)

sth like

# why we need this: XXX
dummy-tokio-used-to-restrict-version = { version= "=1.38.0", package = "tokio" ...}

#17536

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants