Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ring buffer deadlock #511

Merged
merged 3 commits into from
Oct 18, 2024

Conversation

IronCore864
Copy link
Contributor

@IronCore864 IronCore864 commented Oct 15, 2024

Fix an issue where the ring buffer might deadlock because rwlock and iterator mutex are acquired in a different order.

Fixes: #508

Notes:

  1. As discussed with Harry and Ben, rb.Write isn't adjusted because the rwlock and iterator mutex are acquired at different times.
  2. Besides an internal close function, an internal positions function is created for the same reason.
  3. The test case here is not added because it was supposed to be an easy way to reproduce the deadlock. Adding an additional benchmark test is not useful, and the deadlock issue can be caught by the TestDeadlock test case anyway.
  4. All three functions signalIterators, releaseIterators and removeIterator are adjusted (although technical only releaseIterators needed to be ) because they share a common naming pattern so it's nicer to share a common implementation pattern as well: all internal functions don't operate locks.

Benchmark test, manual test to reproduce the 3-way deadlock issue, and race test are all done and passed.

@IronCore864 IronCore864 marked this pull request as ready for review October 15, 2024 04:23
@benhoyt benhoyt requested a review from hpidcock October 15, 2024 22:25
Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks. Just a couple of nit comments to see if we can tidy up a couple of things.

internals/servicelog/ringbuffer.go Outdated Show resolved Hide resolved
internals/servicelog/ringbuffer.go Show resolved Hide resolved
@IronCore864
Copy link
Contributor Author

I have refactored according to the code review from Ben and Dima, and then the final tests with -race and manual deadlock testing passed successfully. @hpidcock please take a look.

Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and passes my deadlock tests locally.

@benhoyt
Copy link
Contributor

benhoyt commented Oct 18, 2024

I'm going to merge this optimistically so we can move ahead with c3cb594, but @hpidcock -- we'd still appreciate your review-after-the-fact when you get a chance.

@benhoyt benhoyt merged commit ae91836 into canonical:master Oct 18, 2024
18 checks passed
@hpidcock
Copy link
Member

I'm going to merge this optimistically so we can move ahead with c3cb594, but @hpidcock -- we'd still appreciate your review-after-the-fact when you get a chance.

LGTM. Looks like all the previous locking behaviour is preserved with the consistent ordering to avoid deadlocks.

@IronCore864 IronCore864 deleted the fix-ringbuffer-deadlock branch November 14, 2024 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestDeadlock fails intermittently in local environment with go test -races
4 participants