[#1628] Avoid exception caused by calling release multiple times #1822

Open
wants to merge 1 commit into
base: master

wForget
Member

@wForget wForget commented Jun 24, 2024

What changes were proposed in this pull request?

Set the internal buffer to null after it is released, so that calling release again does not throw.
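A minimal sketch of the idea, assuming a reference-counted buffer that throws when it is released more often than it was retained. `CountedBuffer` and `IndexResultSketch` are hypothetical stand-ins written for illustration; they are not the Netty or Uniffle classes, and the actual patch may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted buffer (not the Netty or Uniffle class).
final class CountedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() {
        return refCnt.get();
    }

    // Throws on over-release, in the spirit of the exception reported in #1628.
    void release() {
        int previous = refCnt.getAndUpdate(c -> c > 0 ? c - 1 : c);
        if (previous <= 0) {
            throw new IllegalStateException("refCnt: 0, decrement: 1");
        }
    }
}

// Hypothetical holder: drops its reference to the buffer once it has been released.
final class IndexResultSketch {
    private CountedBuffer buffer;

    IndexResultSketch(CountedBuffer buffer) {
        this.buffer = buffer;
    }

    synchronized boolean isEmpty() {
        return buffer == null;
    }

    synchronized void release() {
        if (buffer != null) {
            buffer.release();
            buffer = null; // a later release() call finds nothing left to release
        }
    }
}

final class ReleaseOnceDemo {
    public static void main(String[] args) {
        CountedBuffer raw = new CountedBuffer();
        IndexResultSketch result = new IndexResultSketch(raw);
        result.release();
        result.release(); // no exception: the reference was cleared by the first call
        System.out.println("refCnt after two release() calls: " + raw.refCnt());
    }
}
```

Note that this only makes the holder's own release idempotent. If some other code path releases the underlying buffer directly, the over-release would still occur.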

Why are the changes needed?

Addresses #1628 (comment).

Fix: #1628

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.

@rickyma
Contributor

rickyma commented Jun 24, 2024

Why would release be called multiple times?


Test Results

 2 641 files ±0  |   2 641 suites ±0  |  5h 27m 46s ⏱️ -24s
   946 tests +2  |     945 ✅ +3      |   1 💤 ±0  |  0 ❌ -1
11 803 runs +30  |  11 788 ✅ +31     |  15 💤 ±0  |  0 ❌ -1

Results for commit d1be2e4. ± Comparison against base commit 1482804.

@wForget
Member Author

wForget commented Jun 24, 2024

> Why would release be called multiple times?

I couldn't find out why it was called multiple times in #1628. Apparently, the check `if (shuffleIndexResult == null || shuffleIndexResult.isEmpty()) {` already verifies that shuffleIndexResult is not empty. One of my guesses is that the buffer is cleaned up when the Netty client is closed.

@rickyma rickyma requested review from advancedxy and zuston June 24, 2024 08:53
@advancedxy
Contributor

Hmm, I'm not sure about this change.

If we can find the root cause of why it's called multiple times and can show that releasing multiple times is necessary, this change looks good to me.

However, we have not found the root cause. By simply resetting the managedBuffer to null to avoid the exception, we may hide a bigger problem and cause more subtle problems later.

Could we add more logging first to help debug this issue? For example, could we log the double release, with the stack trace, in the release method?
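One way the suggested logging could look, as a rough sketch only. `TracedRelease` and its fields are hypothetical names, and `java.util.logging` is used just to keep the example dependency-free; the project's own logger would be used in practice. The sketch also remembers where the first release happened, which is an addition beyond what was asked but tends to help when tracking down the second caller.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class for illustration; not Uniffle's actual code.
final class TracedRelease {
    private static final Logger LOG = Logger.getLogger(TracedRelease.class.getName());

    private boolean released;
    private Throwable firstRelease; // captures the call site of the first release
    private int doubleReleaseCount;

    synchronized void release() {
        if (released) {
            doubleReleaseCount++;
            // Passing a Throwable makes the logger print the current stack trace.
            LOG.log(Level.WARNING, "Double release detected at:", new Throwable("second release"));
            LOG.log(Level.WARNING, "First release happened at:", firstRelease);
            return;
        }
        released = true;
        firstRelease = new Throwable("first release");
        // The underlying buffer would be released here.
    }

    synchronized int doubleReleaseCount() {
        return doubleReleaseCount;
    }
}

final class TracedReleaseDemo {
    public static void main(String[] args) {
        TracedRelease traced = new TracedRelease();
        traced.release();
        traced.release(); // logs both stack traces instead of throwing
        System.out.println("double releases seen: " + traced.doubleReleaseCount());
    }
}
```

Capturing a `Throwable` on every first release has a cost, so in a real code path this would probably be limited to debug builds or guarded by a log-level check.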

@rickyma
Contributor

rickyma commented Jun 25, 2024

It would be better to find the root cause. Maybe it has something to do with the Hadoop-related code in Uniffle, because this has only happened in RepartitionWithHadoopHybridStorageRssTest.

@wForget
Member Author

wForget commented Jun 26, 2024

Thanks, that makes sense to me. I will first try uploading the GA failure logs for Uniffle.

@rickyma
Contributor

rickyma commented Jun 30, 2024

Do we have any progress here? :)

Successfully merging this pull request may close these issues.

[Bug] Occasionally encountering IllegalReferenceCountException when releasing ShuffleIndexResult
3 participants