[#1863] fix(server): Fail commitShuffle when write Operation Fails #1866
base: master
…Manager Write Fails
When the write operation fails and throws an exception, we need to make the
When an exception happens during the flushing process, just let
Hi, I made the changes in the PR. So now, when the
Test Results: 2 647 files (-10), 2 647 suites (-10), 5h 28m 26s ⏱️ (-3m 31s). For more details on these errors, see this check. Results for commit 402bf31. ± Comparison against base commit 7731998.
I think you haven't fully understood this issue. A flush event corresponds to a specific For this issue, you'd better conduct some tests.
Apologies for my limited context and understanding of the issue; being new to the repo, it's taking some time to get the hang of things. As you mentioned, a flushEvent and commitShuffle are shuffle-specific, so instead of a single global variable, I created a global list. I also added a UT for it. Let me know if it makes sense to you.
startTime = System.currentTimeMillis();
boolean writeSuccess = storageManager.write(storage, handler, event);
if (!writeSuccess) {
  shuffleIdsWithWriteError.add(event.getShuffleId());
I don't think this is thread-safe.
It could be overwritten by other threads which may be successful.
You need to reconsider concurrency problems through this PR.
Apologies for the delayed response. Here I used CopyOnWriteArrayList to implement shuffleIdsWithWriteError. Do you still think thread safety is a concern here, or do you think the other parts of the code are not thread-safe?
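To make the concern in this thread concrete, here is a minimal sketch of one possible reading of it; it is not code from this PR. CopyOnWriteArrayList.add is itself thread-safe, so the risk is more likely in state that gets reassigned on every write: a single shared flag can be reset by a later successful write from another thread, while an add-only collection keyed by shuffle ID cannot. The class name WriteErrorTracking, the flag lastWriteFailed, and the method onWriteResult are hypothetical names chosen for illustration. The two calls in main run sequentially to keep the outcome deterministic; under real concurrency the same interleaving can occur across threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class WriteErrorTracking {
  // Fragile (hypothetical): one flag that is reassigned from every write result.
  static volatile boolean lastWriteFailed = false;

  // Sturdier (hypothetical): failures are only ever added, keyed by shuffle ID.
  static final Set<Integer> failedShuffleIds = ConcurrentHashMap.newKeySet();

  static void onWriteResult(int shuffleId, boolean writeSuccess) {
    // A later success overwrites an earlier failure here.
    lastWriteFailed = !writeSuccess;
    if (!writeSuccess) {
      // Add-only: a success for another shuffle never erases this entry.
      failedShuffleIds.add(shuffleId);
    }
  }

  public static void main(String[] args) {
    onWriteResult(7, false); // the write for shuffle 7 fails
    onWriteResult(8, true);  // the write for shuffle 8 succeeds afterwards
    System.out.println(lastWriteFailed);              // false: the failure was lost
    System.out.println(failedShuffleIds.contains(7)); // true
    System.out.println(failedShuffleIds.contains(8)); // false
  }
}
```

Compared with a CopyOnWriteArrayList, a concurrent set also avoids duplicate entries when several flush events of the same shuffle fail, and it does not copy its backing array on every add.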
Please help to review this. Thx. @maobaolong
What changes were proposed in this pull request?
This PR makes the method ShuffleTaskManager.commitShuffle(..) fail when the storageManager.write(..) method throws an exception.
- The write method is wrapped in a try-catch block for exception handling.
- If the write method throws an exception or write is not successful, writeError is set to true in the catch block.
- throw new EventRetryException when writeSuccess is false in the constructor, as the ShuffleFlushManager is unable to initialize then.
- In commitShuffle, when the getCommittedBlockIds method gets called, it checks the writeError variable and throws an exception if it is set.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs.
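The failure propagation described under "What changes were proposed" can be sketched roughly as follows. This is a simplified, self-contained illustration under stated assumptions, not the PR's actual code: the class CommitFailureSketch, the Writer interface, and the use of IllegalStateException are hypothetical stand-ins, and the per-shuffle set follows the later review discussion rather than the single writeError variable named in the description.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CommitFailureSketch {
  /** Hypothetical stand-in for the storage layer: returns false or throws on failure. */
  interface Writer {
    boolean write(int shuffleId) throws Exception;
  }

  private final Set<Integer> shuffleIdsWithWriteError = ConcurrentHashMap.newKeySet();
  private final Writer writer;

  CommitFailureSketch(Writer writer) {
    this.writer = writer;
  }

  /** Flush path: an exception and a false return both mark the shuffle as failed. */
  void flush(int shuffleId) {
    boolean writeSuccess;
    try {
      writeSuccess = writer.write(shuffleId);
    } catch (Exception e) {
      writeSuccess = false;
    }
    if (!writeSuccess) {
      shuffleIdsWithWriteError.add(shuffleId);
    }
  }

  /** Commit path: fail fast if any write for this shuffle has failed. */
  void commitShuffle(int shuffleId) {
    if (shuffleIdsWithWriteError.contains(shuffleId)) {
      throw new IllegalStateException("Write failed for shuffleId " + shuffleId);
    }
  }

  public static void main(String[] args) {
    CommitFailureSketch sketch = new CommitFailureSketch(id -> {
      if (id == 1) {
        throw new IOException("simulated write failure");
      }
      return id != 2; // shuffle 2 reports an unsuccessful write without throwing
    });
    sketch.flush(1);
    sketch.flush(2);
    sketch.flush(3);
    sketch.commitShuffle(3); // no write error recorded, so this returns normally
    try {
      sketch.commitShuffle(1);
    } catch (IllegalStateException e) {
      System.out.println("commit failed: " + e.getMessage());
    }
  }
}
```

Checking the error state at commit time means the caller gets an explicit failure instead of waiting for blocks that were never written.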