-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#2279] improvement(spark): Trigger the upstream rewrite when the read stage fails #2281
Conversation
Wait a minute. I'm not ready for this. |
If this is not ready to review, you can mark this as draft. |
ok |
@@ -553,16 +576,36 @@ public void incPartitionFetchFailure(int stageAttempt, int partition) { | |||
}); | |||
} | |||
|
|||
public int getPartitionFetchFailureNum(int stageAttempt, int partition) { | |||
public boolean getPartitionFetchFailureNum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has a weird return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has a weird return value.
The number of failures is compared to the maximum number of failures in the configuration, and I moved the comparison logic to the internal implementation, so I only need to return whether fetchfailed is thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not proper name if we don't change the method name.
} | ||
}); | ||
} | ||
|
||
public void setClearedMapTrackerBlock(boolean isCleared) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you have a better method name?
@@ -489,10 +509,13 @@ private static class RssShuffleStatus { | |||
private final ReentrantReadWriteLock.WriteLock writeLock = lock.writeLock(); | |||
private final int[] partitions; | |||
private int stageAttempt; | |||
// Whether the Shuffle result has been cleared for the current number of attempts. | |||
private boolean isClearedMapTrackerBlock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isClearedMapTrackerBlock
-> hasClearedMapTrackerBlock
} | ||
}); | ||
} | ||
|
||
public void clearedMapTrackerBlock(boolean isCleared) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't the parameter isCleared
. It's rebundant.
}); | ||
} | ||
|
||
public boolean isClearedMapTrackerBlock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isClearedMapTrackerBlock
-> hasClearedMapTrackerBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above has been changed.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2281 +/- ##
============================================
+ Coverage 51.78% 52.30% +0.52%
- Complexity 2966 3386 +420
============================================
Files 479 518 +39
Lines 22566 27900 +5334
Branches 2068 2628 +560
============================================
+ Hits 11686 14594 +2908
- Misses 10140 12345 +2205
- Partials 740 961 +221 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
cc @maobaolong |
What changes were proposed in this pull request?
If the current Reader fails to obtain Shuffle data, it does not trigger the upstream Stage to rewrite the data. If a Shuffle Server fails, it does not trigger Stage retry.
Why are the changes needed?
Fix: #2279
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT.