Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#2279] improvement(spark): Trigger the upstream rewrite when the read stage fails #2280

Closed
wants to merge 1 commit into from

Conversation

yl09099
Copy link
Contributor

@yl09099 yl09099 commented Dec 8, 2024

What changes were proposed in this pull request?

If the current Reader fails to obtain Shuffle data, it does not trigger the upstream Stage to rewrite the data. If a Shuffle Server fails, it does not trigger Stage retry.

Why are the changes needed?

Fix: #2279

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT.

Copy link

github-actions bot commented Dec 8, 2024

Test Results

 2 509 files   -   457   2 509 suites   - 457   3h 49m 2s ⏱️ - 2h 39m 46s
   914 tests  -   182     912 ✅  -   182   2 💤 ±0  0 ❌ ±0 
11 980 runs   - 1 755  11 950 ✅  - 1 755  30 💤 ±0  0 ❌ ±0 

Results for commit 5fbee9a. ± Comparison against base commit 4ce1aa8.

This pull request removes 182 tests.
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateFallback
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInDriver
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInDriverDenied
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInExecutor
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testTryAccessCluster
org.apache.spark.shuffle.FunctionUtilsTests ‑ testOnceFunction0
org.apache.spark.shuffle.RssShuffleManagerTest ‑ testCreateShuffleManagerServer
org.apache.spark.shuffle.RssShuffleManagerTest ‑ testGetDataDistributionType
org.apache.spark.shuffle.RssShuffleManagerTest ‑ testRssShuffleManagerInterface
org.apache.spark.shuffle.RssShuffleManagerTest ‑ testRssShuffleManagerRegisterShuffle{int}[1]
…

@jerqi jerqi requested a review from advancedxy December 9, 2024 02:46
@jerqi
Copy link
Contributor

jerqi commented Dec 9, 2024

Could you add a ut for this?

}
});
}

public void setClearedMapTrackerBlock(boolean isCleared) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give a better name for this?

@zuston zuston changed the title [Improvement] The upstream rewrite is triggered when the read stage fails. [#2279] improvement(spark): Trigger the upstream rewrite when the read stage fails Dec 9, 2024
code = RssProtos.StatusCode.SUCCESS;
status.incPartitionFetchFailure(stageAttempt, partitionId);
int fetchFailureNum = status.getPartitionFetchFailureNum(stageAttempt, partitionId);
if (fetchFailureNum >= shuffleManager.getMaxFetchFailures()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removing this logic?

@yl09099 yl09099 closed this Dec 9, 2024
@yl09099 yl09099 deleted the uniffle-2279 branch December 9, 2024 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] The upstream rewrite is triggered when the read stage fails.
3 participants