[#2601] feat(spark): Overlapping decompression for shuffle read #2602

zuston · 2025-09-04T09:47:20Z

What changes were proposed in this pull request?

This PR introduces overlapping decompression for shuffle reads to improve shuffle read speed.
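For readers unfamiliar with the idea, here is a minimal sketch of what overlapping decompression can look like. It assumes "overlapping" means decompressing upcoming blocks on background threads while the reader consumes the current one. It is not the code in this PR: the class `OverlappingDecompressionSketch`, the method `readAll`, and the `prefetchDepth` parameter are hypothetical names, and `java.util.zip` stands in for whatever codec the shuffle client actually uses. Blocks are consumed through a FIFO queue of futures, so the output order matches the input order even when a later block finishes decompressing first.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; not the Uniffle implementation.
class OverlappingDecompressionSketch {

  // Helper used to build test input: compress one block with DEFLATE.
  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  // Decompress one block; this is the CPU-heavy step we want off the reader thread.
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported block");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  // Keeps up to prefetchDepth blocks decompressing in the background while the
  // caller's thread consumes the oldest one. The FIFO queue preserves block order.
  static List<String> readAll(List<byte[]> compressedBlocks, int prefetchDepth) {
    if (prefetchDepth < 1) {
      throw new IllegalArgumentException("prefetchDepth must be >= 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Queue<CompletableFuture<byte[]>> inFlight = new ArrayDeque<>();
    List<String> records = new ArrayList<>();
    try {
      int next = 0;
      while (next < compressedBlocks.size() || !inFlight.isEmpty()) {
        // Top up the pipeline before blocking on the head of the queue.
        while (next < compressedBlocks.size() && inFlight.size() < prefetchDepth) {
          byte[] block = compressedBlocks.get(next++);
          inFlight.add(CompletableFuture.supplyAsync(() -> decompress(block), pool));
        }
        // Wait only for the oldest block; younger ones keep decompressing meanwhile.
        byte[] raw = inFlight.remove().join();
        records.add(new String(raw, StandardCharsets.UTF_8));
      }
    } finally {
      pool.shutdownNow();
    }
    return records;
  }

  public static void main(String[] args) {
    List<byte[]> blocks = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      blocks.add(compress(("record-" + i).getBytes(StandardCharsets.UTF_8)));
    }
    System.out.println(readAll(blocks, 2));
  }
}
```

In this sketch, `prefetchDepth` bounds how many decompressed blocks can sit in memory at once, which is the usual trade-off between read-ahead and memory use.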

Why are the changes needed?

When applying #2598 to the 100 GB Terasort benchmark, I found that decompression time was a bottleneck in the read phase.

On the same 100 GB Terasort benchmark, applying this PR reduced shuffle read time by 50%.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Unit tests.

github-actions · 2025-09-04T10:35:47Z

Test Results

3 105 files +15 · 3 105 suites +15 · 6h 49m 38s ⏱️ +9s
1 200 tests +2 · 1 199 ✅ +3 · 1 💤 ±0 · 0 ❌ ±0
15 196 runs +30 · 15 181 ✅ +32 · 15 💤 ±0 · 0 ❌ ±0

Results for commit deb3a6c. ± Comparison against base commit 2a32171.

♻️ This comment has been updated with latest results.

jerqi · 2025-09-04T10:45:31Z

Spark is ok. But mr may need order.

zuston · 2025-09-04T11:39:47Z

Spark is ok. But mr may need order.

Thanks for sharing this, but the implementation already preserves the order. BTW, this PR currently only applies to Spark.

zuston added 4 commits September 4, 2025 17:30

[apache#2601] feat(spark): Overlapping decompression for shuffle read

ca6f64e

buffer reuse

271ffd1

fix

9d8bf8a

fix

11c0f9b

zuston linked an issue Sep 4, 2025 that may be closed by this pull request

[FEATURE] Overlapping decompression for shuffle read #2601

Closed

3 tasks

zuston requested review from xianjingfeng and jerqi September 4, 2025 10:07

zuston added 2 commits September 5, 2025 10:34

convert

1695375

fix checkstyle

deb3a6c

jerqi approved these changes Sep 5, 2025

zuston merged commit 1e48bc6 into apache:master Sep 5, 2025
80 of 81 checks passed

zuston deleted the readoverlapping branch September 5, 2025 09:02
