Remove the use of Rayon iterators #139011

Open
wants to merge 2 commits into
base: master

Conversation

Contributor

@Zoxc Zoxc commented Mar 27, 2025

This removes the use of Rayon iterators and the use of the rustc-rayon crate. rustc-rayon-core is still used however.

In parallel loops, instead of a Rayon iterator, a serial iterator is used to collect items into a Vec, and we then run a parallel loop over its elements using the new par_slice function, which is built on rustc-rayon-core's join.

This change makes it easier to bring rustc-rayon-core in-tree.
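
As a rough illustration of the "collect serially, then loop in parallel" pattern described above, here is a minimal sketch. It is not the code from this PR: the `join` below is a stand-in built on `std::thread::scope` (an assumption, since the real `join` comes from `rustc-rayon-core` and uses a work-stealing pool), the `group` cutoff and the `doubled` helper are hypothetical, and the guard argument that the real `par_slice` takes is omitted.

```rust
use std::thread;

// Stand-in for `rustc-rayon-core`'s `join` (assumption): runs both closures,
// possibly in parallel, and returns once both have finished. The real `join`
// schedules onto a thread pool instead of spawning a fresh thread.
fn join<A, B>(a: A, b: B)
where
    A: FnOnce() + Send,
    B: FnOnce() + Send,
{
    thread::scope(|s| {
        s.spawn(a);
        b();
    });
}

// Hypothetical simplified `par_slice`: recursively halves the slice and
// processes groups of at most `group` items serially.
fn par_slice<T, F>(items: &mut [T], for_each: &F, group: usize)
where
    T: Send,
    F: Fn(&mut T) + Sync,
{
    if items.len() <= group.max(1) {
        for item in items.iter_mut() {
            for_each(item);
        }
    } else {
        let mid = items.len() / 2;
        let (left, right) = items.split_at_mut(mid);
        join(
            || par_slice(left, for_each, group),
            || par_slice(right, for_each, group),
        );
    }
}

// Collect the serial iterator into a `Vec`, then loop over it in parallel.
fn doubled(input: impl Iterator<Item = u64>) -> Vec<u64> {
    let mut items: Vec<u64> = input.collect();
    let double = |x: &mut u64| *x *= 2;
    par_slice(&mut items, &double, 2);
    items
}
```

Calling `doubled(1..=8)` collects the range into a `Vec` and doubles each element in place. The only parallel primitive the loop needs is `join`, which is the point the description makes about dropping the iterator layer.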

Tests using 7 threads:

| Benchmark | Time (before) | Time (after) | % | Physical Memory (before) | Physical Memory (after) | % | Committed Memory (before) | Committed Memory (after) | % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check | 0.4827s | 0.4828s | 0.02% | 201.23 MiB | 201.31 MiB | 0.04% | 279.03 MiB | 279.46 MiB | 0.15% |
| 🟣 hyper:check | 0.1443s | 0.1401s | 💚 -2.91% | 126.42 MiB | 126.70 MiB | 0.22% | 199.79 MiB | 199.99 MiB | 0.10% |
| 🟣 regex:check | 0.3252s | 0.3065s | 💚 -5.78% | 161.87 MiB | 161.78 MiB | -0.05% | 229.59 MiB | 230.23 MiB | 0.28% |
| 🟣 syn:check | 0.5845s | 0.5876s | 0.53% | 197.01 MiB | 196.89 MiB | -0.06% | 267.62 MiB | 267.47 MiB | -0.06% |
| Total | 1.5367s | 1.5169s | 💚 -1.29% | 686.53 MiB | 686.68 MiB | 0.02% | 976.04 MiB | 977.14 MiB | 0.11% |
| Summary | 1.0000s | 0.9796s | 💚 -2.04% | 1 byte | 1.00 bytes | 0.04% | 1 byte | 1.00 bytes | 0.12% |

| Benchmark | Time (before) | Time (after) | % | Physical Memory (before) | Physical Memory (after) | % | Committed Memory (before) | Committed Memory (after) | % |
|---|---|---|---|---|---|---|---|---|---|
| 🟠 clap:debug | 1.6371s | 1.6529s | 0.96% | 395.58 MiB | 396.21 MiB | 0.16% | 460.98 MiB | 461.52 MiB | 0.12% |
| 🟠 hyper:debug | 0.3248s | 0.3210s | 💚 -1.16% | 155.16 MiB | 155.19 MiB | 0.02% | 219.21 MiB | 219.30 MiB | 0.04% |
| 🟠 regex:debug | 1.0148s | 0.9929s | 💚 -2.16% | 297.96 MiB | 295.07 MiB | -0.97% | 354.53 MiB | 351.58 MiB | -0.83% |
| 🟠 syn:debug | 1.3614s | 1.3717s | 0.76% | 319.10 MiB | 321.19 MiB | 0.65% | 378.90 MiB | 381.27 MiB | 0.62% |
| Total | 4.3381s | 4.3386s | 0.01% | 1.14 GiB | 1.14 GiB | -0.01% | 1.38 GiB | 1.38 GiB | 0.00% |
| Summary | 1.0000s | 0.9960s | -0.40% | 1 byte | 1.00 bytes | -0.03% | 1 byte | 1.00 bytes | -0.01% |

@rustbot
Collaborator

rustbot commented Mar 27, 2025

r? @estebank

rustbot has assigned @estebank.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-tidy Area: The tidy tool S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 27, 2025
@rustbot
Collaborator

rustbot commented Mar 27, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging.

cc @davidtwco, @wesleywiser

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@rustbot
Collaborator

rustbot commented Mar 27, 2025

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@rustbot
Collaborator

rustbot commented Mar 27, 2025

The Miri subtree was changed

cc @rust-lang/miri

@oli-obk
Contributor

oli-obk commented Mar 27, 2025

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 27, 2025
Remove the use of Rayon iterators

@bors
Collaborator

bors commented Mar 27, 2025

⌛ Trying commit 229e548 with merge 42fea88...


par_slice(&mut items, guard, |i| {
    if let Err(err) = for_each(&*i) {
        *error.lock() = Some(err);
Member

This is pre-existing and maybe benign, but somewhat surprising: is it expected that we only handle a single error in this function? Either way, it looks like this now returns an arbitrary error.

Contributor Author

Yeah. It's only used with ErrorGuaranteed. All the errors are equivalent.

Member

Cool. Let's add that explanation as a comment.
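
To make the reasoning in this thread concrete, here is a small sketch of a parallel loop that keeps only one error. It is an illustration under assumptions, not the function under review: `ErrorMarker` is a hypothetical stand-in for rustc's `ErrorGuaranteed`, `try_par_for_each` is an invented name, and one scoped thread per item replaces `par_slice` purely to keep the example self-contained.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for rustc's `ErrorGuaranteed`: a marker that carries
// no data, so any two values are interchangeable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ErrorMarker;

// Runs `for_each` on every item (one scoped thread per item, for simplicity)
// and reports an error if any call failed.
fn try_par_for_each<T, F>(items: &[T], for_each: F) -> Result<(), ErrorMarker>
where
    T: Sync,
    F: Fn(&T) -> Result<(), ErrorMarker> + Sync,
{
    let error = Mutex::new(None);
    thread::scope(|s| {
        for item in items {
            let (error, for_each) = (&error, &for_each);
            s.spawn(move || {
                if let Err(err) = for_each(item) {
                    // Whichever failing thread writes last wins. This is only
                    // acceptable because all error values are equivalent.
                    *error.lock().unwrap() = Some(err);
                }
            });
        }
    });
    match error.into_inner().unwrap() {
        Some(err) => Err(err),
        None => Ok(()),
    }
}
```

With a data-carrying error type, the value returned would depend on thread scheduling, which is the surprise raised in the review. With a marker type, the caller only learns that at least one item failed.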

@RalfJung
Member

RalfJung commented Mar 27, 2025

In parallel loops, instead of a Rayon iterator, a serial iterator is used to collect items into a Vec, and we then run a parallel loop over its elements using the new par_slice function, which is built on rustc-rayon-core's join.

This change makes it easier to bring rustc-rayon-core in-tree.

That seems like a step backwards. Why is "bring rustc-rayon-core in-tree" so desirable that we should stop using well-tested and well-working ecosystem crates? I feel like there's some broader context here that the PR description fails to explain.

@bors
Collaborator

bors commented Mar 27, 2025

☀️ Try build successful - checks-actions
Build commit: 42fea88 (42fea88737205de5b0f9a6ebbea8a2081f6134f3)

@klensy
Contributor

klensy commented Mar 27, 2025

well-tested and well-working ecosystem crates

But this isn't rayon; this is rustc-rayon, a fork that is a few years out of date and carries some custom patches.

@RalfJung
Member

RalfJung commented Mar 27, 2025 via email

@rust-timer
Collaborator

Finished benchmarking commit (42fea88): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary -1.2%, secondary 2.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.8% | [2.8%, 2.8%] | 1 |
| Improvements ✅ (primary) | -1.2% | [-1.2%, -1.2%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.2% | [-1.2%, -1.2%] | 1 |

Cycles

Results (secondary 3.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.0% | [2.0%, 3.5%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 778.377s -> 776.378s (-0.26%)
Artifact size: 365.79 MiB -> 365.78 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 27, 2025
Comment on lines 735 to 749
IntoDynSyncSend(
    tcx.dep_graph
        .with_task(
            dep_node,
            tcx,
            (
                global_asm_config.clone(),
                cgu.name(),
                concurrency_limiter.acquire(tcx.dcx()),
            ),
            module_codegen,
            Some(rustc_middle::dep_graph::hash_result),
        )
        .0,
)
Member

Suggested change
let (module, _) = tcx.dep_graph
    .with_task(
        dep_node,
        tcx,
        (global_asm_config.clone(), cgu.name(), concurrency_limiter.acquire(tcx.dcx())),
        module_codegen,
        Some(rustc_middle::dep_graph::hash_result),
    );
IntoDynSyncSend(module)

This reduces indentation and avoids rustfmt putting the tcx.sess.time() call on a separate indented line.

@cuviper
Member

cuviper commented Mar 27, 2025

In parallel loops, instead of a Rayon iterator, a serial iterator is used to collect items into a Vec, and we then run a parallel loop over its elements using the new par_slice function, which is built on rustc-rayon-core's join.

Maybe we should have a bespoke lighter version of IntoParallelIterator that still lets some types work directly in this par_slice mode? Then you can have uses on &[T] and &mut [T] without any serial collect, but also for example on indexmap with its own Slice types.

@cuviper
Member

cuviper commented Mar 27, 2025

I think you would just need:

trait ParSplit: IntoIterator + DynSend + Sized {
    fn len(&self) -> usize;
    fn split_at(self, at: usize) -> (Self, Self);
}
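
One way the trait above might be used is sketched below, under several assumptions: `Send` is substituted for rustc's internal `DynSend`, the slice impls and the `par_for_each` driver are hypothetical, and `std::thread::scope` stands in for `join`. It spawns one thread per split, which is for illustration only.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// The trait from the comment above, with `Send` in place of `DynSend`
// (assumption: `DynSend` is rustc-internal and unavailable here).
trait ParSplit: IntoIterator + Send + Sized {
    fn len(&self) -> usize;
    fn split_at(self, at: usize) -> (Self, Self);
}

impl<'a, T: Sync> ParSplit for &'a [T] {
    fn len(&self) -> usize {
        (**self).len()
    }
    fn split_at(self, at: usize) -> (Self, Self) {
        <[T]>::split_at(self, at)
    }
}

impl<'a, T: Send> ParSplit for &'a mut [T] {
    fn len(&self) -> usize {
        (**self).len()
    }
    fn split_at(self, at: usize) -> (Self, Self) {
        <[T]>::split_at_mut(self, at)
    }
}

// Hypothetical driver: halves the input until at most one item is left,
// running the two halves in parallel. No serial collect is needed.
fn par_for_each<S, F>(items: S, op: &F)
where
    S: ParSplit,
    F: Fn(S::Item) + Sync,
{
    let len = items.len();
    if len <= 1 {
        items.into_iter().for_each(op);
    } else {
        let (left, right) = items.split_at(len / 2);
        thread::scope(|s| {
            s.spawn(move || par_for_each(left, op));
            par_for_each(right, op);
        });
    }
}

// Works directly on `&[T]`.
fn sum_in_parallel(values: &[u64]) -> u64 {
    let total = AtomicU64::new(0);
    par_for_each(values, &|v: &u64| {
        total.fetch_add(*v, Ordering::Relaxed);
    });
    total.into_inner()
}

// Works directly on `&mut [T]`.
fn increment_all(values: &mut [u64]) {
    par_for_each(values, &|v: &mut u64| *v += 1);
}
```

Because the driver only needs `len` and `split_at`, other splittable containers could in principle implement the trait as well, which appears to be the point of the indexmap remark.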

@Zoxc
Contributor Author

Zoxc commented Apr 2, 2025

That seems like a step backwards. Why is "bring rustc-rayon-core in-tree" so desirable that we should stop using well-tested and well-working ecosystem crates?

I'm basically replacing the Rayon iterators with 36 lines of code here. It doesn't have high value in rustc's code base.

Maybe we should have a bespoke lighter version of IntoParallelIterator that still lets some types work directly in this par_slice mode?

We'd probably want a variant that works with loops, not just join. Having loop support in the thread pool would allow work-stealing within the loop.

@RalfJung
Member

RalfJung commented Apr 2, 2025

I'm basically replacing the Rayon iterators with 36 lines of code here. It doesn't have high value in rustc's code base.

Ah okay, I thought there was all the attached infrastructure (thread pool etc.) that we'd then have to carry ourselves. But if we already do that anyway, then this sounds good to me.

@bjorn3
Member

bjorn3 commented Apr 2, 2025

The threadpool is implemented by rayon-core, of which this PR still retains the rustc fork (rustc-rayon-core, necessary because upstream rayon doesn't support propagating thread local storage to the worker threads among other things).
