-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the use of Rayon iterators #139011
base: master
Are you sure you want to change the base?
Conversation
Some changes occurred in compiler/rustc_codegen_ssa The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging. These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
This comment has been minimized.
This comment has been minimized.
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 |
This comment has been minimized.
This comment has been minimized.
The Miri subtree was changed cc @rust-lang/miri |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Remove the use of Rayon iterators This removes the use of Rayon iterators and the use of the `rustc-rayon` crate. `rustc-rayon-core` is still used however. In parallel loops, instead of a Rayon iterator a serial iterator are used to collect items into a `Vec` and we use a parallel loop over its elements using the new `par_slice` function which is built on `rustc-rayon-core`'s `join`. This change makes it easier to bring `rustc-rayon-core` in-tree. Tests using 7 threads: <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th><td align="right">Physical Memory</td><td align="right">Physical Memory</td><td align="right">%</th><td align="right">Committed Memory</td><td align="right">Committed Memory</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">0.4827s</td><td align="right">0.4828s</td><td align="right"> 0.02%</td><td align="right">201.23 MiB</td><td align="right">201.31 MiB</td><td align="right"> 0.04%</td><td align="right">279.03 MiB</td><td align="right">279.46 MiB</td><td align="right"> 0.15%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.1443s</td><td align="right">0.1401s</td><td align="right">💚 -2.91%</td><td align="right">126.42 MiB</td><td align="right">126.70 MiB</td><td align="right"> 0.22%</td><td align="right">199.79 MiB</td><td align="right">199.99 MiB</td><td align="right"> 0.10%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.3252s</td><td align="right">0.3065s</td><td align="right">💚 -5.78%</td><td align="right">161.87 MiB</td><td align="right">161.78 MiB</td><td align="right"> -0.05%</td><td align="right">229.59 MiB</td><td align="right">230.23 MiB</td><td align="right"> 0.28%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">0.5845s</td><td align="right">0.5876s</td><td align="right"> 0.53%</td><td align="right">197.01 MiB</td><td align="right">196.89 MiB</td><td align="right"> -0.06%</td><td align="right">267.62 MiB</td><td align="right">267.47 MiB</td><td align="right"> -0.06%</td></tr><tr><td>Total</td><td align="right">1.5367s</td><td align="right">1.5169s</td><td align="right">💚 -1.29%</td><td align="right">686.53 MiB</td><td align="right">686.68 MiB</td><td align="right"> 0.02%</td><td align="right">976.04 MiB</td><td align="right">977.14 MiB</td><td align="right"> 0.11%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9796s</td><td align="right">💚 -2.04%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> 0.04%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> 0.12%</td></tr></table> <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th><td align="right">Physical Memory</td><td align="right">Physical Memory</td><td align="right">%</th><td align="right">Committed Memory</td><td align="right">Committed Memory</td><td align="right">%</th></tr><tr><td>🟠 <b>clap</b>:debug</td><td align="right">1.6371s</td><td align="right">1.6529s</td><td align="right"> 0.96%</td><td align="right">395.58 MiB</td><td align="right">396.21 MiB</td><td align="right"> 0.16%</td><td align="right">460.98 MiB</td><td align="right">461.52 MiB</td><td align="right"> 0.12%</td></tr><tr><td>🟠 <b>hyper</b>:debug</td><td align="right">0.3248s</td><td align="right">0.3210s</td><td align="right">💚 -1.16%</td><td align="right">155.16 MiB</td><td align="right">155.19 MiB</td><td align="right"> 0.02%</td><td align="right">219.21 MiB</td><td align="right">219.30 MiB</td><td align="right"> 0.04%</td></tr><tr><td>🟠 <b>regex</b>:debug</td><td align="right">1.0148s</td><td align="right">0.9929s</td><td align="right">💚 -2.16%</td><td align="right">297.96 MiB</td><td align="right">295.07 MiB</td><td align="right"> -0.97%</td><td align="right">354.53 MiB</td><td align="right">351.58 MiB</td><td align="right"> -0.83%</td></tr><tr><td>🟠 <b>syn</b>:debug</td><td align="right">1.3614s</td><td align="right">1.3717s</td><td align="right"> 0.76%</td><td align="right">319.10 MiB</td><td align="right">321.19 MiB</td><td align="right"> 0.65%</td><td align="right">378.90 MiB</td><td align="right">381.27 MiB</td><td align="right"> 0.62%</td></tr><tr><td>Total</td><td align="right">4.3381s</td><td align="right">4.3386s</td><td align="right"> 0.01%</td><td align="right">1.14 GiB</td><td align="right">1.14 GiB</td><td align="right"> -0.01%</td><td align="right">1.38 GiB</td><td align="right">1.38 GiB</td><td align="right"> 0.00%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9960s</td><td align="right"> -0.40%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> -0.03%</td><td align="right">1 byte</td><td align="right">1.00 bytes</td><td align="right"> -0.01%</td></tr></table>
|
||
par_slice(&mut items, guard, |i| { | ||
if let Err(err) = for_each(&*i) { | ||
*error.lock() = Some(err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s pre-existing and maybe benign but somewhat surprising, is it expected that we only handle a single error in this function? Either way, this looks to now return a random error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. It's only used with ErrorGuaranteed
. All the errors are equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Let's add that explanation as a comment.
That seems like a step backwards. Why is "bring rustc-rayon-core in-tree" so desirable that we should stop using well-tested and well-working ecosystem crates? I feel like there's some broader context here that the PR description fails to explain. |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
But this isn't rayon, this is rustc-rayon, few years outdated fork with some custom patches. |
So "rayon iterators" are not the iterators from rayon?
Given that a bunch of people got auto-pinged here, it would be good to make the PR description understandable to the general compiler team, not just people with intimate knowledge of parallel-rustc. :)
|
Finished benchmarking commit (42fea88): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -1.2%, secondary 2.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 3.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 778.377s -> 776.378s (-0.26%) |
IntoDynSyncSend( | ||
tcx.dep_graph | ||
.with_task( | ||
dep_node, | ||
tcx, | ||
( | ||
global_asm_config.clone(), | ||
cgu.name(), | ||
concurrency_limiter.acquire(tcx.dcx()), | ||
), | ||
module_codegen, | ||
Some(rustc_middle::dep_graph::hash_result), | ||
) | ||
.0, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntoDynSyncSend( | |
tcx.dep_graph | |
.with_task( | |
dep_node, | |
tcx, | |
( | |
global_asm_config.clone(), | |
cgu.name(), | |
concurrency_limiter.acquire(tcx.dcx()), | |
), | |
module_codegen, | |
Some(rustc_middle::dep_graph::hash_result), | |
) | |
.0, | |
) | |
let (module, _) = tcx.dep_graph | |
.with_task( | |
dep_node, | |
tcx, | |
(global_asm_config.clone(), cgu.name(), concurrency_limiter.acquire(tcx.dcx())), | |
module_codegen, | |
Some(rustc_middle::dep_graph::hash_result), | |
); | |
IntoDynSyncSend(module) |
reduces indentation and avoids rustfmt putting the tcx.sess.time()
call on a separate indented line.
Maybe we should have a bespoke lighter version of |
I think you would just need: trait ParSplit: IntoIterator + DynSend + Sized {
fn len(&self) -> usize;
fn split_at(self, at: usize) -> (Self, Self);
} |
I'm basically replacing the Rayon iterators with 36 lines of code here. It doesn't have high value in rustc's code base.
Probably want some variant that would work with loops, not just |
Ah okay, I thought there's all the attached infrastructure (threadpool etc) that we'd then have to carry ourselves. But if we do that already anyway then sounds good to me. |
The threadpool is implemented by rayon-core, of which this PR still retains the rustc fork (rustc-rayon-core, necessary because upstream rayon doesn't support propagating thread local storage to the worker threads among other things). |
This removes the use of Rayon iterators and the use of the
rustc-rayon
crate.rustc-rayon-core
is still used however.In parallel loops, instead of a Rayon iterator a serial iterator are used to collect items into a
Vec
and we use a parallel loop over its elements using the newpar_slice
function which is built onrustc-rayon-core
'sjoin
.This change makes it easier to bring
rustc-rayon-core
in-tree.Tests using 7 threads: