Fix quadratic MIR blowup for large vec![] expressions with Drop-implementing elements in async functions#154720
Fix quadratic MIR blowup for large vec![] expressions with Drop-implementing elements in async functions#154720jakubadamw wants to merge 4 commits intorust-lang:mainfrom
vec![] expressions with Drop-implementing elements in async functions#154720Conversation
|
Some changes occurred in src/tools/compiletest cc @jieyouxu |
|
rustbot has assigned @jdonszelmann. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Fix quadratic MIR blowup for large `vec![]` expressions with `Drop`-implementing elements in async functions
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (ab08649): comparison URL. Overall result: ❌✅ regressions and improvements - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (secondary 0.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.2%, secondary 0.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 487.024s -> 487.341s (0.07%) |
This comment has been minimized.
This comment has been minimized.
Add a `//@ timeout: <seconds>` directive that lets individual tests override the default compilation timeout. This is useful for regression tests that guard against quadratic or otherwise slow compilation, where the default timeout is too generous to catch a regression.
For coroutines, `StorageDead` operations were unconditionally added to the unwind path. For the included test, each array element's unwind chain includes unique `StorageDead` locals, preventing tail-sharing between chains. This produces O(n^2) MIR basic blocks for large aggregate initialisers (e.g. sum(1 to n) = ~50M blocks for n = 10,000). `StorageDead` on the unwind path is needed for correctness: without it, the coroutine drop handler may re-drop locals already dropped during unwinding. However, this can only happen when the coroutine body contains yield points, because locals can only span a yield if one exists. Fix: scan the THIR at construction time for `Yield` expressions. When none are found (for example, in an async function with no `.await`), set `storage_dead_on_unwind = false` on all drops, skipping `StorageDead` on the unwind path, which makes the MIR block count linear rather than quadratic.
…arge `Vec<String>` construction Add a run-make test that generates and compiles an `async` function returning a `vec!["string".to_string(), …]` with 10,000 elements. Without the preceding fix, this example would produce tens of millions of MIR basic blocks and take minutes to compile. With the fix in place, it compiles in a few seconds. The timeout specified in the test makes it so it would fail before the fix applied.
…n expected change
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
not sure I'm the right reviewer for this one, @rustbot reroll |
|
sorry, me neither, @rustbot reroll |
In #115327, constructing a large
Vec<String>inside anasync fnused to cause a quadratic blowup in the number of MIR basic blocks, making compilation very slow (minutes for tens of thousands items). Example:The root cause was that
StorageDeadinstructions on the unwind path prevented tail-sharing of drop chains in coroutines. Each element in the vec initialiser got its own unique unwind chain because of distinctStorageDeadlocals, leading to ~n² basic blocks.The fix checks whether the coroutine body actually contains any yield points. If there are no yields (e.g. an
async fnwith no.await), no locals can be live across a yield, soStorageDeadon the unwind path can safely be skipped – which restores the tail-sharing and brings MIR size back to linear.Now,
asyncfunctions with noawaitaren’t perhaps necessarily that common, but are sort of inevitable in the case of traits defining a method asasyncwith implementations deciding not to have any yield points for whatever reason.This PR also adds support for a
//@ timeout: Ndirective to compiletest, so that individual tests can set their own warning timeout (the regression test uses a 10-second timeout to catch any future regressions). This is perhaps controversial or should be separated out into another issue & PR and discussed independently, but I felt like this could be the best way to test this performance improvement. In the future, the compiletest machinery could be made so that the defined timeouts actually lead to an immediate failure rather than merely a warning.Accordingly, the PR includes a
run-maketest that compiles a generatedasync fnreturning avec!["string".to_string(), …]with 10,000 elements, much like the original example from the issue.I'm not familiar with this part of the compiler so I’d be happy to hear if there's a better way to detect the yield-free case – checking the THIR for
ExprKind::Yieldseemed like the most straightforward approach, but perhaps there’s a better way?I guess the PR would deserve a perf run to make sure this doesn’t impact the ”happy path” overly.
Closes #115327.