Always inline InterpCx::layout_of after perf regression by Stypox · Pull Request #143334 · rust-lang/rust

Stypox · 2025-07-02T16:21:19Z

Followup to #142721 to fix the performance regression. I ran one quick benchmark locally (ctfe-stress-5) and it does seem to be faster. I further tried adding #[inline(always)] to compiler/rustc_middle/src/ty/layout.rs in layout_of() under LayoutOfHelpers but that didn't change the benchmark results at all.

@rust-timer build e7e3c9e

(I'm not sure I have permission to do the above)

r? @RalfJung

rustbot · 2025-07-02T16:21:25Z

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

RalfJung · 2025-07-02T16:28:22Z

That's not quite the right command -- and indeed you won't have permission.

@bors2 try
@rust-timer-queue

rust-bors · 2025-07-02T16:28:25Z

⌛ Trying commit e7e3c9e with merge 1ab17c0…

To cancel the try build, run the command @bors2 try cancel.

RalfJung · 2025-07-02T16:28:33Z

Argh...
@rust-timer queue

rust-bors · 2025-07-02T18:46:53Z

☀️ Try build successful (CI)
Build commit: 1ab17c0 (1ab17c02e0da006756c695942d5aaceaf2bb9a39, parent: b94bd12401d26ccf1c3b04ceb4e950b0ff7c8d29)

rust-timer · 2025-07-03T16:50:39Z

Finished benchmarking commit (1ab17c0): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.8%	[0.1%, 2.3%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (primary 2.3%, secondary 1.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.3%	[2.3%, 2.3%]	1
Regressions ❌ (secondary)	1.4%	[0.4%, 2.3%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.3%	[2.3%, 2.3%]	1

Cycles

Results (primary -0.6%, secondary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	1.6%	[1.4%, 1.9%]	2
Regressions ❌ (secondary)	1.6%	[0.5%, 2.6%]	2
Improvements ✅ (primary)	-5.0%	[-5.0%, -5.0%]	1
Improvements ✅ (secondary)	-0.4%	[-0.4%, -0.4%]	1
All ❌✅ (primary)	-0.6%	[-5.0%, 1.9%]	3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 462.452s -> 461.112s (-0.29%)
Artifact size: 372.20 MiB -> 372.36 MiB (0.04%)

RalfJung · 2025-07-03T16:59:29Z

So, uh, somehow this actually makes things even worse...?

RalfJung · 2025-07-03T17:10:40Z

It seems potentially non-trivial to optimize away the drop glue for this _span variable for the CTFE instance of the machine. I wonder if it would help to make that analysis easier for the compiler: instead of a TRACING_ENABLED associated constant, we could have:

type ENTERED_TRACING_SPAN = ();
fn enter_tracing_span(tracing::Span) -> Self::ENTERED_TRACING_SPAN { () }

The macro then simply expands to $machine::enter_tracing_span(tracing::info_span!($($tt)*)).
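A minimal sketch of the shape suggested above, under stated assumptions: `Span` and `EnteredSpan` are stand-ins for the `tracing` types (not the real crate), the associated type is declared without a default, and `CtfeMachine`, `MiriMachine`, `traced_work` and `GUARDS_DROPPED` are hypothetical names used only to make the guard's drop observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for `tracing::Span` and `tracing::span::EnteredSpan`.
pub struct Span;
pub struct EnteredSpan;

// Counts dropped guards so the presence of drop glue is observable.
pub static GUARDS_DROPPED: AtomicUsize = AtomicUsize::new(0);

impl Drop for EnteredSpan {
    fn drop(&mut self) {
        GUARDS_DROPPED.fetch_add(1, Ordering::Relaxed);
    }
}

pub trait Machine {
    // Assumption: no default here; each machine names its own guard type.
    type EnteredTracingSpan;
    fn enter_tracing_span(span: Span) -> Self::EnteredTracingSpan;
}

pub struct CtfeMachine; // tracing disabled
pub struct MiriMachine; // tracing enabled

impl Machine for CtfeMachine {
    // The guard is `()`, which has no drop glue at all.
    type EnteredTracingSpan = ();
    fn enter_tracing_span(_span: Span) -> Self::EnteredTracingSpan {}
}

impl Machine for MiriMachine {
    type EnteredTracingSpan = EnteredSpan;
    fn enter_tracing_span(_span: Span) -> Self::EnteredTracingSpan {
        EnteredSpan
    }
}

// Generic caller, loosely analogous to an interpreter function that opens a span.
pub fn traced_work<M: Machine>() -> u32 {
    let _guard = M::enter_tracing_span(Span);
    42
}

fn main() {
    traced_work::<CtfeMachine>();
    traced_work::<MiriMachine>();
    println!("guards dropped: {}", GUARDS_DROPPED.load(Ordering::Relaxed));
}
```

The point of the sketch is only that the guard type is chosen per machine at the type level, so the disabled case carries a unit value instead of a conditionally populated guard.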

RalfJung · 2025-07-04T19:11:46Z

@rustbot author

rustbot · 2025-07-04T19:11:49Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

Stypox · 2025-07-05T09:36:01Z

> type ENTERED_TRACING_SPAN = ();
> fn enter_tracing_span(tracing::Span) -> Self::ENTERED_TRACING_SPAN { () }

I couldn't use this directly as "associated type defaults are unstable" and, most importantly, since trait implementations would be able to override ENTERED_TRACING_SPAN without overriding enter_tracing_span, () would not always have the Self::ENTERED_TRACING_SPAN type. Anyway, I used this instead:

fn enter_trace_span(_span: tracing::Span) -> impl EnteredTraceSpan { () }
trait EnteredTraceSpan {}
impl EnteredTraceSpan for () {}
impl EnteredTraceSpan for tracing::span::EnteredSpan {}

I again ran a very quick benchmark and again the latest commit seems slightly faster than the one before on my device, but it obviously shouldn't be trusted, so please run another full benchmark.
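For readers who want to try this variant in isolation, here is a small self-contained sketch. It assumes mock `Span` and `EnteredSpan` types in place of the `tracing` crate, and `SPANS_BUILT`, `CtfeMachine`, `MiriMachine` and `traced_work` are hypothetical names. The counter shows that, in this shape, the caller constructs the span before the hook is called, even when the hook discards it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many mock spans were constructed.
pub static SPANS_BUILT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-ins for the `tracing` types.
pub struct Span;
pub struct EnteredSpan;

impl Span {
    pub fn new() -> Span {
        SPANS_BUILT.fetch_add(1, Ordering::Relaxed);
        Span
    }
    pub fn entered(self) -> EnteredSpan {
        EnteredSpan
    }
}

pub trait EnteredTraceSpan {}
impl EnteredTraceSpan for () {}
impl EnteredTraceSpan for EnteredSpan {}

pub trait Machine {
    // Default: tracing disabled, the span is dropped and `()` is the guard.
    fn enter_trace_span(_span: Span) -> impl EnteredTraceSpan {
        ()
    }
}

pub struct CtfeMachine;
impl Machine for CtfeMachine {} // keeps the default

pub struct MiriMachine;
impl Machine for MiriMachine {
    fn enter_trace_span(span: Span) -> impl EnteredTraceSpan {
        span.entered()
    }
}

pub fn traced_work<M: Machine>() -> u32 {
    // The argument is evaluated eagerly, whatever the hook does with it.
    let _guard = M::enter_trace_span(Span::new());
    7
}

fn main() {
    traced_work::<CtfeMachine>();
    println!("spans built: {}", SPANS_BUILT.load(Ordering::Relaxed));
}
```

Whether an optimizing compiler removes the real span construction in the disabled case is a separate question that this mock cannot answer.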

@rustbot ready

Hopefully this will make tracing calls be optimized out properly when tracing is disabled

RalfJung · 2025-07-05T09:42:37Z

> I couldn't use this directly as "associated type defaults are unstable" and, most importantly, since trait implementations would be able to override ENTERED_TRACING_SPAN without overriding enter_tracing_span, () would not always have the Self::ENTERED_TRACING_SPAN type.

Ah, fair.

> Anyway, I used this instead:

Interesting, I would have just removed the default. ;)

RalfJung · 2025-07-05T09:43:50Z

@bors2 try
@rust-timer queue

rust-bors · 2025-07-05T11:49:04Z

☀️ Try build successful (CI)
Build commit: ce51ffb (ce51ffbeae12a0e63028938a74cbb1da8c986ffe, parent: f0b67dd97d74610ee4185cf01c775a563c2017a2)

rust-timer · 2025-07-05T12:59:39Z

Finished benchmarking commit (ce51ffb): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.4%	[0.1%, 0.8%]	15
Improvements ✅ (primary)	-2.9%	[-2.9%, -2.9%]	1
Improvements ✅ (secondary)	-0.7%	[-1.0%, -0.5%]	3
All ❌✅ (primary)	-2.9%	[-2.9%, -2.9%]	1

Max RSS (memory usage)

Results (primary -3.9%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.5%	[3.5%, 3.5%]	1
Improvements ✅ (primary)	-3.9%	[-3.9%, -3.9%]	1
Improvements ✅ (secondary)	-4.8%	[-4.8%, -4.8%]	1
All ❌✅ (primary)	-3.9%	[-3.9%, -3.9%]	1

Cycles

Results (secondary -1.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.6%	[2.3%, 4.8%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-7.3%	[-7.5%, -7.0%]	2
All ❌✅ (primary)	-	-	0

Binary size

Results (primary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.1%	[-1.1%, -1.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.1%	[-1.1%, -1.1%]	1

Bootstrap: 460.935s -> 461.093s (0.03%)
Artifact size: 372.19 MiB -> 371.92 MiB (-0.07%)

RalfJung · 2025-07-05T13:41:22Z

Fun, so that does seem to help for ctfe-stress, but other benchmarks don't like it, and it doesn't fully negate the effect of #142721. Can you try adding inline(always) to the machine hook as well?

The only other thing I can think of is an entirely different, closure-based API. But given that the regression only affects secondary benchmarks / stress tests, I don't think that's worth it.

Stypox · 2025-07-06T07:10:23Z

I added inline(always) to the machine hook and also opened #143520 as an alternative using a closure (I'm not sure it's what you had in mind but given the simple change I wanted to try it anyway). Could you run the benchmarks on both? Thanks!

Fix perf regression caused by tracing See #143334, this is another alternative that may be worth benchmarking as suggested in #143334 (comment). r? `@RalfJung`

Kobzol · 2025-07-06T08:13:23Z

@bors2 try
@rust-timer queue

rust-bors · 2025-07-06T08:13:26Z

⌛ Trying commit 5eefd8b with merge e5441bd…

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-07-06T10:18:05Z

☀️ Try build successful (CI)
Build commit: e5441bd (e5441bddc082141fd7bdd42b5e72766016324a9c, parent: febb10d0a2d29278135676783f6a22eb83295981)

rust-timer · 2025-07-06T16:35:31Z

Finished benchmarking commit (e5441bd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.4%	[0.1%, 0.8%]	16
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.6%	[-1.0%, -0.3%]	4
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (primary -1.6%, secondary -2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.6%	[-1.6%, -1.6%]	1
Improvements ✅ (secondary)	-2.0%	[-2.0%, -2.0%]	1
All ❌✅ (primary)	-1.6%	[-1.6%, -1.6%]	1

Cycles

Results (primary -1.2%, secondary 3.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.1%	[3.1%, 3.1%]	1
Improvements ✅ (primary)	-1.2%	[-1.2%, -1.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.2%	[-1.2%, -1.2%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 460.756s -> 461.178s (0.09%)
Artifact size: 372.14 MiB -> 371.86 MiB (-0.08%)

RalfJung · 2025-07-06T17:46:26Z

Pretty much the same as the previous one.

Should we land this or do you want to try the other closure variant in the other PR?

bors · 2025-07-07T18:25:39Z

☔ The latest upstream changes (presumably #143582) made this pull request unmergeable. Please resolve the merge conflicts.

Stypox · 2025-07-08T12:58:17Z

TLDR: I would close this PR and merge #143520 instead, as it should be at least as good as what we have now (i.e. the TRACING_ENABLED const), but probably better.

I did a bit of analysis based on the strings contained in the miri executables. As a baseline I used an executable in which the tracing macro always expands to (), and compared the other executables to it. Since I did not do a deeper binary analysis, please take this with a grain of salt; still, I think there are some interesting insights:

1. M::TRACING_ENABLED = true (on parent commit 733b47e and on the inlined 80a2332): leads, as expected, to an executable containing ...::layout_of::CALLSITE from the tracing macro
2. M::TRACING_ENABLED = false (on 733b47e and on 80a2332): does not have ...::layout_of::CALLSITE, but still has core::ptr::drop_in_place::<rustc_const_eval::interpret::util::MaybeEnteredSpan>, which I guess indicates that the tracing macro was not compiled out completely

^ so there is no difference in behavior between 80a2332 and its parent 733b47e (just a bigger binary size when TRACING_ENABLED = true, but identical binaries when false)

3. M::enter_trace_span taking a span and returning span.entered() (on 5eefd8b): ...::layout_of::CALLSITE is present as expected
4. M::enter_trace_span taking a span and returning () (on 5eefd8b): ...::layout_of::CALLSITE is present here too, it does not get optimized out!

so this PR (#143334) makes the situation worse

5. M::enter_trace_span taking a closure and returning span().entered() (on 57aa88e): ...::layout_of::{closure#0}::CALLSITE is present as expected
6. M::enter_trace_span taking a closure and returning () (on 57aa88e): ...::layout_of::{closure#0}::CALLSITE is not present, and furthermore the binary has very few differences from the one with the commented-out macro (2. in comparison had noticeably more differences, e.g. MaybeEnteredSpan)

so I would say that #143520 is at least as good as what we have now (i.e. 1. and 2.), and may even be better given the results in 6. and given the benchmark results here
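The closure-taking variant compared above can be sketched as follows. This is an illustration under assumptions, not the code from #143520: `Span` and `EnteredSpan` are mock types standing in for the `tracing` crate, and `SPANS_BUILT`, `CtfeMachine`, `MiriMachine` and `traced_work` are hypothetical names. Because the hook receives a closure, the disabled machine never calls it, so no span is constructed at all.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many mock spans were constructed.
pub static SPANS_BUILT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-ins for the `tracing` types.
pub struct Span;
pub struct EnteredSpan;

impl Span {
    pub fn new() -> Span {
        SPANS_BUILT.fetch_add(1, Ordering::Relaxed);
        Span
    }
    pub fn entered(self) -> EnteredSpan {
        EnteredSpan
    }
}

pub trait EnteredTraceSpan {}
impl EnteredTraceSpan for () {}
impl EnteredTraceSpan for EnteredSpan {}

pub trait Machine {
    // Default: tracing disabled. The closure is dropped without being called.
    #[inline(always)]
    fn enter_trace_span(_span: impl FnOnce() -> Span) -> impl EnteredTraceSpan {
        ()
    }
}

pub struct CtfeMachine;
impl Machine for CtfeMachine {} // keeps the default

pub struct MiriMachine;
impl Machine for MiriMachine {
    #[inline(always)]
    fn enter_trace_span(span: impl FnOnce() -> Span) -> impl EnteredTraceSpan {
        // Only here is the span actually built and entered.
        span().entered()
    }
}

pub fn traced_work<M: Machine>() -> u32 {
    // The closure defers span construction to the machine hook.
    let _guard = M::enter_trace_span(|| Span::new());
    7
}

fn main() {
    traced_work::<CtfeMachine>();
    traced_work::<MiriMachine>();
    println!("spans built: {}", SPANS_BUILT.load(Ordering::Relaxed));
}
```

In this mock the difference from the span-taking variant is guaranteed by the language semantics (an uncalled closure runs no code) rather than left to the optimizer, which matches the direction of the findings in 4. and 6.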

The command I used to make the comparisons was, among others ($1 and $2 are the executables):

diff <(strings -n 20 $1 | rustfilt | sort) <(strings -n 20 $2 | rustfilt | sort) > ${1}_vs_${2}.txt

These are the binaries I compared: https://github.com/Stypox/testing-apks/releases/download/19/miri-binaries-comparisons.zip

RalfJung · 2025-07-08T13:09:03Z

I agree, let's close in favor of #143520.

Rollup merge of #143520 - Stypox:enter_trace_span-closure, r=RalfJung Fix perf regression caused by tracing See #143334, this is another alternative that may be worth benchmarking as suggested in #143334 (comment). r? ``@RalfJung``

rustbot assigned RalfJung Jul 2, 2025

rustbot added S-waiting-on-review T-compiler labels Jul 2, 2025

rustbot added the S-waiting-on-perf label Jul 2, 2025

rustbot removed the S-waiting-on-perf label Jul 3, 2025

rustbot removed the S-waiting-on-review label Jul 4, 2025

rustbot added the S-waiting-on-author label Jul 4, 2025

Always inline InterpCx::layout_of after perf regression

80a2332

rustbot added S-waiting-on-review and removed S-waiting-on-author labels Jul 5, 2025

Stypox force-pushed the layout_of-inline branch from e7e3c9e to dda5bd3 on July 5, 2025 09:39

Replace TRACING_ENABLED with enter_trace_span()

77d32ad

Hopefully this will make tracing calls be optimized out properly when tracing is disabled

Stypox force-pushed the layout_of-inline branch from dda5bd3 to 77d32ad on July 5, 2025 09:40

rustbot added the S-waiting-on-perf label Jul 5, 2025

rustbot added perf-regression and removed S-waiting-on-perf labels Jul 5, 2025

Add inline(always) to Machine::enter_trace_span

5eefd8b

Stypox mentioned this pull request Jul 6, 2025

Fix perf regression caused by tracing #143520

Merged

rustbot added the S-waiting-on-perf label Jul 6, 2025

rustbot removed the S-waiting-on-perf label Jul 6, 2025

RalfJung closed this Jul 8, 2025

rustbot removed the S-waiting-on-review label Jul 8, 2025
