compiler-rt: memmove optimisation #22606

Open
wants to merge 4 commits into
base: master

@dweiller (Contributor) commented Jan 25, 2025

This PR seeks to improve memmove performance and fix some issues with generated code size of the current compiler-rt memmove.

I haven't yet benchmarked this implementation, though I expect the impact to be similar to #18912.

Here is a table of code sizes for ReleaseFast (targets chosen somewhat randomly; feel free to suggest additions to or removals from the list):

| target | cpu | master (B) | 3642e26 (B) |
|---|---|---:|---:|
| thumb-freestanding-eabihf | cortex_m3 | 16362 | 438 |
| thumb-freestanding-eabihf | cortex_m4 | 16362 | 438 |
| thumb-freestanding-eabihf | cortex_m33 | 16362 | 438 |
| thumb-freestanding-eabihf | cortex_m52 | 2644 | 420 |
| aarch64-linux | cortex_a53 | 1472 | 380 |
| aarch64-linux | cortex_a75 | 832 | 568 |
| aarch64-linux | cortex_x1 | 836 | 584 |
| aarch64-linux | cortex_x4 | 832 | 584 |
| x86_64-linux | x86_64 | 1402 | 564 |
| x86_64-linux | x86_64_v2 | 1402 | 564 |
| x86_64-linux | x86_64_v3 | 1348 | 826 |
| x86_64-linux | x86_64_v4 | 1348 | 826 |

I've marked this as ready for review as I'm not sure when I'll get to benchmarking in earnest, and I think this should be merged before 0.14. I think there's no problem merging this as-is (modulo any reviews) and doing the following todos in a follow-up after 0.14 if I don't get them done beforehand.

Resolves #22603 (at least for the target discussed there, but presumably for any others as well).

Todo:

  • benchmark memmove implementation
  • investigate sharing parts of implementation with memcpy

@@ -18,7 +18,7 @@ comptime {
}
}

-const Element = if (std.simd.suggestVectorLength(u8)) |vec_size|
+pub const Element = if (std.simd.suggestVectorLength(u8)) |vec_size|
Member

maybe just put them back in the same file, like it was before?

Contributor Author

I could, especially if it ends up making sense to share significant parts of their implementations. I was actually planning to move Element into common.zig (with a more descriptive name) since memset is going to want it as well.

Contributor Author

I've moved Element's definition to PreferredLoadStoreElement in common.zig. I anticipate using it in memset and possibly memcmp in the future.
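For readers following along, here is a minimal sketch of what a shared definition of this kind might look like. It assumes the declaration keeps the shape of the `Element` line in the diff above; the `usize` fallback, the use of `@Vector`, and the helper `copyChunk` are illustrative assumptions and are not taken from this PR.

```zig
const std = @import("std");

// Sketch only: pick a vector of bytes when the target suggests a vector
// length, otherwise fall back to a machine word (the fallback is an assumption).
pub const PreferredLoadStoreElement = if (std.simd.suggestVectorLength(u8)) |vec_size|
    @Vector(vec_size, u8)
else
    usize;

// Hypothetical helper: copy exactly one element-sized chunk of bytes.
// Slicing a many-item pointer with comptime-known bounds yields a pointer
// to an array, so the whole chunk is copied with a single array assignment.
fn copyChunk(dest: [*]u8, src: [*]const u8) void {
    const size = @sizeOf(PreferredLoadStoreElement);
    dest[0..size].* = src[0..size].*;
}
```

A routine such as memmove, memset or memcmp could then process its buffers in chunks of `@sizeOf(PreferredLoadStoreElement)` bytes and handle any remainder separately.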

@dweiller changed the title from "Memmove opt" to "compiler-rt: memmove optimisation" on Jan 25, 2025
@alexrp (Member) commented Jan 29, 2025

Are you aiming to get this one in for 0.14.0?

@dweiller (Contributor Author) commented Jan 29, 2025

> Are you aiming to get this one in for 0.14.0?

Yes, I'd say it's basically mergeable as is (there are one or two small things I can think of that I'd change first), which would fix the code size issue we currently have. The thing that will take more time is benchmarking and fine-tuning based on the results; that might leave things a bit close to the release date. Benchmarking could always be spun off into follow-up work if we're happy to merge without proper benchmarking.

@alexrp added this to the 0.14.0 milestone on Jan 29, 2025
@dweiller marked this pull request as ready for review on January 30, 2025 09:27
Successfully merging this pull request may close these issues.

STM32 embedded debug binaries much larger with 0.14.0-dev.2851+b074fb7dd