-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler-rt: memmove optimisation #22606
base: master
Are you sure you want to change the base?
Conversation
lib/compiler_rt/memcpy.zig
Outdated
@@ -18,7 +18,7 @@ comptime { | |||
} | |||
} | |||
|
|||
const Element = if (std.simd.suggestVectorLength(u8)) |vec_size| | |||
pub const Element = if (std.simd.suggestVectorLength(u8)) |vec_size| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just put them back in the same file, like it was before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, especially if it ends up making sense to share significant parts of their implementations. I was actually planning to move Element
into common.zig
(with a more descriptive name) since memset
is going to want it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've move Element
's definition to PreferredLoadStoreElement
in common.zig
- I anticipate using it in memset
and possibly memcmp
in the future.
Are you aiming to get this one in for 0.14.0? |
Yes, I'd say it's basically mergable as is (there's one or two small things I can think of that I'd change first), which would fix the code size issue we currently have, The thing that will take more time is benchmarking and fine-tuning things based on benchmarks; that might leave things a bit close to the release date. Benchmarking could always be spun off into followup work if we're happy to merge without proper benchmarking. |
This PR seeks to improve
memmove
performance and fix some issues with generated code size of the current compiler-rtmemmove
.I haven't yet benchmarked this implementation, though I expect the impact to be similar to #18912.
Here is a table of code sizes for
ReleastFast
(targets chosen somewhat randomly, feel free to suggest additions/removals from the list):thumb-freestanding-eabihf
cortex_m3
thumb-freestanding-eabihf
cortex_m4
thumb-freestanding-eabihf
cortex_m33
thumb-freestanding-eabihf
cortex_m52
aarch64-linux
cortex_a53
aarch64-linux
cortex_a75
aarch64-linux
cortex_x1
aarch64-linux
cortex_x4
x86_64-linux
x86_64
x86_64-linux
x86_64_v2
x86_64-linux
x86_64_v3
x86_64-linux
x86_64_v4
I've marked this a ready for review as I'm not sure when I'll get to benchmarking in earnest and I think this should be merged before 0.14. I think there's no problem merging this as-is (modulo any reviews) and doing the following todos in a follow-up post 0.14 if I don't get it done before hand.
Resolves #22603 (at least for the target discussed there, but presumably for any others as well).
Todo:
memcpy