-
Notifications
You must be signed in to change notification settings - Fork 259
Fix generated assembly for the outline-atomics feature in Apple AArch64 targets
#1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…mbol access Co-authored-by: Trevor Gross <tmgross@umich.edu>
|
Self-note: the tests in |
tgross35
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual code changes look good to me, but I don't know why the tests are failing; -1 !- -1 has to come from UB somewhere. @taiki-e since you've been working in this area recently, any chance you have any ideas?
| /// Mach-O ARM64 relocation types: | ||
| /// ARM64_RELOC_PAGE21 | ||
| /// ARM64_RELOC_PAGEOFF12 | ||
| /// | ||
| /// These relocations implement the @PAGE / @PAGEOFF split used by | ||
| /// adrp + add sequences on Apple platforms. | ||
| /// | ||
| /// adrp xN, symbol@PAGE -> ARM64_RELOC_PAGE21 | ||
| /// add xN, xN, symbol@PAGEOFF -> ARM64_RELOC_PAGEOFF12 | ||
| /// | ||
| /// Relocation types defined by Apple in XNU: <mach-o/arm64/reloc.h>. | ||
| /// See: <https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/EXTERNAL_HEADERS/mach-o/arm64/reloc.h>. | ||
| #[cfg(target_vendor = "apple")] | ||
| macro_rules! sym { | ||
| ($sym:literal) => { | ||
| concat!($sym, "@PAGE") | ||
| }; | ||
| } | ||
|
|
||
| #[cfg(target_vendor = "apple")] | ||
| macro_rules! sym_off { | ||
| ($sym:literal) => { | ||
| concat!($sym, "@PAGEOFF") | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches my understanding, but @madsmtm mind taking a second look here?
|
@tgross35 I think the bug has to do with sign extensions. Take a look at this minimal use std::sync::atomic::AtomicU8;
/// non-zero if the host supports LSE atomics.
/// This is hardcorded to 1 because Apple M-series macs have this enabled.
static HAVE_LSE_ATOMICS: AtomicU8 = AtomicU8::new(1);
/// Implementation of Compare-And-Swap for i8 for Apple aarch64 platforms, with "Acquire" ordering semantics.
///
/// Stores the `new` value into the atomic integer given by `ptr` if the currently stored value is the same as the value of `current`.
/// The return value is always the previously stored value. If it is equal to `current`, then the value was updated.
#[unsafe(naked)]
pub unsafe extern "C" fn compare_and_swap_acq(current: i8, new: i8, ptr: *mut i8) -> i8 {
core::arch::naked_asm! {
".arch_extension lse",
"adrp x16, {have_lse}@PAGE",
"ldrb w16, [x16, {have_lse}@PAGEOFF]",
"cbz w16, 0f",
"casab w0, w1, [x2]",
"ret",
"0:",
"uxtb w16, w0",
"1:",
"ldaxrb w0, [x2]",
"cmp w0, w16",
"bne 2f",
"stxrb w17, w1, [x2]",
"cbnz w17, 0b",
"2:",
"ret",
have_lse = sym HAVE_LSE_ATOMICS,
};
}
fn main() {
let expected: i8 = -1;
let new: i8 = 0;
let mut target = expected.wrapping_add(10);
// begin test
let _ = unsafe { compare_and_swap_acq(expected, new, &mut target) };
target = expected;
let ret: i8 = unsafe { compare_and_swap_acq(expected, new, &mut target) };
assert_eq!(
ret, expected,
"the new return value should always be the previous value (i.e. the first parameter passed to the function), ret = {ret:?}, expected = {expected:?}, new = {new:?}"
);
}Executing it on a M-series mac with Probably because the byte load instructions (
If so, I think the fix is to sign extend with sxtb: #[unsafe(naked)]
pub unsafe extern "C" fn compare_and_swap_acq(current: i8, new: i8, ptr: *mut i8) -> i8 {
core::arch::naked_asm! {
".arch_extension lse",
"adrp x16, {have_lse}@PAGE",
"ldrb w16, [x16, {have_lse}@PAGEOFF]",
"cbz w16, 0f",
"casab w0, w1, [x2]",
+ "sxtb w0, w0", // Sign-extend byte to 32-bit for correct i8 return value
"ret",
"0:",
"uxtb w16, w0",
"1:",
"ldaxrb w0, [x2]",
"cmp w0, w16",
"bne 2f",
"stxrb w17, w1, [x2]",
"cbnz w17, 0b",
"2:",
+ "sxtb w0, w0", // Sign-extend byte to 32-bit for correct i8 return value
"ret",
have_lse = sym HAVE_LSE_ATOMICS,
};
}I just pushed a change to see if this fixes it. Locally on my laptop it works, but I'll wait on the CI results to be sure. |
|
I just noticed I'd forgotten to add the sign extension to the LSE path as well. The new commit 45f21ff adds that fix. Concerningly, before 45f21ff the CI checks passed (even though the corresponding |
| macro_rules! sign_extend { | ||
| (1) => { | ||
| concat!("sxtb ", reg!(1, 0), ", ", reg!(1, 0)) | ||
| }; | ||
| (2) => { | ||
| concat!("sxth ", reg!(2, 0), ", ", reg!(2, 0)) | ||
| }; | ||
| (4) => { "" }; | ||
| (8) => { "" }; | ||
| (16) => { "" }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this more flexible for the future (also more obvious to read), could you update these to take a second $num:literal param that gets passed to reg? Rather than always using 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 8285fe1 should improve on this.
| concat!(lse!($op, $ordering, $bytes), $( " ", reg!($bytes, $reg), ", " ,)* "[", stringify!($mem), "]; ",), | ||
| "ret; ", | ||
| concat!(lse!($op, $ordering, $bytes), $( " ", reg!($bytes, $reg), ", " ,)* "[", stringify!($mem), "]\n",), | ||
| "ret\n", | ||
| // SXTB s(0), s(0) | ||
| concat!(sign_extend!($bytes), "\n"), | ||
| "8:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go before the ret?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! thanks for catching that. The new commit should have this fixed.
|
Amanieu knows all of this much better than I do, requested a review. I wonder if the missing sign extension might happen to be the issue mentioned at rust-lang/rust#144938 (comment). @zmodem do you have more info about the crash? LLVM doesn't seem to sign extend either, may be worth opening an issue there https://github.com/llvm/llvm-project/blob/524589119afd59525ab1f124a1538f0311d608de/compiler-rt/lib/builtins/aarch64/lse.S.
Would this be possible to add? |
|
I'm still not sure I understand the difference here; any idea why the failure only showed up on MacOS?
Thinking about it more, there could probably be a hook for this that's only used for testing: #[cfg(feature = "mangled-names")]
pub unsafe fn set_have_lse_atomics(has_lse: bool) { /* ... */ }
#[cfg(feature = "mangled-names")]
pub fn get_have_lse_atomics() -> bool { /* ... */ }Then in the test file: #[track_caller]
fn with_maybe_lse_atomics(use_lse: bool, f: impl FnOnce()) {
// Ensure tests run in parallel don't interleave global settings
static LOCK: Mutex<()> = Mutex::new(());
let _g = LOCK.lock().unwrap();
let old = get_have_lse_atomics();
if use_lse || old { assert!(is_aarch64_feature_enabled("lse"); }
unsafe { set_have_lse_atomics(use_lse) };
f();
unsafe { set_have_lse_atomics(old) };
} |
Thanks for the cc. I'm afraid I don't have much details yet. I filed rust-lang/rust#151486 to track it. |
I guess that is because Apple's calling conventions about argument passing differ from the standard ABI. From Apple docs:
(Although it's not explicitly stated regarding the return value, I guess extension of the return value is callee's responsibility because it allows to avoid extension when returning the value as-is and using it.) |
|
@taiki-e I'm not sure I understand your explanation. From the Apple quote:
Let's take the minimal example, where the
So from that quote, I'd imagine the tests would at best only pass in Apple platforms (or not pass in any AArch64 platform), not the other way around. But in reality it is the other way around! Or maybe I'm missing something? |
Under my understanding, in our usage, it doesn't matter which state the arguments are passed in. Atomic, bit-op, and wrapping arithmetic instructions will work regardless of whether extension is applied. Comparisons also work thanks to zero extension. The issue lies in how the ABI expects the return value to be handled.
IIRC values obtained from atomic instructions are usually zero-extended, so Apple targets would likely require sign extension like this PR does. (And the absence of sign extension should be irrelevant for non-Apple targets.) |
|
Always sign extending isn't correct: this should only be performed if the integer type being passed as an argument or returned is a signed integer. I believe that for all of these intrinsics, the integer types are unsigned, in which case we just need to ensure that the result is zero-extended. This is already the case from the |
|
I believe the correct fix is to change the function declarations to use unsigned integer types instead of signed ones, since that's what LLVM's backend is expecting when calling these. |
bdb73af to
c8abf54
Compare
Co-authored-by: Trevor Gross <tmgross@umich.edu>
c8abf54 to
b1019f5
Compare
|
@Amanieu thanks for the feedback, I wasn't aware these functions expected unsigned types. Commit 66b43e2 should have fixed that. I reverted the sign-extension shenanigans as well. @tgross35 thanks for the idea on enabling LSE for the tests. Commit b1019f5 now makes the tests more comprehensive. Hopefully this works :) |
Context
See rust-lang/rust#149634 (comment).
Most of the credit goes to tgross35/rust@76a4adc.
Summary
The main fixes are (as per Trevor's comment on the issue in rust-lang/rust):
;) in the concatenated macros by newlines (\n)Additionally, I added some doc comments to explain the differences between Apple/Mach-O vs Linux/ELF syntax wrt to the relocation types.