@vini-fda

@vini-fda vini-fda commented Jan 12, 2026

Context

See rust-lang/rust#149634 (comment).

Most of the credit goes to tgross35/rust@76a4adc.

Summary

The main fixes are (as per Trevor's comment on the issue in rust-lang/rust):

  • Replacing semicolons (;) with newlines (\n) in the concatenated macros
  • Properly accounting for Mach-O vs ELF AArch64 relocation specifier syntax

Additionally, I added some doc comments to explain the differences between Apple/Mach-O and Linux/ELF syntax with respect to the relocation types.
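The first fix can be illustrated with a minimal sketch. It assumes the asm templates are assembled with `concat!`, as in this PR, and the `asm_lines!` helper is hypothetical. My understanding, which this thread does not verify, is that Apple's AArch64 assembly syntax treats `;` as the start of a comment rather than as a statement separator. If so, everything after the first `;` on a line would be silently dropped. Putting each instruction on its own line works for both ELF and Mach-O.

```rust
/// Hypothetical helper: joins instruction literals with newlines instead of `;`,
/// producing a multi-line template that both ELF and Mach-O assemblers accept.
macro_rules! asm_lines {
    ($($line:literal),* $(,)?) => {
        concat!($($line, "\n"),*)
    };
}

// Old style: a single line with `;` separators (assumed to be cut short on Apple targets).
const OLD_STYLE: &str = concat!("casab w0, w1, [x2]; ", "ret; ");
// New style: every instruction is terminated by a newline.
const NEW_STYLE: &str = asm_lines!("casab w0, w1, [x2]", "ret");

fn main() {
    assert!(OLD_STYLE.contains(';'));
    assert!(!NEW_STYLE.contains(';'));
    assert_eq!(NEW_STYLE.lines().count(), 2);
    print!("{}", NEW_STYLE);
}
```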

@vini-fda
Author

Self-note: the tests in lse.rs are now up and running, but many of them are failing. I'll check out the failing tests and try to fix them as I go.

Contributor

@tgross35 tgross35 left a comment

The actual code changes look good to me, but I don't know why the tests are failing; -1 != -1 has to come from UB somewhere. @taiki-e since you've been working in this area recently, any chance you have any ideas?

Comment on lines +167 to +191
/// Mach-O ARM64 relocation types:
/// ARM64_RELOC_PAGE21
/// ARM64_RELOC_PAGEOFF12
///
/// These relocations implement the @PAGE / @PAGEOFF split used by
/// adrp + add sequences on Apple platforms.
///
/// adrp xN, symbol@PAGE -> ARM64_RELOC_PAGE21
/// add xN, xN, symbol@PAGEOFF -> ARM64_RELOC_PAGEOFF12
///
/// Relocation types defined by Apple in XNU: <mach-o/arm64/reloc.h>.
/// See: <https://github.com/apple-oss-distributions/xnu/blob/f6217f891ac0bb64f3d375211650a4c1ff8ca1ea/EXTERNAL_HEADERS/mach-o/arm64/reloc.h>.
#[cfg(target_vendor = "apple")]
macro_rules! sym {
    ($sym:literal) => {
        concat!($sym, "@PAGE")
    };
}

#[cfg(target_vendor = "apple")]
macro_rules! sym_off {
    ($sym:literal) => {
        concat!($sym, "@PAGEOFF")
    };
}
Contributor

This matches my understanding, but @madsmtm mind taking a second look here?
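For comparison, here is a sketch of how the same flag load is spelled for each object format. On ELF, `adrp` takes the bare symbol (R_AARCH64_ADR_PREL_PG_HI21), and the low 12 bits are selected with the `:lo12:` prefix (for a byte load, R_AARCH64_LDST8_ABS_LO12_NC). The `macho_*` and `elf_*` macro names are hypothetical. The PR instead uses `#[cfg(target_vendor = "apple")]` to pick a single definition of `sym!`/`sym_off!`.

```rust
// Mach-O (Apple): page via `@PAGE`, offset within the page via `@PAGEOFF`.
macro_rules! macho_sym {
    ($sym:literal) => { concat!($sym, "@PAGE") };
}
macro_rules! macho_sym_off {
    ($sym:literal) => { concat!($sym, "@PAGEOFF") };
}

// ELF: `adrp` takes the bare symbol; the page offset uses the `:lo12:` prefix.
macro_rules! elf_sym {
    ($sym:literal) => { $sym };
}
macro_rules! elf_sym_off {
    ($sym:literal) => { concat!(":lo12:", $sym) };
}

// The same two-instruction load of the LSE flag, spelled for each object format.
const MACHO_LOAD: &str = concat!(
    "adrp x16, ", macho_sym!("{have_lse}"), "\n",
    "ldrb w16, [x16, ", macho_sym_off!("{have_lse}"), "]\n",
);
const ELF_LOAD: &str = concat!(
    "adrp x16, ", elf_sym!("{have_lse}"), "\n",
    "ldrb w16, [x16, ", elf_sym_off!("{have_lse}"), "]\n",
);

fn main() {
    assert_eq!(MACHO_LOAD, "adrp x16, {have_lse}@PAGE\nldrb w16, [x16, {have_lse}@PAGEOFF]\n");
    assert_eq!(ELF_LOAD, "adrp x16, {have_lse}\nldrb w16, [x16, :lo12:{have_lse}]\n");
    print!("{}{}", MACHO_LOAD, ELF_LOAD);
}
```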

@vini-fda
Author

vini-fda commented Jan 22, 2026

@tgross35 I think the bug has to do with sign extensions. Take a look at this minimal main.rs file that reproduces the issue:

use std::sync::atomic::AtomicU8;
/// Non-zero if the host supports LSE atomics.
/// This is hardcoded to 1 because Apple M-series Macs support LSE.
static HAVE_LSE_ATOMICS: AtomicU8 = AtomicU8::new(1);

/// Implementation of Compare-And-Swap for i8 for Apple aarch64 platforms, with "Acquire" ordering semantics.
///
/// Stores the `new` value into the atomic integer given by `ptr` if the currently stored value is the same as the value of `current`.
/// The return value is always the previously stored value. If it is equal to `current`, then the value was updated.
#[unsafe(naked)]
pub unsafe extern "C" fn compare_and_swap_acq(current: i8, new: i8, ptr: *mut i8) -> i8 {
    core::arch::naked_asm! {
        ".arch_extension lse",
        "adrp    x16, {have_lse}@PAGE",
        "ldrb    w16, [x16, {have_lse}@PAGEOFF]",
        "cbz     w16, 0f",
        "casab   w0, w1, [x2]",
        "ret",
        "0:",
        "uxtb    w16, w0",
        "1:",
        "ldaxrb  w0, [x2]",
        "cmp     w0, w16",
        "bne     2f",
        "stxrb   w17, w1, [x2]",
        // Retry from the exclusive load (label 1) if the store-exclusive failed
        "cbnz    w17, 1b",
        "2:",
        "ret",
        have_lse = sym HAVE_LSE_ATOMICS,
    };
}

fn main() {
    let expected: i8 = -1;
    let new: i8 = 0;
    let mut target = expected.wrapping_add(10);
    // begin test
    let _ = unsafe { compare_and_swap_acq(expected, new, &mut target) };
    target = expected;
    let ret: i8 = unsafe { compare_and_swap_acq(expected, new, &mut target) };
    assert_eq!(
        ret, expected,
        "the new return value should always be the previous value (i.e. the first parameter passed to the function), ret = {ret:?}, expected = {expected:?}, new = {new:?}"
    );
}

Executing it on an M-series Mac with cargo run --release runs into the same issue:

assertion `left == right` failed
  left: -1
 right: -1

This is probably because the byte load instructions (ldrb, ldaxrb) and the LSE atomic instruction (casab) zero-extend the loaded byte into the full 32-bit destination register (w0 for ldaxrb and casab). You can confirm this in the Arm documentation for ldrb, ldaxrb and casab. So, when -1 (bit pattern 0xFF) is loaded:

  • Expected (for signed i8): 0xFFFFFFFF (sign-extended)
  • Actual: 0x000000FF (zero-extended, equals 255 unsigned)
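The two bit patterns above can be reproduced in plain Rust, without any assembly. This sketch models the 32-bit register `w0` as a `u32`, and the helper names are made up for illustration. Which pattern the caller actually expects depends on the ABI, which is discussed further down the thread.

```rust
/// Models what `ldaxrb`/`casab` leave in `w0`: the byte zero-extended to 32 bits.
fn zero_extend_byte(byte: u8) -> u32 {
    byte as u32
}

/// Models what `sxtb w0, w0` would produce: the byte sign-extended to 32 bits.
fn sign_extend_byte(byte: u8) -> u32 {
    byte as i8 as i32 as u32
}

fn main() {
    let byte: u8 = 0xFF; // bit pattern of -1i8

    let zext = zero_extend_byte(byte);
    let sext = sign_extend_byte(byte);
    assert_eq!(zext, 0x0000_00FF);
    assert_eq!(sext, 0xFFFF_FFFF);

    // Read as a full 32-bit signed register, the zero-extended value is 255, not -1.
    assert_eq!(zext as i32, 255);
    assert_eq!(sext as i32, -1);

    // The low byte is identical; only code that trusts the upper 24 bits sees a difference.
    assert_eq!(zext as u8, sext as u8);

    println!("zero-extended: {zext:#010x}, sign-extended: {sext:#010x}");
}
```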

If so, I think the fix is to sign-extend with sxtb:

 #[unsafe(naked)]
 pub unsafe extern "C" fn compare_and_swap_acq(current: i8, new: i8, ptr: *mut i8) -> i8 {
     core::arch::naked_asm! {
         ".arch_extension lse",
         "adrp    x16, {have_lse}@PAGE",
         "ldrb    w16, [x16, {have_lse}@PAGEOFF]",
         "cbz     w16, 0f",
         "casab   w0, w1, [x2]",
+        "sxtb    w0, w0",  // Sign-extend byte to 32-bit for correct i8 return value
         "ret",
         "0:",
         "uxtb    w16, w0",
         "1:",
         "ldaxrb  w0, [x2]",
         "cmp     w0, w16",
         "bne     2f",
         "stxrb   w17, w1, [x2]",
         "cbnz    w17, 1b",
         "2:",
+        "sxtb    w0, w0",  // Sign-extend byte to 32-bit for correct i8 return value
         "ret",
         have_lse = sym HAVE_LSE_ATOMICS,
     };
 }

I just pushed a change to see if this fixes it. Locally on my laptop it works, but I'll wait on the CI results to be sure.

@vini-fda
Author

vini-fda commented Jan 22, 2026

I just noticed I'd forgotten to add the sign extension to the LSE path as well; the new commit 45f21ff adds that fix. Concerningly, the CI checks passed before 45f21ff, even though the corresponding main.rs code from the previous comment fails on my Mac without the LSE path fix. It would be nice if the CI checks covered that path as well...

@vini-fda vini-fda requested a review from tgross35 January 22, 2026 04:21
Comment on lines 199 to 209
macro_rules! sign_extend {
    (1) => {
        concat!("sxtb ", reg!(1, 0), ", ", reg!(1, 0))
    };
    (2) => {
        concat!("sxth ", reg!(2, 0), ", ", reg!(2, 0))
    };
    (4) => { "" };
    (8) => { "" };
    (16) => { "" };
}
Contributor

To make this more flexible for the future (also more obvious to read), could you update these to take a second $num:literal param that gets passed to reg? Rather than always using 0.

Author

Commit 8285fe1 should improve on this.
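For reference, here is one possible shape of the two-parameter version requested above. This is a sketch only, and commit 8285fe1 may differ in detail. The `reg!` macro below is a hypothetical stand-in that uses w registers for 1/2/4-byte operands and x registers for 8-byte operands; it is not necessarily the crate's definition. The sign extension itself was reverted later in the thread, but the same pattern of passing `$num` through to `reg!` applies to the other helper macros.

```rust
// Hypothetical stand-in for the crate's `reg!` macro.
macro_rules! reg {
    (1, $num:literal) => { concat!("w", $num) };
    (2, $num:literal) => { concat!("w", $num) };
    (4, $num:literal) => { concat!("w", $num) };
    (8, $num:literal) => { concat!("x", $num) };
}

// `sign_extend!` with an explicit register number instead of a hardcoded 0.
macro_rules! sign_extend {
    (1, $num:literal) => { concat!("sxtb ", reg!(1, $num), ", ", reg!(1, $num)) };
    (2, $num:literal) => { concat!("sxth ", reg!(2, $num), ", ", reg!(2, $num)) };
    // 4-, 8- and 16-byte values need no extension.
    (4, $num:literal) => { "" };
    (8, $num:literal) => { "" };
    (16, $num:literal) => { "" };
}

fn main() {
    assert_eq!(sign_extend!(1, 0), "sxtb w0, w0");
    assert_eq!(sign_extend!(2, 3), "sxth w3, w3");
    assert_eq!(sign_extend!(8, 0), "");
    // As in the PR, append a newline so the line slots into a concat!-built template.
    let line = concat!(sign_extend!(1, 0), "\n");
    print!("{}", line);
}
```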

Comment on lines 147 to 224
concat!(lse!($op, $ordering, $bytes), $( " ", reg!($bytes, $reg), ", " ,)* "[", stringify!($mem), "]; ",),
"ret; ",
concat!(lse!($op, $ordering, $bytes), $( " ", reg!($bytes, $reg), ", " ,)* "[", stringify!($mem), "]\n",),
"ret\n",
// SXTB s(0), s(0)
concat!(sign_extend!($bytes), "\n"),
"8:"
Contributor

Should this go before the ret?

Author

Yes! thanks for catching that. The new commit should have this fixed.

@tgross35
Contributor

Amanieu knows all of this much better than I do, so I requested a review.

I wonder if the missing sign extension might happen to be the issue mentioned at rust-lang/rust#144938 (comment). @zmodem do you have more info about the crash?

LLVM's compiler-rt implementation doesn't seem to sign-extend either; it may be worth opening an issue there: https://github.com/llvm/llvm-project/blob/524589119afd59525ab1f124a1538f0311d608de/compiler-rt/lib/builtins/aarch64/lse.S.

It would be nice if the CI checks covered that path as well...

Would this be possible to add?

@tgross35 tgross35 requested a review from Amanieu January 22, 2026 09:36
@tgross35
Contributor

I'm still not sure I understand the difference here; any idea why the failure only showed up on MacOS?

It would be nice if the CI checks covered that path as well...

Would this be possible to add?

Thinking about it more, there could probably be a hook for this that's only used for testing:

#[cfg(feature = "mangled-names")]
pub unsafe fn set_have_lse_atomics(has_lse: bool) { /* ... */ }
#[cfg(feature = "mangled-names")]
pub fn get_have_lse_atomics() -> bool { /* ... */ }

Then in the test file:

use std::sync::Mutex;

#[track_caller]
fn with_maybe_lse_atomics(use_lse: bool, f: impl FnOnce()) {
    // Ensure tests running in parallel don't interleave changes to the global setting
    static LOCK: Mutex<()> = Mutex::new(());
    let _g = LOCK.lock().unwrap();
    let old = get_have_lse_atomics();
    // Only enable LSE if the host CPU actually supports it
    if use_lse || old {
        assert!(std::arch::is_aarch64_feature_detected!("lse"));
    }
    unsafe { set_have_lse_atomics(use_lse) };
    f();
    unsafe { set_have_lse_atomics(old) };
}
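Here is a self-contained sketch of the proposed hook. It assumes the runtime flag is a `static AtomicU8`, as in the reproducer earlier in the thread; the real flag in compiler-builtins may look different. `set_have_lse_atomics`/`get_have_lse_atomics` are the hypothetical names from the comment above. The `RestoreLse` guard is my own variation: it also restores the old value if `f` panics. The hardware check is stubbed out on non-AArch64 hosts so the sketch compiles everywhere.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Mutex;

// Hypothetical global flag: non-zero if the LSE fast path should be used.
static HAVE_LSE_ATOMICS: AtomicU8 = AtomicU8::new(0);

/// Test-only hook (hypothetical). Safety: only enable LSE on hardware that supports it.
pub unsafe fn set_have_lse_atomics(has_lse: bool) {
    HAVE_LSE_ATOMICS.store(has_lse as u8, Ordering::Relaxed);
}

pub fn get_have_lse_atomics() -> bool {
    HAVE_LSE_ATOMICS.load(Ordering::Relaxed) != 0
}

#[cfg(target_arch = "aarch64")]
fn host_has_lse() -> bool {
    std::arch::is_aarch64_feature_detected!("lse")
}

#[cfg(not(target_arch = "aarch64"))]
fn host_has_lse() -> bool {
    false
}

/// Restores the previous flag value when dropped, even if the closure panics.
struct RestoreLse(bool);

impl Drop for RestoreLse {
    fn drop(&mut self) {
        unsafe { set_have_lse_atomics(self.0) };
    }
}

fn with_maybe_lse_atomics(use_lse: bool, f: impl FnOnce()) {
    // Serialize tests that touch the global flag; tolerate poisoning from a failed test.
    static LOCK: Mutex<()> = Mutex::new(());
    let _lock = LOCK.lock().unwrap_or_else(|e| e.into_inner());
    if use_lse {
        assert!(host_has_lse(), "LSE requested but not supported by this CPU");
    }
    // Declared after `_lock`, so it is dropped (and restores the flag) while the lock is held.
    let _restore = RestoreLse(get_have_lse_atomics());
    unsafe { set_have_lse_atomics(use_lse) };
    f();
}

fn main() {
    // The LL/SC fallback path can be exercised on any host.
    with_maybe_lse_atomics(false, || assert!(!get_have_lse_atomics()));
    if host_has_lse() {
        with_maybe_lse_atomics(true, || assert!(get_have_lse_atomics()));
    }
    // The original value is restored afterwards.
    assert!(!get_have_lse_atomics());
}
```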

@zmodem

zmodem commented Jan 22, 2026

I wonder if the missing sign extension might happen to be the issue mentioned at rust-lang/rust#144938 (comment). @zmodem do you have more info about the crash?

Thanks for the cc. I'm afraid I don't have many details yet. I filed rust-lang/rust#151486 to track it.

@taiki-e
Member

taiki-e commented Jan 22, 2026

any idea why the failure only showed up on MacOS?

I guess that is because Apple's calling conventions for argument passing differ from the standard ABI. From Apple's docs:

The caller of a function is responsible for signing or zero-extending any argument with fewer than 32 bits. The standard ABI expects the callee to sign or zero-extend those arguments.

(Although it's not explicitly stated for the return value, I guess extending the return value is the callee's responsibility, because that avoids an extra extension when a value is returned as-is and then used.)

@vini-fda
Author

vini-fda commented Jan 23, 2026

@taiki-e I'm not sure I understand your explanation. From the Apple quote:

The caller of a function is responsible for signing or zero-extending any argument with fewer than 32 bits. The standard ABI expects the callee to sign or zero-extend those arguments.

Let's take the minimal example, where the main function is the caller and the compare_and_swap_acq is the callee.

  • In the Apple convention, the caller (main) should be responsible for sign-extending, not the callee (the custom naked function compare_and_swap_acq)
  • In the standard convention, it's the other way around: the callee (our custom function) should perform that operation

So from that quote, I'd imagine the tests would at best only pass in Apple platforms (or not pass in any AArch64 platform), not the other way around. But in reality it is the other way around! Or maybe I'm missing something?

@taiki-e
Member

taiki-e commented Jan 23, 2026

@vini-fda

So from that quote, I'd imagine the tests would at best only pass in Apple platforms (or not pass in any AArch64 platform), not the other way around. But in reality it is the other way around! Or maybe I'm missing something?

As I understand it, in our usage it doesn't matter in which state the arguments are passed. Atomic, bit-op, and wrapping arithmetic instructions work regardless of whether extension is applied, and comparisons also work thanks to zero extension.

The issue lies in how the ABI expects the return value to be handled.

  • Although it's not explicitly stated in the standard ABI, the caller should not expect the return value to be extended. Otherwise, functions that just return an input value would also need an extra extension instruction.
  • In Apple's calling conventions, argument extension is handled by the caller, so a function that just returns an input value already returns a properly extended value without additional processing. If return value extension were not the callee's responsibility, the caller would have to extend both arguments and return values, which is inefficient. Therefore, although it's not explicitly stated, return value extension is likely the callee's responsibility.

IIRC values obtained from atomic instructions are usually zero-extended, so Apple targets would likely require sign extension like this PR does. (And the absence of sign extension should be irrelevant for non-Apple targets.)
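The comparison argument can be made concrete with a small model in plain Rust, without asm. The caller may pass `current` sign-extended (Apple) or with unspecified upper bits (standard ABI). The LL/SC path normalizes it with `uxtb` before comparing it against the zero-extended value from `ldaxrb`, and `casab` compares only the byte. The function names are hypothetical and only mirror the instructions.

```rust
/// Models `uxtb wD, wS`: keep the low 8 bits, clear the rest.
fn uxtb(w: u32) -> u32 {
    w & 0xFF
}

/// Models `ldaxrb wD, [x]`: the loaded byte is zero-extended into a w register.
fn ldaxrb(mem: u8) -> u32 {
    mem as u32
}

/// Models the `cmp w0, w16` check of the LL/SC fallback path, where `w0_in`
/// holds the `current` argument exactly as the caller left it in the register.
fn fallback_compare_matches(w0_in: u32, mem: u8) -> bool {
    let w16 = uxtb(w0_in);
    ldaxrb(mem) == w16
}

fn main() {
    let mem: u8 = 0xFF; // memory holds -1i8
    // Apple-style caller: -1i8 sign-extended into w0.
    let sign_extended = (-1i8) as i32 as u32;
    // Standard-ABI caller: upper bits unspecified (arbitrary here).
    let garbage_upper = 0xDEAD_BEFF;
    assert!(fallback_compare_matches(sign_extended, mem));
    assert!(fallback_compare_matches(garbage_upper, mem));
    // Without the uxtb, the sign-extended argument would not match the loaded value.
    assert_ne!(sign_extended, ldaxrb(mem));
}
```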

@Amanieu
Member

Amanieu commented Jan 23, 2026

Always sign-extending isn't correct: this should only be done if the integer type being passed as an argument or returned is signed. I believe that for all of these intrinsics the integer types are unsigned, in which case we just need to ensure that the result is zero-extended. This is already the case with ldrb/ldrh, so no additional instruction is needed.

@Amanieu
Member

Amanieu commented Jan 23, 2026

I believe the correct fix is to change the function declarations to use unsigned integer types instead of signed ones, since that's what LLVM's backend is expecting when calling these.
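Here is a portable sketch of the semantics with the unsigned signature suggested above. It is modelled with `AtomicU8::compare_exchange` instead of assembly. The name `cas1_acq_model` is hypothetical; the real outlined helpers follow LLVM's `__aarch64_cas1_acq`-style naming and are written in asm. With `u8`, the zero-extended byte that `casab`/`ldaxrb` leave in `w0` is already a correctly extended return value, so no extra instruction is needed.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical model of a 1-byte acquire CAS: always returns the previous value,
/// which equals `current` exactly when `new` was stored.
fn cas1_acq_model(current: u8, new: u8, ptr: &AtomicU8) -> u8 {
    match ptr.compare_exchange(current, new, Ordering::Acquire, Ordering::Acquire) {
        Ok(prev) | Err(prev) => prev,
    }
}

fn main() {
    // Mirrors the earlier reproducer, using the unsigned bit pattern of -1i8.
    let expected: u8 = 0xFF;
    let target = AtomicU8::new(expected.wrapping_add(10));

    // Mismatch: nothing is stored and the previous value (9) is returned.
    assert_eq!(cas1_acq_model(expected, 0, &target), 9);

    target.store(expected, Ordering::Relaxed);
    // Match: the previous value is returned and `new` is stored.
    assert_eq!(cas1_acq_model(expected, 0, &target), expected);
    assert_eq!(target.load(Ordering::Relaxed), 0);
}
```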

@vini-fda vini-fda force-pushed the outline-atomics-apple-aarch64 branch from bdb73af to c8abf54 January 23, 2026 23:11
Co-authored-by: Trevor Gross <tmgross@umich.edu>
@vini-fda vini-fda force-pushed the outline-atomics-apple-aarch64 branch from c8abf54 to b1019f5 January 23, 2026 23:12
@vini-fda
Author

vini-fda commented Jan 23, 2026

@Amanieu thanks for the feedback, I wasn't aware these functions expected unsigned types. Commit 66b43e2 should have fixed that. I reverted the sign-extension shenanigans as well.

@tgross35 thanks for the idea on enabling LSE for the tests. Commit b1019f5 now makes the tests more comprehensive. Hopefully this works :)

5 participants