ARM64 notes
The AArch64 ABI says that x18 is reserved for platform use. Apparently you have to treat it as something that can get clobbered at any time.
https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms https://stackoverflow.com/questions/71152539/consequence-of-violating-macoss-arm64-calling-convention
CCL has often used absolute addressing to access some stuff in low memory. On the Mac with Apple silicon, we're not going to be able to do this.
I also just ran across https://developer.apple.com/forums/thread/655950, which says "Modifying pagezero_size isn't a supportable option in the arm64 environment. arm64 code must be in an ASLR binary, which using a custom pagezero_size is incompatible with. An ASLR binary encodes signed pointers using a large random size along with the expected page zero size, and this combination is going to extend beyond the range of values covered in the lower 32-bits."
On an Apple silicon Mac, compiling with cc -Wl,-pagezero_size,0x4000 -g foo.c succeeds, but the resulting binary won't run: the debugger prints "error: Malformed Mach-o file".
On an Intel Mac, the same cc -Wl,-pagezero_size,0x4000 -g foo.c command produces a binary that runs.
If that’s the case, then that may be an exciting problem for an ARM64 port (well, an Apple silicon port in particular, I suppose). Maybe we give up on controlling low memory and burn a register to point at the necessary data. Or maybe we can use the TCR (register pointing to per-thread data) somehow.
On Macs with Apple silicon, W^X memory protection is always on. We'll have to deal with that. https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon
We need the com.apple.security.cs.allow-jit entitlement, so that we can call mmap(2) with the MAP_JIT flag. Threads have to use pthread_jit_write_protect_np to enable and disable write access. Note that this operates per-thread.
Call sys_icache_invalidate after writing new instructions to memory.
CCL currently hard-codes a page size of 4K. On Apple silicon, page size is 16K.
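One way to avoid the hard-coded value is to ask the system for the page size at startup and round allocation sizes with it. A minimal sketch, assuming POSIX sysconf is available; runtime_page_size and round_up_to_page are hypothetical names, not existing CCL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Query the page size instead of assuming 4K. */
size_t runtime_page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    assert(n > 0);
    return (size_t)n;
}

/* Round nbytes up to a multiple of page, which must be a power of two. */
size_t round_up_to_page(size_t nbytes, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (nbytes + page - 1) & ~(page - 1);
}
```

With these definitions, a 20480-byte request stays at 20480 bytes with 4K pages but grows to 32768 bytes with 16K pages, so any code that assumes 4K granularity for protection or mapping boundaries would need review.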
Whenever the stack pointer is used as the base register in an address operand, it must have 16-byte alignment (hardware-enforced).
For example, this doesn’t work:
str x1, [sp, #-8]! ;OK, but sp has only 8 byte alignment...
str x0, [sp, #-8]! ;... so this subsequent store fails
One possibility is to use a different register (and a separate memory area) for the value stack. GPRs don’t have the alignment restriction. This sounds like it’s probably the way to go for CCL, because this maintains the invariant that the contents of the value stack between its bottom and top are always unambiguously nodes.
Some other techniques are described at https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch64-implementing-push-and-pop
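To make the separate-value-stack idea concrete, here is a small C model of it. It is only an illustration: node_t, value_stack, and the vs_* functions are hypothetical, and vsp stands in for whatever general-purpose register would be dedicated to the value stack. The point is that pushes and pops move in 8-byte steps, which is acceptable for a GPR base but not for sp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t node_t; /* stand-in for a tagged Lisp object */

typedef struct {
    node_t *limit; /* lowest usable address */
    node_t *base;  /* one past the highest slot (the stack bottom) */
    node_t *vsp;   /* plays the role of the dedicated register */
} value_stack;

void vs_init(value_stack *vs, node_t *mem, size_t slots)
{
    vs->limit = mem;
    vs->base = mem + slots;
    vs->vsp = vs->base; /* empty: stack grows downward */
}

/* Models a pre-indexed store such as: str xN, [vsp, #-8]! */
void vs_push(value_stack *vs, node_t n)
{
    assert(vs->vsp > vs->limit);
    *--vs->vsp = n;
}

/* Models a post-indexed load such as: ldr xN, [vsp], #8 */
node_t vs_pop(value_stack *vs)
{
    assert(vs->vsp < vs->base);
    return *vs->vsp++;
}

/* Everything between vsp and base is a node, by construction. */
size_t vs_depth(const value_stack *vs)
{
    return (size_t)(vs->base - vs->vsp);
}
```

Since only nodes are ever pushed here, a GC could scan the range from vsp to base without having to distinguish nodes from raw data.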
Register indirect with offset
ldr x0, [Xn/sp, #imm]
ldr x0, [Xn] ;#0 implied if omitted
The offset can be -256 to +255 (a signed 9-bit immediate, unscaled), or an unsigned multiple of the operand size, up to 4095 times that size (an unsigned 12-bit immediate, scaled). For example, ldr x0, [x1, #0x7ff8] is valid: because the operand is a 64-bit register (8 bytes), the 12-bit immediate 0xfff is shifted left 3 bits, yielding the offset 0x7ff8.
It is possible to encode some values either way, with the unscaled signed 9-bit immediate, or with the scaled unsigned 12-bit immediate.
This is an uncomfortably small signed offset.
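A code generator has to decide which of the two forms, if either, can encode a given displacement. The ranges described above can be checked with a pair of predicates like these. This is a sketch with hypothetical function names; size_log2 is the log2 of the operand size in bytes (3 for a 64-bit register).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unscaled form: signed 9-bit byte offset, -256..+255. */
bool fits_unscaled_simm9(int64_t off)
{
    return off >= -256 && off <= 255;
}

/* Scaled form: non-negative multiple of the operand size,
   at most 4095 times that size. */
bool fits_scaled_uimm12(int64_t off, unsigned size_log2)
{
    assert(size_log2 <= 4); /* 1, 2, 4, 8, or 16-byte operands */
    int64_t size = (int64_t)1 << size_log2;
    return off >= 0 && off % size == 0 && off / size <= 4095;
}
```

For a 64-bit load, an offset of 8 fits both forms and 0x7ff8 fits only the scaled form. A negative offset such as -264 fits neither, so it would need a separate address computation.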