177 changes: 177 additions & 0 deletions docs/design/coreclr/botr/clr-abi.md
@@ -31,6 +31,8 @@ The LoongArch64 ABI documentation is [here](https://github.com/loongson/LoongArc

The RISC-V ABIs Specification: [latest release](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/latest), [latest draft](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases), [document source repo](https://github.com/riscv-non-isa/riscv-elf-psabi-doc).

Web Assembly Basic C ABI: [Basic C ABI](https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md)

# General Unwind/Frame Layout

For all non-x86 platforms, all methods must have unwind information so the garbage collector (GC) can unwind them (unlike native code in which a leaf method may be omitted).
@@ -698,3 +700,178 @@ The stack elements are always aligned to at least `INTERP_STACK_SLOT_SIZE` and n
Primitive types smaller than 4 bytes are always zero or sign extended to 4 bytes when on the stack.

When a function is async it will have a continuation return. This return is not done using the data stack, but instead is done by setting the Continuation field in the `InterpreterFrame`. Thunks are responsible for setting/resetting this value as we enter/leave code compiled by the JIT.

# Web Assembly ABI (R2R and JIT)

For managed methods compiled to Web Assembly (hereafter "managed code") the CLR generally follows the [Wasm Basic C ABI](https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md).

Managed code uses the same linear stack as C code. The stack grows down.

## Incoming argument ABI

The linear stack pointer is the first argument to all methods. At a native->managed transition it is the value of the `$__stack_pointer` global. This global is not updated within managed code, but is updated at managed->native boundaries. Within the method the stack pointer always points at the bottom (lowest address) of the stack; generally this is a fixed offset from the value the stack pointer held on entry, except in methods that can do dynamic allocation.
Member

This means helper calls all become managed->native boundaries that require a stack pointer update at start and end, right? Is that a problem?

Member Author

There is a code size cost for each helper call... it could be amortized with a custom wrapper for each helper that just does the `$__stack_pointer` maintenance.

Member

For PInvokes, I expect that this update will be done around the callsite in the managed code.

Where do you expect it to be done for FCalls? We assume that FCalls have the same managed calling convention. Can this update be done inside the FCall macro somehow, so that FCalls can continue to have managed calling convention?

If we are not able to do that, I guess we will need to create some sort of FCalls wrappers. It is doable, but it is not pretty - we have been there in the past.

For reference, what does native AOT / LLVM do for FCalls currently?

Member Author

For NAOT/LLVM it looks like FCalls have an extra initial arg that they ignore:

https://github.com/dotnet/runtimelab/blob/7706cd182716062d4fa550e88abd004e1a82dcd5/src/coreclr/nativeaot/Runtime/MathHelpers.cpp#L12

I don't see where the stack pointer global is updated; maybe NAOT/LLVM doesn't need this?

Contributor

> For reference, what does native AOT / LLVM do for FCalls currently?

NAOT-LLVM shadow stack is allocated separately from the __stack_pointer stack, so we only need to track it for the purposes of transition frames with virtual unwinding (another way of putting it is that it is only used for managed code).

> Can this update be done inside the FCall macro somehow, so that FCalls can continue to have managed calling convention?

I don't know if it is possible with __attribute__((naked)) trickery to do it all in one function, but it is definitely possible with __asm to insert a stub with the managed calling convention (that'd do global.set __stack_pointer) into FCIMPL.


A frame pointer, if used, points at the bottom of the "fixed" portion of the stack to facilitate use of Wasm addressing modes, which only allow positive offsets.

Structs are generally passed by-reference, unless they happen to exactly contain a single primitive field (or be a struct exactly containing such a struct). The linear stack provides the backing storage for the by-reference structs.
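The by-value rule above can be sketched as a small recursive check. This is an illustrative Python model, not the JIT's actual classifier: the `Struct` type, the `PRIMITIVES` set, and the function name are assumptions made for the example, and the sketch ignores padding and explicit layout, which a real "exactly contains" test would also have to consider.

```python
from dataclasses import dataclass, field

# Hypothetical set of primitive type names; a real classifier works on CLR types.
PRIMITIVES = {"i32", "i64", "f32", "f64"}

@dataclass
class Struct:
    name: str
    fields: list = field(default_factory=list)  # each entry: a primitive name or a Struct

def passed_by_value(ty) -> bool:
    """True if `ty` would travel on the Wasm stack under the single-primitive rule."""
    if isinstance(ty, str):
        return ty in PRIMITIVES
    # A struct qualifies only if it has exactly one field and that field is
    # a primitive or a struct that itself qualifies.
    return len(ty.fields) == 1 and passed_by_value(ty.fields[0])

wrapper = Struct("Wrapper", ["i32"])      # single primitive field
nested = Struct("Nested", [wrapper])      # struct wrapping such a struct
pair = Struct("Pair", ["i32", "i32"])     # two fields: passed by reference
```

Under this model `wrapper` and `nested` are passed on the Wasm stack, while `pair` gets backing storage on the linear stack and is passed by address.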

Structs are generally returned via hidden buffers, whose address is supplied by the caller and passed just after the `sp` argument. In such cases the method returns the address of that buffer. But if the struct can be passed on the Wasm stack, it is returned on the Wasm stack.

(TBD: ABI for vector types)

### Prolog

The prolog will decrement the stack pointer (the stack grows down) to allocate the frame, home any arguments that are stored on the linear stack, and zero initialize slots on the linear stack as appropriate. It will establish a frame pointer if one is needed.

It will also save a frame descriptor onto the stack, for use during GC and EH. For methods with EH or GC, a slot on the linear stack will be reserved for a "virtual IP" that will index into the EH and GC info to provide within-method information and allow external code to walk the managed stack frames.
Member

Frame descriptor is GCInfo, yes?

Member Author

It will also refer to the EH info, so a bit more general.

Member

@BrzVlad Jan 8, 2026

Don't we need the stack frames to be linked together in order to support walking managed frames? So, from the sp value in the current frame, shouldn't we be able to obtain the sp of the parent frame in order to get the descriptor information, etc.? I guess the plan would be to fetch the previous sp from sp[-1]? Would there be methods where sp can be dynamically incremented, in which case this wouldn't work?

Member Author

Given the last frame's base address, the plan is that the frames are self-descriptive.

To get the base address (using the scheme above where sp can diverge from $__stack_pointer), we rely on the fact that stack walks can only happen once the R2R code has called back into native code (either helper methods or the interpreter). These calls are passed sp as an argument and save it to the global $__stack_pointer, and perhaps to some other global or similar for easy access by the unwinder.

The frame descriptor will be at a known offset from this saved sp (likely 0) and the size of the frame will be stored in the descriptor, so the external code can compute the address of the parent frame that way, e.g. parent_sp = sp + sp[0].frameSize.

For dynamic-sized frames a copy of the prior sp can likewise be stored (at some other known offset from sp) to provide the necessary chaining. If the frame grows then this value can be re-established to reflect the new size. Or we can equivalently store the total frame size.

If we follow Katelyn's proposal of keeping $__stack_pointer in sync for all managed methods then there's a bit less ceremony required, but from there the unwinding proceeds the same way.

Member

> For methods with EH or GC

It is not clear what a "method with GC" means. Should this say for methods with calls or EH (ie non-leaf methods)?

Member Author

Yes, methods with GC safe points (which will always be at calls) or EH.
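
For readers following the thread, the fixed-size frame walk described above (`parent_sp = sp + sp[0].frameSize`) can be modeled roughly as follows. This is a sketch only: linear memory is a Python dict keyed by address, the descriptor is reduced to an integer id stored at offset 0, and `DESCRIPTORS`, `walk_frames`, and `stack_base` are hypothetical names. Dynamically sized frames are not handled.

```python
# Hypothetical descriptor table: descriptor id -> frame size in bytes.
DESCRIPTORS = {1: 16, 2: 32, 3: 8}

def walk_frames(memory, sp, stack_base):
    """Yield (sp, descriptor_id) for each managed frame, innermost first."""
    while sp < stack_base:
        descriptor_id = memory[sp]         # descriptor stored at offset 0 from sp
        yield sp, descriptor_id
        sp += DESCRIPTORS[descriptor_id]   # parent_sp = sp + frameSize

# The stack grows down from address 1000: the outermost frame (id 1, 16 bytes)
# sits at 984, then id 2 (32 bytes) at 952, then the innermost id 3 (8 bytes) at 944.
memory = {984: 1, 952: 2, 944: 3}
frames = list(walk_frames(memory, sp=944, stack_base=1000))
```

The walk starts from the sp saved at the managed->native boundary and stops once the computed parent address reaches the assumed base of the managed stack.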


### Epilog

Generally the epilogs will be empty. There is no notion of callee-save registers in Wasm, and no other global state to update.

## Outgoing call ABI

For direct managed calls, Wasm uses the Portable Entry Point feature to facilitate smooth interop with interpreted code. This means all managed calls are made indirectly, and the portable entry point is also passed as the last argument.
Member

In the long run will we optimize this for cases where we know both the caller and callee were crossgen'd? I'm fine with not specifying that yet though.

Member Author

Yes we can optimize that case. Though in general there is no guarantee the runtime will use R2R compiled callee code.

On Wasm this may be less of an issue because the cases where R2R method bodies end up being disqualified may not be possible.

Member

@jkotas Jan 8, 2026

> Yes we can optimize that case

Yes.

> Though in general there is no guarantee the runtime will use R2R compiled callee code.

The fixups for the caller would have to verify that the directly called method is going to use R2R too. If the fixup fails, R2R code for the caller would have to be rejected as well.

(We do something similar for ReJIT. If there is a ReJIT request for a method that got inlined, all methods that inlined it must be invalidated as well.)


The call sequence will then be
```
local.get sp
push arg 0
...
push arg N-1
load PortableEntryPointPtr
dup
load CellIndex (from pe)
call_indirect <tableIndex> <sigIndex> (sig is: int32 (sp) arg0... argN-1 int32 (pe))
```
Initially the cell will contain code to determine if the target method has R2R code or must be interpreted. If there is R2R code for the method it is fixed up as needed. Once the target is resolved the cell can be updated to just refer to the R2R code directly, if there is any, or to a thunk for invoking the interpreter.
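
The lazy resolution of the cell can be pictured with a small Python model. It is a sketch under stated assumptions, not the runtime's implementation: the function table is a list of callables, `PortableEntryPoint`, `resolver`, `interpreter_thunk`, and `call_managed` are hypothetical names, and "has R2R code" is reduced to whether `r2r_body` is set.

```python
table = []  # models the Wasm function table; entries are callables (sp, *args, pe)

def add_to_table(fn):
    table.append(fn)
    return len(table) - 1

class PortableEntryPoint:
    def __init__(self, name, r2r_body=None):
        self.name = name
        self.r2r_body = r2r_body           # None means "no R2R code available"
        self.cell_index = RESOLVER_INDEX   # every cell starts at the resolver

def interpreter_thunk(sp, *args_and_pe):
    *args, pe = args_and_pe
    return ("interpreted", pe.name, args)

def resolver(sp, *args_and_pe):
    pe = args_and_pe[-1]
    # Pick the final target once, then patch the cell so later calls skip the resolver.
    target = pe.r2r_body if pe.r2r_body is not None else interpreter_thunk
    pe.cell_index = add_to_table(target)
    return target(sp, *args_and_pe)

RESOLVER_INDEX = add_to_table(resolver)

def call_managed(sp, pe, *args):
    # Mirrors: local.get sp; push args; load pe; dup; load CellIndex; call_indirect
    return table[pe.cell_index](sp, *args, pe)

def add_r2r(sp, x, y, pe):
    return x + y

pe_add = PortableEntryPoint("Add", r2r_body=add_r2r)
pe_other = PortableEntryPoint("Other")
```

The first `call_managed(0, pe_add, 2, 3)` goes through the resolver and patches `pe_add.cell_index`; later calls dispatch straight to `add_r2r`. A call through `pe_other` ends up in the interpreter thunk instead.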

For indirect managed calls the sequence is similar, but the portable entry point is obtained by calling a resolve helper:
Member

I think this should say "virtual managed calls". Indirect calls (calli) should get the portable-entry-point to call from IL stack, no need to call resolve helper.

Also, there is a potential optimization for vtable-based virtual calls to just fetch the entrypoint by indexing into vtable like we do everywhere else.

```
local.get sp
push arg 0
...
push arg N-1
... push args for resolution ...
call resolve
dup
load CellIndex (from pe)
call_indirect <tableIndex> <sigIndex> (sig is: int32 (sp) arg0... argN-1 int32 (pe))
```
Because the `pe` arg must be passed to the portable entrypoint, all method signatures must reflect the extra final argument (even though it will be unused). Thus for example a managed method like `int F(int x)` will have a Wasm signature `(func (param i32 i32 i32) (result i32))`.
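
As a rough illustration of how the extra arguments shape a signature, the Python sketch below assembles the Wasm text form from a managed parameter list. The helper name `wasm_signature` and its parameters are hypothetical; the ordering (`sp` first, optional hidden return buffer next, `pe` last) follows the rules stated in this section, and all three are assumed to be `i32`.

```python
def wasm_signature(param_types, return_type=None, has_return_buffer=False):
    """Build the Wasm text signature for a managed method (illustrative only)."""
    params = ["i32"]                  # sp is always the first argument
    if has_return_buffer:
        params.append("i32")          # hidden return buffer address, just after sp
    params.extend(param_types)
    params.append("i32")              # portable entry point (pe) is always last
    if has_return_buffer:
        results = ["i32"]             # the method returns the buffer address
    elif return_type is not None:
        results = [return_type]
    else:
        results = []
    text = "(func (param " + " ".join(params) + ")"
    if results:
        text += " (result " + " ".join(results) + ")"
    return text + ")"

sig_f = wasm_signature(["i32"], return_type="i32")   # int F(int x)
```

Here `sig_f` is `(func (param i32 i32 i32) (result i32))`, matching the `int F(int x)` example.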

Alternatively we may choose to pass the `pe` via a Wasm global.

Helper calls known to be native code can be called directly. The calling sequence must re-establish the `$__stack_pointer` global:

```
local.get sp
global.set $__stack_pointer

push arg 0
...
push arg N-1
call <funcIndex> (sig is: arg0... argN-1)
```

Helper calls that are managed use the managed calling sequence.

## GC References at Call Sites

Wasm does not allow for outside access to the Wasm stack. So, before call sites that may trigger GC, all GC references live after the call (and all untracked GC references, which are effectively always live) must be saved to the linear stack. These GC references will be reported as pinned to the GC so that if they normally live in Wasm locals they do not need to be updated after the call. The live GC slots on the linear stack will be identified by the virtual IP (also stored on the linear stack) and the GC info (accessible from the frame descriptor, also on the linear stack).

So for example if we have code like `x(a, y(b)); ... a; ... b;` where `a` and `b` are gc refs that initially are in wasm locals, this fragment would compile into something like
```
;; sp for call to x
local.get sp

;; spill a to linear memory
local.get sp
local.get a
i32.store offset=(a's offset in gc area of stack)

;; arg a for call to x
local.get a

;; sp for call to y
local.get sp

;; spill b to linear memory
local.get sp
local.get b
i32.store offset=(b's offset in gc area of stack)

;; arg b for call to y
local.get b

;; update virtual IP for call to y with live gc refs
local.get sp
i32.const virtual-ip-for-call-to-y (gc info : a and b slots live)
i32.store offset=(virtual-ip offset)

;; fetch PE for y and cell index from PE, call y
load PortableEntryPointPtr for y
dup
load CellIndex (from pe)
call_indirect <tableIndex> <sigIndex> (sig is: int32 (sp) int32 int32 (pe) : returns int32)

;; update virtual IP for call to x with live gc refs [can be optimized out]
local.get sp
i32.const virtual-ip-for-call-to-x (gc info : a and b slots live)
i32.store offset=(virtual-ip offset)

;; fetch PE for x and cell index from PE, call x
load PortableEntryPointPtr for x
dup
load CellIndex (from pe)
call_indirect <tableIndex> <sigIndex> (sig is: int32 (sp) int32 int32 (pe) : returns int32)
```
Notes:
* As an optimization, we can avoid updating the virtual IP when the GC/EH info it refers to is unchanged from the last update.
* We may want to un-nest calls, relying on a Wasm local instead of the Wasm stack to convey nested call results to the parent call.
* As an optimization, we will try and minimize storing gc refs to the linear stack (eg if the value already there hasn't changed from the last update).
* As an optimization, we may try and have some gc refs primarily live on the linear stack, and not be held in Wasm locals.
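
To make the relationship between spilled slots, the virtual IP, and the GC info concrete, here is a minimal Python model. All names and offsets (`GC_INFO`, `VIRTUAL_IP_OFFSET`, the slot offsets 8 and 12) are made up for illustration, and linear memory is a plain dict.

```python
# Hypothetical GC info: virtual IP -> frame offsets of live GC slots.
GC_INFO = {
    1: [8, 12],   # call to y: slots for a and b are live
    2: [8, 12],   # call to x: a and b are both used after the call
}
VIRTUAL_IP_OFFSET = 4   # assumed offset of the virtual IP slot from sp

def spill(memory, sp, offset, ref):
    memory[sp + offset] = ref               # i32.store offset=(slot offset)

def set_virtual_ip(memory, sp, vip):
    memory[sp + VIRTUAL_IP_OFFSET] = vip    # i32.store offset=(virtual-ip offset)

def report_pinned_refs(memory, sp):
    """What a stack walker would report as pinned for the frame at `sp`."""
    vip = memory[sp + VIRTUAL_IP_OFFSET]
    return [memory[sp + off] for off in GC_INFO[vip]]

memory = {}
sp = 944
spill(memory, sp, 8, "ref_a")
spill(memory, sp, 12, "ref_b")
set_virtual_ip(memory, sp, 1)               # about to call y
reported = report_pinned_refs(memory, sp)
```

Because both virtual IPs map to the same live set in this example, the second `set_virtual_ip` (for the call to `x`) would be redundant, which is the first optimization listed in the notes.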

## Tail Calls

For tail calls the only differences are the use of `return_call_indirect` for the call, and passing the original `sp` value to the callee:
```
local.get sp
i32.const <frameSize>
i32.add

push arg 0
...
push arg N-1
load PortableEntryPointPtr
dup
load CellIndex (from pe)
return_call_indirect <tableIndex> <sigIndex> (sig is: int32 (sp) arg0... argN-1 int32 (pe))
```
and similarly for indirect managed calls.
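
The `sp` arithmetic in the tail call sequence can be checked with a tiny sketch, assuming a fixed-size frame that the prolog allocated by moving `sp` down by `frameSize`. The function names are hypothetical.

```python
def prolog(entry_sp, frame_size):
    # The stack grows down, so allocating a frame lowers sp.
    return entry_sp - frame_size

def sp_for_tail_call(sp, frame_size):
    # Mirrors: local.get sp; i32.const <frameSize>; i32.add
    return sp + frame_size

entry_sp = 1000
sp = prolog(entry_sp, 48)
callee_sp = sp_for_tail_call(sp, 48)
```

`callee_sp` equals `entry_sp`, so the tail-called method reuses the caller's stack space rather than stacking a new frame below it.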

## PInvoke

TBD

## Reverse PInvoke

TBD

## Async

TBD

## Interpreter Stubs

There will be stubs involved in both managed code->interpreter and interpreter->managed code calls. For R2R these will be per signature, generated by crossgen2.

### Interpreted -> Managed

The interpreter->managed stub will load the global `$__stack_pointer`, then the method arguments from the interpreter stack, and finally `i32.const 0` for the final `pe` argument, which managed code ignores (this last step can be omitted if we pass the `pe` via a Wasm global instead). It then calls the managed method.

On return the global `$__stack_pointer` is reset to the value it had on stub entry.
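
A rough Python model of this stub's save/restore behavior follows. It is only a sketch: `WasmGlobals`, `interp_to_managed_stub`, and `managed_add` are hypothetical names, and the managed method is a Python callable that takes `sp` first and the ignored `pe` last.

```python
class WasmGlobals:
    def __init__(self, stack_pointer):
        self.stack_pointer = stack_pointer   # models the $__stack_pointer global

def interp_to_managed_stub(wasm_globals, managed_method, interp_args):
    entry_sp = wasm_globals.stack_pointer    # load $__stack_pointer on entry
    try:
        # sp first, then the arguments from the interpreter stack, then 0 for pe.
        return managed_method(entry_sp, *interp_args, 0)
    finally:
        wasm_globals.stack_pointer = entry_sp   # reset to the value on stub entry

wasm_globals = WasmGlobals(stack_pointer=1000)

def managed_add(sp, x, y, pe):
    # Stand-in for a managed->native boundary that moves the global.
    wasm_globals.stack_pointer = sp - 32
    return x + y

result = interp_to_managed_stub(wasm_globals, managed_add, [2, 3])
```

After the call, `wasm_globals.stack_pointer` is back at its entry value even though the callee changed it along the way.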

### Managed -> Interpreted

This stub will be passed the current managed `sp` and must store it into the global `$__stack_pointer`. The interpreter stack (see above) will be extended with a new `InterpMethodContextFrame` frame, and arguments will be moved from Wasm locals to the frame. The `pe` argument will then be used to invoke the interpreter on the proper IL method body.