Skip to content

Backend

Bernard Teo edited this page Apr 29, 2020 · 9 revisions

Backend-Wasm

This crate is the WebAssembly backend for Sourceror. Backend-Wasm converts an IR program to a WebAssembly program.

Backend-Wasm also embeds a garbage collector into the WebAssembly program, since Source requires a garbage collector to work properly. The garbage collector implementation is part of this crate.

Structure of the WebAssembly program being emitted

This section explains the abstractions used by Sourceror to map IR to WebAssembly. It assumes that you are familiar with the WebAssembly specification, and the abstractions provided by WebAssembly. In particular, you should be aware that:

  • There are four places that can contain data in a running WebAssembly instance:
    • Global variables
      • Accessible from any function
    • Local variables (aka. registers)
      • Only accessible from the function where they are declared
      • You cannot take a reference to a local variable (so they act like registers)
      • The browser implementation will automatically decide how to map local variables to physical registers (and spill excess local variables onto the stack if necessary)
    • The (protected) stack
      • You cannot access any variable except the topmost one
      • You cannot access variables from parent scopes
    • Linear memory
      • Free-for-all block of unmanaged contiguous memory

Global variables and local variables are given a type (i32, i64, f32, or f64) specified in the WebAssembly binary. All WebAssembly instructions have a type signature, specifying how the operate on the protected stack. For example:

  • The i32.add instruction has type [i32, i32] -> [i32], which means it consumes two i32s from the stack and replaces it with one i32
  • The i64.extend_i32_s instruction has type [i32] -> [i64], which means it consumes one i32 from the stack and replaces it with one i32
  • The if bt instruction has type [i32] -> [bt] where bt is any type (encoded as an immediate in the instruction itself), which means it consumes one i32 from the stack and replaces it with one bt.

This makes it possible for the browser to statically check that every instruction in the WebAssembly binary will never read values of the incorrect type from the protected stack. This static check is known as validation in WebAssembly.

In Source (and hence in the IR), there are some things that cannot be directly represented by the above abstraction:

  • there are types like String and Any that won't fit into a 64-bit value
  • there are type like Func that actually contain a pointer as well as the closure
  • there are reference types like arrays and environments and pairs
    • so you need to be able to take a reference to them
    • they need to be garbage collected

To represent these things, we have to break up some IR types into multiple WebAssembly types, and in some cases make use of the linear memory.

Object representation

The diagram below shows how each IR type is converted to WebAssembly:

Strings and Structs are reference types, so they point to some location in the linear memory:

Content of linear memory

We partition the linear memory like this:

The first diagram is the general partition of linear memory. The second diagram shows the partitioning of the heap when using the Cheney memory manager. The third diagram shows the partitioning of the heap when using the Leaky memory manager. Memory managers are explained below.

The "stack" in this diagram is not the same as the protected stack mentioned above. To make the distinction clear, we call this stack the unprotected stack. The protected stack is validated by the WebAssembly compiler in the browser, but the unprotected stack is not. We will have to ensure on our own that we are interpreting the data in the unprotected stack correctly.

Unprotected stack

The unprotected stack currently is not used except for returning multiple values from a function (for example, when returning Any or Func). Since WebAssembly does not support a function returning multiple values (unless the multi-value proposal is supported), we have to define our own calling convention for it. In this case, we define that when returning multiple values from a function, the values should go onto the unprotected stack. In the future, some escape analysis might allow us to place environments on the unprotected stack if they are captured only by functions that do not live longer than the callee. For example, in this map function that transforms both elements of a pair using the same function:

function map(pr, f) {
    return pair(f(head(pr)), f(tail(pr)));
}

function add_two(pr) {
    const val = 2;
    const new_pr = map(pr, x => x + val); // #1
    return new_pr;
}

At the closure constructed at the line marked with #1, we need to take a reference to the surrounding environment (in particular, we need the val variable). If the compiler can prove that the function x => x + val does not live longer than the invocation of the map function on the same line, then the environment need not be placed on the garbage collected heap. We can instead place the environment on the unprotected stack.

String pool data

To minimise heap allocations and improve the speed of string manipulation, all compile-time string constants are deduplicated and placed in the linear memory. This area of linear memory is initialised at load time and does not get modified throughout the execution of the program.

String pooling is a well-known technique popularised by Java. It is more generally also called string interning, but string interning in general might apply to strings that are produced at runtime too. In Sourceror, string interning is only applied for strings that are compile-time constants. This allows us to generate the entire string pool at compilation time.

Heap

The heap is where all structs and environments are placed by default, unless through some optimisation techniques we can prove that it is correct to place it somewhere else.

There are currently two heap manager implementations that have been written for Sourceror.

  • Cheney — implementation of Cheney's garbage collector
  • Leaky — keeps allocating memory, without reclaiming any old memory

Sourceror is currently compiled with Cheney enabled. To use Leaky instead, you need to recompile Sourceror. Leaky isn't useful in production, but may be useful when debugging Sourceror.

The heap manager has total control over the heap area of the linear memory, and is able to decide how to place allocations in the heap area.

The linear memory is growable upwards (towards higher memory addresses) not dissimilar to the way memory is reserved first and then committed in most operating systems. WebAssembly allows us to commit memory as needed (up to the reserved amount) by growing the linear memory upwards. By placing the heap at the rightmost end of the linear memory, we allow the heap manager to grow its heap when it needs to. Both Cheney and Leaky supports growing the heap.

Memory alignment

All objects in the memory are 4-byte aligned. The object representation on the heap is exactly the same as when the object is in local variables or on the protected stack. When placing strings that have a length that is not a multiple of 4-bytes, additional padding is added. Currently, string operations do not make any assumption about the content of the padding bytes (though maybe it might be possible to change this and instead provide a guarantee that padding bytes in strings must be zeroed).

Stack direction

Notably, the unprotected stack grows downards while Cheney's roots stack grows upwards. This is a deliberate choice. As the browser's WebAssembly implementation will insert bounds-checks to all linear memory accesses (unless optimised away), we are leveraging this check to detect a stack overflow with no additional overhead!

Structure of this crate

The only public function in this module is this function in lib.rs:

fn run_backend(ir_program: &ir::Program, options: Options) -> wasmgen::WasmModule 
  • ir_program — the IR representation of the program
  • options — settings that control the WebAssembly binary that is produced
    • options.wasm_multi_value — whether we are allowed to use opcodes from the WebAssembly multi-value proposal (it makes returning the Any type faster because we won't need to spill into the linear memory)
    • options.wasm_bulk_memory — whether we are allowed to use opcodes from the WebAssembly bulk memory operations proposal (it makes things like string concatenation faster, and might make the garbage collector faster if the garbage collector supports it)
  • return value — the WebAssembly binary that is produced

The WebAssembly feature support is expected to be detected JavaScript code on the host before invoking Sourceror. (However, it is not implemented yet.)

The main part of the backend is in lib.rs and func.rs, so you should probably start from there to get an idea of what the backend does.

Main concepts (crate root)

The crate root (lib.rs) sets up most of the miscellaneous sections in the WebAssembly binary, such as:

  • the imported function to call for runtime errors
  • imported functions requested by library code (currently not used until Sourceror gains module/import support)
  • global variable for unprotected stack pointer
  • string pool
  • memory manager construction

We are left with translating functions, and this is the most important part of the backend.

Function conversion (func module)

Converting IR functions to WebAssembly is the bulk of the backend.

Translation is done recursively (depth-first traversal) &mdash every part of an IR function (e.g. blocks, statements, expressions) can be encoded independently of its nesting location (with a few caveats that can be abstracted away, mentioned below)). In other words, we can generate the instructions that correspond to each part of an IR function by knowing only information about that part itself. This lends nicely to a recursive depth-first traversal of the tree that the IR function represents. The translation converts the tree into a sequence of instructions (encoded as an array by Wasmgen).

Most of the code in this module is dedicated to translating every kind of statement and expression into the corresponding instruction sequence. They should be mostly self-explanatory when you read the code.

Meaning of comments

Throughout the backend, you will see comments like this one:

// net wasm stack: [source_type] -> [target_type]
encode_widening_operation(target_type, source_type, scratch, expr_builder);

The comment tells us how the function will change the protected stack. In this case, this function will pop a value with type source_type from the protected stack, and push a value with type target_type onto the stack. Note that source_type and target_type here refer to IR types — they may expand to a sequence of types in WebAssembly (for example, Any expands to i32 (tag) + i64 (data), and Undefined expands to nothing). By convention, for IR types that expand to more than one WebAssembly type, the part that has lower addresses in linear memory is placed at the top of the stack (so for Any, the i64 will be pushed into the protected stack before the i32).

There are also functions that change multiple items on the protected stack, like this one:

// net wasm stack: [return_ptr, source_type] -> []
encode_store_memory(0, target_type, source_type, scratch, expr_builder);

By convention, the top of the protected stack is on the right. So in the above example, return_ptr is pushed on the stack before source_type. (We use types and values interchangeably for the comment — when we say return_ptr is on the stack, we of course expect the stack to contain a value that is of the type of return_ptr.)

Until such time Wasmgen gains the ability to track the type of the protected stack at compilation time, we will need to rely on such comments to try to ensure that we don't make mistakes.

Tracking state in the depth-first traversal

It is mentioned above that there are a few caveats to encoding function parts independently of their nesting location. The main issue is the use of local variables. Recall that local variables are local to functions, not blocks. This means that all local variables must be declared at the start of a function.

To get around this problem, Wasmgen comes with a stack abstraction over local variable allocation called Scratch. (It is called as such because it was originally only meant for scratch variables — the transient local variables that are only used within the code emitted by some things (e.g. when trying to pop the second value from the protected stack, we need to pop the topmost value into a scratch variable, then pop the second value (which is now the topmost value), and then push the scratch variable back onto the protected stack). At that time, the IR was only able to declare local variables at the start of functions. However, to make it easier to do optimisations on the IR (by having more expressive variable lifetimes), the IR was changed to be able to declare locals in any block. This forced us to need to allocate and deallocate local variables at each block.)

Users of Scratch can do two kinds of operations:

  • allocate a new local variable (scratch.push_XXX)
  • deallocate the previously allocated local variable (scratch.pop_XXX)

It works as a stack — variables must be allocated and deallocated in a first-in-last-out (FILO) manner. There is currently no way (in Scratch) to detect if the caller is not deallocating local variables in the correct order. Things will just break if the order is wrong. A fix (currently not implemented) could be to make push_XXX into some kind of scope guard, which will deallocate the variable at the end of the enclosing scope; in this case there will be no need to explicitly call pop_XXX anymore.

With this abstraction, we can simply pass a reference to the Scratch object when doing the depth-first traversal.

Context

There are two objects passed to every encoding function in func.rs. They are EncodeContext and MutContext.

EncodeContext contains everything that will not change throughout the depth-first search. This includes things like function types (for encoding call instructions), struct field types, the unprotected stack pointer, and some other things. EncodeContext is Copy and so is passed by value into every function of the depth-first search.

MutContext contains everything that can change in the depth-first search. This includes the Scratch instance, the types and associated WebAssembly local variable indices (localidx) of each IR variable encountered from the top of the function until the current scope, and a reference to the WebAssembly module (WasmModule) so that types can be appended to the type section whenever necessary. It is passed by mutable reference into every function of the depth-first search.

[TODO: diagram with mutctx.local_map and mutctx.wasm_local_map]

Type conversion (varconv module)

The varconv module contains WebAssembly instruction sequences for moving IR values between the protected stack, local variables, and the linear memory. It also allows an optional type widening or narrowing.

(Note: Type widening is the conversion of an IR value from a specific type to the Any type. Type narrowing is the conversion of an IR value from the Any type to a specific type.)

Protected stack → Local variable (encode_store_local())

Pops the WebAssembly values representing the given IR value from the stack, and stores them in the given local(s). The IR value may be optionally widened.

Protected stack → Linear memory (encode_store_memory())

Pops the WebAssembly values representing the given IR value from the stack, and writes them to linear memory. The IR value may be optionally widened.

Local variable → Protected stack (encode_load_local())

Reads the IR value from the given locals, and pushes them onto the stack. The IR value currently can be optionally narrowed, but this may be changed in the future when the TypeOf expression is changed to an IfTypeOf statement (that acts like if(TypeOf(x) === v) { const x' = x; ... } else { ... }). If narrowing is enabled, it is unchecked — there are no checks to ensure that the given Any contains the correct variant.

(Note: IfTypeOf guarantees by design that all narrowing conversions are valid, and it aids possible future type inference.)

Linear memory → Protected stack (encode_load_memory())

Loads the IR value from the linear memory, and pushes them onto the stack. The IR value currently can be optionally narrowed, but this may be changed in the future when the TypeOf expression is changed to an IfTypeOf statement. If narrowing is enabled, it is unchecked — there are no checks to ensure that the given Any contains the correct variant.

Protected stack → Protected stack (encode_widening_operation())

Pops the WebAssembly values representing the given IR value from the stack, converts it to a wider representation, and pushes it back onto the stack.

This is used when encoding expressions to convert a narrow result (returned by a previous operation) into a wider value (require by the next operation).

Protected stack → Protected stack (encode_narrowing_operation())

Pops the WebAssembly values representing the given IR value from the stack, converts it to a narrower representation, and pushes it back onto the stack. The narrowing is checked &mdash you can specify what should be encoded if the value contained in the Any does not have the expected type.

This is used when encoding expressions such as the condition of an if-statement, where we need to assert the type of the given expression before converting it to a boolean type.

Garbage collector (gc module)

The HeapManager trait at the root of this module defines the interface between the GC implementation and the rest of the backend. There is some amount of documentation in the source file itself.

The following functions are defined by HeapManager ('...' indicates parameters not shown for brevity):

  • fn initial_heap_size() -> u32
  • // net wasm stack: [] -> [i32(ptr)]
    `fn encode_fixed_allocation(&self, ir_vartype: ir::VarType, ...);
  • // net wasm stack: [i32(num_bytes)] -> [i32(ptr)]
    `fn encode_dynamic_allocation(&self, ir_vartype: ir::VarType, ...);
  • // net wasm stack: [] -> []
    `fn encode_local_roots_prologue(&self, local_types: &[ir::VarType], ...);
  • // net wasm stack: [] -> []
    `fn encode_local_roots_epilogue(&self, local_types: &[ir::VarType], ...);
  • // net wasm stack: [] -> []
    `fn encode_local_root_read(&self, local_root: (ir::VarType, wasmgen::LocalIdx), ...);
  • // net wasm stack: [] -> []
    `fn encode_local_root_write(&self, local_root: (ir::VarType, wasmgen::LocalIdx), ...);
  • // net wasm stack: [] -> []
    `fn encode_local_roots_init(&self, local_types: &[ir::VarType], ...);

Initial heap size

The initial_heap_size() function returns the initial size of the memory that should be pre-committed for the heap manager when the WebAssembly module is instantiated. This allows the backend to figure out the proper amount of space to originally commit for the linear memory, and this value is encoded in the WebAssembly binary itself. Note that the units of the value returned by the initial_heap_size() function is in pages, which have a specific size in WebAssembly regardless of the host system.

Encoding allocations

encode_fixed_allocation() and encode_dynamic_allocation() emits the WebAssembly instructions that make a fixed or dynamic allocation respectively. A fixed allocation is one that has a fixed compile-time size (i.e. a Struct). A dynamic allocation is one whose size is only known at runtime (i.e. a String or Array (unimplemented)). Note that the i32(num_bytes) for the dynamic allocation is the exact number of characters in the String (or elements in the Array) — it includes neither the additional bytes to store the length, nor the padding bytes to round up to a multiple of 4 bytes. It is the responsibility of the GC implementation to add the length and the padding bytes. (The GC is required to ensure that all allocations return a pointer that is a multiple of 4, because all other code emitted in the backend assumes this.) It is roughly equivalent of the new keyword in Java or C#.

The GC roots stack

Most GC implementations need a set of objects that are defined to be reachable, and that is where the reachability search starts from. It is sometimes known as the roots from which to do the search. This set of objects contains precisely every object referenced by a local variable in any scope in the call stack from the entry point (i.e. the main function) until the current point of execution. Hence, we can store copies of these references in a stack, and push/pop them at the start/end of their lifetimes. This stack defines the roots from which to do the search, and is called the GC roots stack. The heap manager implementation sets aside part of the linear memory that it owns for this stack.

It should be noted that we cannot directly walk the local variables when running the garbage collector, because local variables in WebAssembly cannot be accessed by inner frames of the call stack (the call stack is inaccessible to WebAssembly).

Recall that local variables are usually stored in registers, while the GC roots stack is stored in linear memory, and hence in addressable memory (or some cache of it). In order to mitigate significant slowness due to accessing the linear memory, the current design defers writing local variables to the GC roots stack until absolutely necessary.

When is it absolutely necessary? It might be tempting to say that it is necessary whenever we make a fixed or dynamic allocation. But that is not entirely true. It is actually only necessary if the allocation will trigger a run of the garbage collector (e.g. when it runs out of free space). Since making an allocation does not call a separate function (unless the garbage collector is triggered), the local variables of the current function are still accessible when making the allocation. Hence, we can actually try to make the allocation first, and only write the local variables to the GC roots stack if the allocation fails. The garbage collector is meant to run very rarely (as compared to the total number of allocations made), so this allows us to elide writing to the linear memory almost all the time. This is precisely what our backend does &mdash encode_fixed_allocation() and encode_dynamic_allocation() have additional parameters to specify the local variables to be written to the GC roots stack if allocation fails.

Depending on the heap manager implementation, objects may be moved around in memory, so the values of the local variables submitted to the GC roots stack might change. If objects are moved around, the heap manager has the responsibility to update the local variables before returning from encode_fixed_allocation() or encode_dynamic_allocation(). (Note: There may be a bug where intermediate results of expressions may not be submitted to the GC roots stack. For example: ("a" + "b") + ("c" + "d"); — the string "ab" is calculated first and resides in the heap manager, but the concatenation of "c" and "d" might trigger the garbage collector, in which case the "ab" string might get lost. A test case that triggers this bug would be appreciated.)

There is a second place where it is absolutely necessary to write local variables to the GC roots stack — before calling any function that might allocate memory. This is because once the call is made, the callee will be unable to access the caller's local variables. To write local variables to the GC roots stack, encode_local_roots_prologue() should be used. To pop the values from the GC roots stack back to local variables after a function call, encode_local_roots_epilogue() should be used. On a heap manager implementation that does not move objects around (e.g. mark-and-sweep), encode_local_roots_epilogue() should be a no-op. Currently, determining if a function might allocate memory is unimplemented, so encode_local_roots_prologue() and encode_local_roots_epilogue() is invoked for all function calls.

It is wasteful to write to linear memory and read it back on every function call, especially if in a tight loop. Ideally, if there is a local variable declared outside of a loop and the loop is executed at least once, it should be pushed onto the GC roots stack once before the start of loop and popped once after the end of the loop. If the variable is accessed within the loop, the functions encode_local_root_read() and encode_local_root_write() should be used instead — they are meant to read and write to the GC roots stack without changing the number of items in the stack. This is currently not implemented, so encode_local_root_read() and encode_local_root_write() are currently never used.

Note: Determining if a loop is executed at least once does not actually impose any performance penalty. This is because a while-loop is already compiled into an instructions that contain a separate code path if there are no iterations (this is consistent with other optimising compilers, as it elides one jump). For example, using C-like syntax:

while (condition) {
    do_stuff();
}

is loosely translated into

if (condition) {
    do {
        do_stuff();
    } while (condition);
}

Initialising references

Local variables in WebAssembly are zero-initialized, but since local variables may be reused (Scratch determines local variable reuse) the content of an uninitialized local variable is generally arbitrary. The GC will choke if tries to access the object referred to by an uninitialized reference. To initialize a reference type (so that it can be pushed onto the GC roots stack if necessary), encode_local_roots_init() should be used. Its exact behaviour depends on the heap manager implementation, but on heaps that use a GC roots stack, reference types are typically set to a special value of -1 (to denote a null value), and Any should be set to Unassigned (Note: Leaky doesn't do anything, which might be a bug since it doesn't set Any to Unassigned.).

Clone this wiki locally