Cranelift: implement "precise store traps" in presence of store-tearing hardware.

As discussed in #7237 and WebAssembly/design#1490, some instruction-set architectures do not guarantee that a store that "partially traps" (overlaps multiple pages, only one of which disallows the store) does not also have some side-effects. In particular, the part of the store that *is* legal might succeed.

This has fascinating implications for virtual memory-based WebAssembly heap implementations: when a store is partially out-of-bounds, it should trap (there is no "partially" here: if the last byte is out of bounds, the store is out of bounds). A trapping store should not alter the final memory state, which is observable by the outside world after the trap. Yet the simple lowering of a Wasm store to a machine store instruction could violate this expectation in the presence of "store tearing" as described above.

Neither ARMv8 (aarch64) nor RISC-V guarantees the absence of store tearing, and we have observed it in tests on RISC-V.

This PR implements the idea first proposed [here], namely to prepend a load of the same size to every store. The idea is that if the store will trap, the load will as well; and precise exception handling, a well-established principle in all modern ISAs, guarantees that instructions beyond a trapping instruction have no effect.

This is off by default, and is mainly meant as an option to study the impact of this idea and to allow for precise trap execution semantics on affected machines unless/until the spec is clarified.

On an Apple M2 Max machine (aarch64), this was measured to have a 2% performance impact when running `spidermonkey.wasm` with a simple recursive Fibonacci program.

It can be used via the `-Ccranelift-ensure_precise_store_traps=true` flag to Wasmtime.

[here]: WebAssembly/design#1490 (comment)
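The load-before-store idea can be illustrated with a toy model. The sketch below is not Cranelift or Wasmtime code: `ToyMemory`, `PAGE_SIZE`, `tearing_store`, and `precise_store` are hypothetical names invented for illustration, and the "hardware" is simulated as a store that writes byte by byte until it reaches an unmapped page. The model only distinguishes mapped from unmapped pages, so it does not cover read-only pages, where a load would succeed and the store would still fault.

```rust
/// Hypothetical page size for the toy model (real pages are far larger).
const PAGE_SIZE: usize = 4;

/// Stand-in for a hardware fault / Wasm trap.
#[derive(Debug, PartialEq)]
pub struct Fault;

/// Toy byte-addressable memory split into pages that are mapped or unmapped.
pub struct ToyMemory {
    bytes: Vec<u8>,
    mapped: Vec<bool>, // one flag per page
}

impl ToyMemory {
    pub fn new(mapped: &[bool]) -> Self {
        ToyMemory {
            bytes: vec![0; mapped.len() * PAGE_SIZE],
            mapped: mapped.to_vec(),
        }
    }

    fn accessible(&self, addr: usize) -> bool {
        self.mapped.get(addr / PAGE_SIZE).copied().unwrap_or(false)
    }

    /// A load has no side effects: it either returns all bytes or faults.
    pub fn load(&self, addr: usize, len: usize) -> Result<Vec<u8>, Fault> {
        if (addr..addr + len).all(|a| self.accessible(a)) {
            Ok(self.bytes[addr..addr + len].to_vec())
        } else {
            Err(Fault)
        }
    }

    /// Simulated store tearing: bytes before the faulting one are written.
    pub fn tearing_store(&mut self, addr: usize, data: &[u8]) -> Result<(), Fault> {
        for (i, &b) in data.iter().enumerate() {
            if !self.accessible(addr + i) {
                return Err(Fault);
            }
            self.bytes[addr + i] = b;
        }
        Ok(())
    }

    /// The transform: a throwaway load of the same size and address runs
    /// first, so a fault is raised before the store can write anything.
    pub fn precise_store(&mut self, addr: usize, data: &[u8]) -> Result<(), Fault> {
        let _ = self.load(addr, data.len())?;
        self.tearing_store(addr, data)
    }

    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

With two pages where only the first is mapped, a 4-byte `tearing_store` at address 2 faults but leaves two bytes written, while `precise_store` at the same address faults and leaves memory untouched.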
Showing 6 changed files with 181 additions and 0 deletions.
@@ -0,0 +1,109 @@
//! Precise-store-traps pass.
//!
//! On some instruction-set architectures, a store that crosses a page
//! boundary such that one of the pages would fault on a write can
//! sometimes still perform part of its memory update on the other
//! page. This becomes relevant, and problematic, when page
//! protections are load-bearing for Wasm VM semantics: see [this
//! issue] where a partially-out-of-bounds store in Wasm is currently
//! defined to perform no side-effect, but with a common lowering on
//! several ISAs and on some microarchitectures does actually perform
//! a "torn write".
//!
//! [this issue]: https://github.com/WebAssembly/design/issues/1490
//!
//! This pass performs a transform on CLIF that should avoid "torn
//! partially-faulting stores" by performing a throwaway *load* before
//! every store, of the same size and to the same address. This
//! throwaway load will fault if the store would have faulted due to
//! not-present pages (this still does nothing for
//! readonly-page-faults). Because the load happens before the store
//! in program order, if it faults, any ISA that guarantees precise
//! exceptions (all ISAs that we support) will ensure that the store
//! has no side-effects. (Microarchitecturally, once the faulting
//! instruction retires, the later not-yet-retired entries in the
//! store buffer will be flushed.)
//!
//! This is not on by default and remains an "experimental" option
//! while the Wasm spec resolves this issue, and serves for now to
//! allow collecting data on overheads and experimenting on affected
//! machines.

use crate::cursor::{Cursor, FuncCursor};
use crate::ir::types::*;
use crate::ir::*;

fn covering_type_for_value(func: &Function, value: Value) -> Type {
    match func.dfg.value_type(value).bits() {
        8 => I8,
        16 => I16,
        32 => I32,
        64 => I64,
        128 => I8X16,
        _ => unreachable!(),
    }
}

/// Perform the precise-store-traps transform on a function body.
pub fn do_precise_store_traps(func: &mut Function) {
    let mut pos = FuncCursor::new(func);
    while let Some(_block) = pos.next_block() {
        while let Some(inst) = pos.next_inst() {
            match &pos.func.dfg.insts[inst] {
                &InstructionData::StackStore {
                    opcode: _,
                    arg: data,
                    stack_slot,
                    offset,
                } => {
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().stack_load(ty, stack_slot, offset);
                }
                &InstructionData::DynamicStackStore {
                    opcode: _,
                    arg: data,
                    dynamic_stack_slot,
                } => {
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().dynamic_stack_load(ty, dynamic_stack_slot);
                }
                &InstructionData::Store {
                    opcode,
                    args,
                    flags,
                    offset,
                } => {
                    let (data, addr) = (args[0], args[1]);
                    let ty = match opcode {
                        Opcode::Store => covering_type_for_value(&pos.func, data),
                        Opcode::Istore8 => I8,
                        Opcode::Istore16 => I16,
                        Opcode::Istore32 => I32,
                        _ => unreachable!(),
                    };
                    let _ = pos.ins().load(ty, flags, addr, offset);
                }
                &InstructionData::StoreNoOffset {
                    opcode: Opcode::AtomicStore,
                    args,
                    flags,
                } => {
                    let (data, addr) = (args[0], args[1]);
                    let ty = covering_type_for_value(&pos.func, data);
                    let _ = pos.ins().atomic_load(ty, flags, addr);
                }
                &InstructionData::AtomicCas { .. } | &InstructionData::AtomicRmw { .. } => {
                    // Nothing: already does a read before the write.
                }
                &InstructionData::NullAry {
                    opcode: Opcode::Debugtrap,
                } => {
                    // Marked as `can_store`, but no concerns here.
                }
                inst => {
                    assert!(!inst.opcode().can_store());
                }
            }
        }
    }
}
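The shape of the pass above (walk every instruction, and emit a load of the same width and address just before each store) can be sketched on a miniature IR. This is an illustrative sketch only: `ToyInst` and `insert_throwaway_loads` are hypothetical names that do not exist in Cranelift, and the toy rebuilds a `Vec` instead of inserting at a cursor position as the real pass does.

```rust
/// Hypothetical miniature IR standing in for CLIF instructions.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyInst {
    /// Load `bits` bits from `addr`; the result is discarded in this toy.
    Load { bits: u32, addr: u32 },
    /// Store the low `bits` bits of `value` to `addr`.
    Store { bits: u32, addr: u32, value: u32 },
    /// Any instruction that does not write memory.
    Other(&'static str),
}

/// Return a copy of `body` in which every store is preceded by a throwaway
/// load of the same width from the same address. Program order is what
/// matters: if the load faults, the store after it never executes.
pub fn insert_throwaway_loads(body: &[ToyInst]) -> Vec<ToyInst> {
    let mut out = Vec::with_capacity(body.len());
    for inst in body {
        if let ToyInst::Store { bits, addr, .. } = inst {
            out.push(ToyInst::Load { bits: *bits, addr: *addr });
        }
        out.push(inst.clone());
    }
    out
}
```

Applied to a body of `Other("iconst")`, a 64-bit `Store`, and `Other("return")`, the result has four instructions, with a 64-bit `Load` from the same address directly before the `Store`.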
41 changes: 41 additions & 0 deletions
cranelift/filetests/filetests/egraph/precise-store-traps.clif
@@ -0,0 +1,41 @@
test optimize
set opt_level=speed
set ensure_precise_store_traps=true
target x86_64

function %f0(i64) {
block0(v0: i64):
    v1 = iconst.i64 0
    store.i64 v1, v0
    return
}

; check: load.i64 v0
; check: store v1, v0

function %f1(i64, i8x16) {
block0(v0: i64, v1: i8x16):
    store.i64 v1, v0
    return
}

; check: load.i8x16 v0
; check: store v1, v0

function %f2(i64, i64) {
block0(v0: i64, v1: i64):
    istore8 v1, v0
    return
}

; check: load.i8 v0
; check: istore8 v1, v0

function %f3(i64, i64) {
block0(v0: i64, v1: i64):
    atomic_store.i64 v1, v0
    return
}

; check: atomic_load.i64 v0
; check: atomic_store v1, v0