Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend validation algorithm appendix #390

Merged
merged 4 commits into from
Jul 20, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
142 changes: 99 additions & 43 deletions document/core/appendix/algorithm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,21 +16,24 @@ Consequently, it can be integrated directly into a decoder.
The algorithm is expressed in typed pseudo code whose semantics is intended to be self-explanatory.


.. index:: value type, stack, label, frame, instruction
.. index:: value type, reference type, vector type, number type, packed type, field type, structure type, array type, function type, compound type, sub type, recursive type, defined type, stack, label, frame, instruction

Data Structures
~~~~~~~~~~~~~~~

Types
.....

Value types are representable as a set of enumerations.
Value types are representable as sets of enumerations:

.. code-block:: pseudo

type num_type = I32 | I64 | F32 | F64
type vec_type = V128
type heap_type = Def(idx : nat) | Func | Extern | Bot
type heap_type =
Any | Eq | I31 | Struct | Array | None |
Func | Nofunc | Extern | Noextern | Bot |
Def(def : def_type)
type ref_type = Ref(heap : heap_type, null : bool)
type val_type = num_type | vec_type | ref_type | Bot

Expand All @@ -43,38 +46,54 @@ Value types are representable as a set of enumerations.
func is_ref(t : val_type) : bool =
return not (is_num t || is_vec t) || t = Bot

Equivalence and subtyping checks can be defined on these types.
Similarly, :ref:`defined types <syntax-deftype>` :code:`def_type` can be represented:

.. code-block:: pseudo

func eq_def(n1, n2) =
... // check that both type definitions are equivalent (TODO)

func matches_null(null1 : bool, null2 : bool) : bool =
return null1 = null2 || null2

func matches_heap(t1 : heap_type, t2 : heap_type) : bool =
switch (t1, t2)
case (Def(n1), Def(n2))
return eq_def(n1, n2)
case (Def(_), Func)
return true
case (Bot, _)
return true
case (_, _)
return t1 = t2

func matches_ref(t1 : ref_type, t2 : ref_type) : bool =
return matches_heap(t1.heap, t2.heap) && matches_null(t1.null, t2.null)

func matches(t1 : val_type, t2 : val_type) : bool =
switch (t1, t2)
case (Ref(_), Ref(_))
return matches_ref(t1, t2)
case (Bot, _)
return true
case (_, _)
return t1 = t2
type packed_type = I8 | I16
type field_type = Field(val : val_type | packed_type, mut : bool)

type struct_type = Struct(fields : list(field_type))
type array_type = Array(fields : field_type)
type func_type = Func(params : list(val_type), results : list(val_type))
type comp_type = struct_type | array_type | func_type

type sub_type = Sub(super : list(def_type), body : comp_type, final : bool)
type rec_type = Rec(types : list(sub_type))

type def_type = Def(rec : rec_type, proj : int32)

func unpack_field(t : field_type) : val_type =
if (it = I8 || t = I16) return I32
return t

func expand_def(t : def_type) : comp_type =
return t.rec.types[t.proj].body

These representations assume that all types have been :ref:`closed <type-closed>` by :ref:`substituting <type-subst>` all :ref:`type indices <syntax-typeidx>` (in :ref:`concrete heap types <syntax-heaptype>` and in :ref:`sub types <syntax-subtype>`) with their respective :ref:`defined types <syntax-deftype>`.
This includes *recursive* references to enclosing :ref:`defined types <syntax-deftype>`,
such that type representations form graphs and may be *cyclic* for :ref:`recursive types <syntax-rectype>`.

We assume that all types have been *canonicalized*, such that equality on two type representations holds if and only if their :ref:`closures <type-closure>` are syntactically equivalent, making it a constant-time check.

.. note::
For the purpose of type canonicalization, recursive references from a :ref:`heap type <syntax-heaptype>` to an enclosing :ref:`recursive type <syntax-reftype>` (i.e., forward edges in the graph that form a cycle) need to be distinguished from references to previously defined types.
However, this distinction does not otherwise affect validation, so is ignored here.
In the graph representation, all recursive types are effectively infinitely :ref:`unrolled <aux-unroll-rectype>`.

We further assume that :ref:`validation <valid-valtype>` and :ref:`subtyping <match-valtype>` checks are defined on value types, as well as a few auxiliary functions on compound types:

.. code-block:: pseudo

func validate_val_type(t : val_type)
func validate_ref_type(t : ref_type)

func matches_val(t1 : val_type, t2 : val_type) : bool
func matches_ref(t1 : val_type, t2 : val_type) : bool

func is_func(t : comp_type) : bool
func is_struct(t : comp_type) : bool
func is_array(t : comp_type) : bool

Context
.......
Expand All @@ -84,6 +103,8 @@ For the purpose of presenting the algorithm, it is maintained in a set of global

.. code-block:: pseudo

var return_type : list(val_type)
var types : array(def_type)
var locals : array(val_type)
var locals_init : array(bool)
var globals : array(global_type)
Expand Down Expand Up @@ -142,7 +163,7 @@ However, these variables are not manipulated directly by the main checking funct

func pop_val(expect : val_type) : val_type =
let actual = pop_val()
error_if(not matches(actual, expect))
error_if(not matches_val(actual, expect))
return actual

func pop_num() : num_type | Bot =
Expand Down Expand Up @@ -249,7 +270,6 @@ it is an invariant of the validation algorithm that there always is at least one
However, a polymorphic stack cannot underflow, but instead generates :code:`Bot` types as needed.



.. index:: opcode

Validation of Opcode Sequences
Expand Down Expand Up @@ -354,15 +374,51 @@ Other instructions are checked in a similar manner.
push_vals(label_types(ctrls[n]))
push_val(Ref(rt.heap, false))

case (call_ref)
let rt = pop_ref()
if (rt.heap =/= Bot)
error_if(not is_def(rt.heap))
let ft = funcs[rt.heap.idx]
pop_vals(ft.params)
push_vals(ft.results)
case (br_on_cast n rt1 rt2)
validate_ref_type(rt1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok that the new helper functions here, validate_ref_type, and is_func/struct etc, are never defined above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added them to the Data Structures section.

validate_ref_type(rt2)
pop_val(rt1)
push_val(rt2)
pop_vals(label_types(ctrls[n]))
push_vals(label_types(ctrls[n]))
pop_val(rt2)
push_val(diff_ref_type(rt2, rt1))

case (return)
pop_vals(return_types)
unreachable()

case (call_ref x)
let t = expand_def(types[x])
error_if(not is_func(t))
pop_vals(t.params)
pop_val(Ref(Def(types[x])))
push_vals(t.results)

case (return_call_ref x)
let t = expand_def(types[x])
error_if(not is_func(t))
pop_vals(t.params)
pop_val(Ref(Def(types[x])))
error_if(t.results.len() =/= return_types.len())
push_vals(t.results)
pop_vals(return_types)
unreachable()

case (struct.new x)
let t = expand_def(types[x])
error_if(not is_struct(t))
for (ti in reverse(t.fields))
pop_val(unpack_field(ti))
push_val(Ref(Def(types[x])))

case (struct.set x n)
let t = expand_def(types[x])
error_if(not is_struct(t) || n >= t.fields.len())
pop_val(Ref(Def(types[x])))
pop_val(unpack_field(st.fields[n]))

.. note::
It is an invariant under the current WebAssembly instruction set that an operand of :code:`Unknown` type is never duplicated on the stack.
It is an invariant under the current WebAssembly instruction set that an operand of :code:`Bot` type is never duplicated on the stack.
This would change if the language were extended with stack instructions like :code:`dup`.
Under such an extension, the above algorithm would need to be refined by replacing the :code:`Unknown` type with proper *type variables* to ensure that all uses are consistent.
Under such an extension, the above algorithm would need to be refined by replacing the :code:`Bot` type with proper *type variables* to ensure that all uses are consistent.
Loading