Does the combo of subtyping and unreachable code require actual type inference? #516
Comments
I don't think this really gets to the crux of the issue, but I'm curious about how keeping the stack like you describe can ever work. For example, this is perfectly valid: …

Wouldn't keeping the stack around produce an error about the …?

I guess the difference is that … So I guess the question is: is truncation necessary for generators? I think this is equivalent to asking whether we can add an …
First of all, the fact that the example is rejected has nothing to do with unreachable code or "stack truncation". The following sequence is invalid regardless of what's before it: …
The reason is that after the br_on_null, the types left on the stack are dictated by that instruction's typing rule, which is (roughly) br_on_null $l : [t* (ref null ht)] → [t* (ref ht)], where [t*] is the type of the label $l.
Note how the stack result is determined (and fixed) by the type of $l. (A more permissive rule would be possible, but would be more complicated yet only rarely more useful in practice. Edit: well, possible except for the loss of principal types.) As for truncating the stack, that is "mandated" by the spec in so far as the declarative type of br is [t1* t*] → [t2*] when the label has type [t*], with t1* and t2* left unconstrained.
There are infinitely many ways to instantiate the t1* and t2* in this rule, but an algorithm always needs to make the most general choice, otherwise it wouldn't be complete. And the most general algorithmic solution is to pop everything currently on the stack, because after that the algorithm can successfully pop any type it wants, especially bottom, which is at least as permissive as anything that could have been on the stack before. A simple example demonstrating this would be
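something like the following sketch (an illustrative reconstruction rather than the original snippet, assuming a label whose result type is empty):

```wat
block             ;; the label's type is []
  i32.const 1
  br 0            ;; the leftover i32 is absorbed by the unconstrained t1*
  ref.is_null     ;; pops a reference operand that was never actually pushed
  drop
end
```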
An algorithm that does not "truncate" the stack would incorrectly reject this. As an aside, the algorithm from the Appendix does not just truncate the stack, it replaces it with the polymorphic stack. So a more faithful way to annotate the example above would be, e.g.,
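the following (again an illustrative reconstruction, reusing the snippet sketched above):

```wat
block
  i32.const 1     ;; stack: i32
  br 0            ;; stack: ...
  ref.is_null     ;; stack: ... i32
  drop            ;; stack: ...
end
```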
where "..." denotes the polymorphic bottom of the stack, from which arbitrary types can be drawn. An alternative is to annotate the actual instruction types:
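For the same sketched snippet, that annotation might look like this (illustrative, not the original):

```wat
block
  i32.const 1     ;; [] -> [i32]
  br 0            ;; [i32] -> [bot]     (t1* := i32, t2* := bot)
  ref.is_null     ;; [bot] -> [i32]
  drop            ;; [i32] -> []
end
```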
Here, the output type of each instruction must match up with the input type of the next. For br, t1* was instantiated with i32 and t2* with bot. Any reference type would also work instead of bot, but that would require guessing ahead. So, to summarise the answers to your questions: …
And to the question in the issue's subject: no, forward checking is enough. The principal types property of the Wasm type system (esp. the "Closed Principal Forward Types" theorem) ensures that. Finally, Firefox and wasm-smith seem to have bugs in their validators, but from your example alone it is unclear whether they are related to stack polymorphism.
Thanks for the explanation, and for pointing out the root issue with the example. In `wasm-smith` and `wasmparser` we were keeping the more precise type on the stack instead of the label's type in some places where we shouldn't have. That caused us to consider the following valid: … but then adding an …

Glad that things have been cleared up! FWIW, Firefox seems to have a similar bug. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1879096 for them. Thanks again!
Can I motivate you to add a test for this to the Wasm test suite? :)

Writing one already :)
Addresses issues discussed in WebAssembly/gc#516
wasm-smith: Properly erase subtype information after conditional branches. When, for example, `[a i32]` is on the stack and we have a `br_if @label` where `@label`'s type is `[b]` and `a <: b`, then the whole `br_if` is typed `[b i32] -> [b]`. Because `a <: b` the `br_if` is valid. However, `a <: b` does *not* mean that the instruction results in `[a]`. It "erases" the `a` and pushes a `b`. This addresses WebAssembly/gc#516 in `wasm-smith`.
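For concreteness, a small hedged illustration of the point in that commit message; the types $A and $B and the function are hypothetical, not taken from the linked PR:

```wat
(type $B (sub (struct)))
(type $A (sub $B (struct (field i32))))

(func (param (ref $A) i32) (result (ref $B))
  block $l (result (ref $B))
    local.get 0        ;; stack: [(ref $A)]
    local.get 1        ;; stack: [(ref $A) i32]
    br_if $l           ;; typed [(ref $B) i32] -> [(ref $B)]; valid because
                       ;; (ref $A) <: (ref $B), but the value left behind is
                       ;; now typed (ref $B), so e.g. struct.get $A 0 here
                       ;; would be a validation error
  end
)
```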
Add a test for #516 to the test suite
Just a note that …
I've been asm.js'd!
* wasmparser: Fix behavior of branching to labels and subtyping. Addresses issues discussed in WebAssembly/gc#516.
* wasm-smith: Properly erase subtype information after conditional branches. When, for example, `[a i32]` is on the stack and we have a `br_if @label` where `@label`'s type is `[b]` and `a <: b`, then the whole `br_if` is typed `[b i32] -> [b]`. Because `a <: b` the `br_if` is valid. However, `a <: b` does *not* mean that the instruction results in `[a]`. It "erases" the `a` and pushes a `b`. This addresses WebAssembly/gc#516 in `wasm-smith`.
I added equivalent tests for all of the above instructions.
First of all, thanks for adding all the test cases! I am somewhat concerned that the current spec can result in some module binary size bloat due to the loss of precision.
This would still be invalid.
And if we had more than 2 types on the branch target, then we might need to insert a cast for each of them.

I don't think I understand the concept of principal types and what the concrete restrictions are here, but why can't the instruction hand back its (possibly more precise) input types? Here the output types would simply repeat the input types, and another condition was added that each type in the input is a subtype of the corresponding type expected by the branch target.
See #201 for previous discussion of this general issue, and specifically #201 (comment) for the reason we kept the status quo of not refining the output types. (In short, you're right that we allow unconstrained type variables in the typing of instructions, but so far we don't allow typing to involve any kind of constraints on type variables.) Thankfully it would be a backward-compatible change to refine the output types for all these instructions, so it's still something we could decide to do in the future.
No cast should be necessary. The obvious fix would be storing the operands in temp locals:
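A hedged sketch of what that could look like (the type $A, the label $l, and the surrounding function are hypothetical, not the exact case discussed above):

```wat
(type $A (sub (struct (field i32))))

(func (param $r anyref) (result i32)
  (local $tmp (ref null $A))
  block $l (result anyref)
    i32.const 1
    struct.new $A        ;; the operand whose precise type we want to keep
    local.tee $tmp       ;; stash it in a temp local
    local.get $r
    br_on_null $l        ;; on null: branch, passing the operand upcast to anyref
    drop                 ;; fall-through: drop the non-null reference
    drop                 ;; drop the copy whose type was erased to anyref
    local.get $tmp       ;; reload the operand at type (ref null $A)
    struct.get $A 0      ;; use it at its precise type, no ref.cast needed
    return
  end
  drop
  i32.const 0
)
```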
Another possibility is to introduce an intermediate block with the right type:
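And a hedged sketch of the intermediate-block variant, under the same hypothetical setup: the branch targets an inner block whose result type is the precise (ref $A), so the fall-through keeps the precise type and the upcast to anyref happens only on the null path.

```wat
(type $A (sub (struct (field i32))))

(func (param $r anyref) (result i32)
  block $l (result anyref)
    block $m (result (ref $A))   ;; intermediate block with the precise type
      i32.const 1
      struct.new $A
      local.get $r
      br_on_null $m      ;; the label's type is (ref $A), so nothing is erased
                         ;; on the fall-through
      drop               ;; non-null case: drop the non-null reference
      struct.get $A 0    ;; use the operand at its precise type, no cast
      return
    end
    ;; reached only via the null branch; only here is the value upcast to the
    ;; outer label's anyref
    br $l
  end
  drop
  i32.const 0
)
```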
Both alternatives would probably result in almost the same machine code in an optimising JIT. That said, the ability to pass additional operands to a br_on_null is a bit of a niche feature already that we just added because it was "free". I am not aware that it is used heavily in the wild, let alone with the need to have heterogeneous output edges (I perhaps lack imagination for how a producer would make use of that systematically and what for). Of course, it is always possible to construct examples that would get slightly larger, but is it likely it'll produce any significant code size problem in practice?
The instructions that would have the biggest impact in practice if we improved the typing are …
WDYT about rephrasing the rule slightly to make it more obvious what happens? Maybe: …
IIUC that's the existing behavior, just written in a way that's perhaps less likely to be misunderstood. The fact that multiple implementations got this wrong seems to indicate that there is room for improvement regarding human understandability of these rules. The current way it's spelled out requires the reader to (1) remember that implicit upcasting applies to the input types, and (2) carefully read the second line and take into account that it changes what "happens" on the first line, as opposed to glossing over it thinking "yeah, sure, the label type must match, obviously". Phrased differently, the current spelling …
Yes, something like that would be possible. When done consistently, this change is similar to the standard transformation from a "declarative" to an "algorithmic" formulation (in the context of a type system with subsumption). However, this is not the only rule where such an observation applies. To be consistent, we'd need to similarly inline subsumption in all other relevant rules, which in fact would be every single instruction typing rule in Wasm, because every input type but bottom can be subtyped further. Even in this very rule it would already be arbitrary to stop short of also introducing ht1 and ht2 with another subtype constraint between those.

Of course, the big disadvantage of a coherent reformulation like that is that it creates a lot of redundancy and noise in the rules, which is why the preferred formulation of such a system commonly is to factor all that into one single rule for subsumption. (There are other technical advantages of the declarative formulation, but I don't want to bore the audience with those. For example, the notion of principal type cannot even be defined meaningfully in such a set-up.)

Fortunately, the spec already has a (somewhat more implementation-oriented) algorithmic presentation of validation, in the Appendix. I encourage all readers of the spec to look there as well. Having both these presentations separates concerns, and doesn't require compromising or complicating either. Trying to mix aspects of both in a single formulation, on the other hand, would lose the respective advantages of both.

Beyond that, I think non-normative notes are the preferable way to emphasise potentially non-obvious implications of specific rules or definitions, like is already done in various other places.
@rossberg I am not sure if adding a generic note to the Appendix will make it less likely for implementers to run into this issue.
Edit: OK, I think I got it now, with the 4th line item it means that …
@Liedtke, I indeed had in mind a note (or several) in the main body of the spec.
This fixes the behavior to be aligned with the spec.

Detailed discussion on this: WebAssembly/gc#516

Fixed: v8:14643
Change-Id: Icaebf96ef3710bb5aeba19192e8245287f5fcb49
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5339891
Commit-Queue: Matthias Liedtke <[email protected]>
Reviewed-by: Manos Koukoutos <[email protected]>
Cr-Commit-Position: refs/heads/main@{#92658}
It turns out Binaryen was also incorrect in all of these cases except for …
@rossberg, it turns out that fixing this in Binaryen changes the calculation of precise types from a linear pass to an arbitrarily expensive fixed point computation, because the types of the blocks and of the branches that target them become mutually dependent. To find the smallest fixed point, we need to compute the types using the old behavior that refines beyond the smallest fixed point, then iteratively relax the types until we reach a fixed point. Consider this program:

```wat
(type $A (sub (struct)))
(type $B (sub $A (struct)))
(type $C (sub $B (struct)))
(func (result anyref)
(block $a (result anyref) ;; refinable to (ref $A)
(br $a (struct.new $A))
(br_if $a
(block $b (result anyref) ;; refinable to (ref $A)
(br $b (struct.new $B))
(br_if $a
(block $c (result anyref) ;; refinable to (ref $A)
(br $c (struct.new $C))
(br_if $b
(struct.new $C)
(i32.const 0)
)
)
(i32.const 0)
)
)
(i32.const 0)
)
)
)
```

In the initial pass we undershoot the fixed point and calculate these types, which would be the final results if the typing rules preserved type information: …
Then we feed these types back to the …

Iterating again will compute type … This is a much bigger cost than I had anticipated. Can you go into more detail about the reasoning behind the current rules? You wrote above that they are necessary to preserve principal types, but my understanding is that we could both have principal types and preserve type information in these instructions if we relaxed our principal types rule to allow bounds on type variables. Is that correct? If so, is there a good reason we might not want to do that? If there isn't a good reason to keep the current rules that outweighs the extra complexity they bring, I suggest we ask the CG for approval to switch to the type-preserving rules for all of these instructions.
@tlively, what is your definition of "precise types"? If you mean reconstructing the most permissive type annotations for a given "erased" program, then I am pretty sure that in a language with subtyping and loops like Wasm, any complete algorithm will involve a fixpoint iteration (either on the program itself or on a set of constraints it generates), regardless of the details of the rules. It may just be a coincidence that Binaryen was able to get away without that so far? How does it handle loop parameters? And of course, it is to be expected that type reconstruction is significantly harder than type checking.

Also, I believe the rules for block or br_if never changed. IIUC, you are asking whether we could change them to something else? But that's more or less a zero-sum game. If we were to optimise the rules for the sake of one specific algorithm, then we'd almost inevitably make them harder for some other. In particular, there always is a trade-off between ease for producers and ease for consumers. Principal types make checking (much) more efficient, so they are beneficial for (a general class of) consumers, at the cost of some possible extra work for producers, which is in line with Wasm's general approach.
True, but I want something weaker than that. I just want our typing rules to not unnecessarily lose type information. For example, I want the type of …

We don't support loop parameters at all because they are not easily representable in our AST IR. If we add support for them, we will require a fixed point computation at that point, but it would still be good to avoid needing a fixed point computation for …

That's one way to look at it, yes. Another way to look at it is that when we introduced subtyping, we had two choices about how to handle subtyping in the typing of these branch instructions.
That's fine in general, but in this case switching would be better for both existing producers and existing consumers.
Yes, I agree, and I would like to keep principal types. Can you address my questions about this? I wrote,

> my understanding is that we could both have principal types and preserve type information in these instructions if we relaxed our principal types rule to allow bounds on type variables. Is that correct? If so, is there a good reason we might not want to do that?
Wouldn't that lead to the problems I originally described in OP? Because doing that for branching instructions makes the OP example valid, and therefore leads to the situation I tried to describe here:

> ...then implementations would need to solve these constraints to determine validity. But that kind of constraint solving and "real" type inference is something that we intentionally avoid requiring in Wasm validation.

Perhaps this is exactly the kind of problem that you are running into?
For your implementation as described in the OP, yes, I can see that this would cause problems. But I think you could resolve them by using the stack-truncating algorithm as described in the appendix. It might also be necessary to update the algorithm from the appendix to say that it should always pop bottom types (e.g. pop bot rather than the expected anyref) when popping from a polymorphic stack.
Ah! This is a good insight, thanks for sharing. Popping bottom types from unreachable operand stacks should indeed solve this issue.
In the Wizard code validator, (almost) every pop from the value stack has an expected type[1]. The control stack stores a top limit, and if popping would cross the top limit, then it checks if the current control stack entry is in the unreachable state, and then allows it (popping a value of the expected type if so). This replaced a bunch of logic that previously did different things if the current control entry was reachable or not. Popping the control stack then resets reachability to what it was in the previous control scope.

[1] A couple exceptions are the numeric select instruction and …
It turns out that the validation algorithm in the appendix already pops the real bottom type when popping from a polymorphic stack. Here is an example showing the difference:

```wat
block (result anyref)
block (result funcref)
unreachable
i32.const 0
br_if 0 ;; now [bot i32] -> [bot], previously [bot i32] -> [funcref]
i32.const 0
br_if 1 ;; now [bot i32] -> [bot], previously invalid because [funcref] </: [anyref]
unreachable
end
unreachable
end
```
I've created a PR that shows the concrete modifications we would like to make to the spec: #532. I recommend that we continue the discussion there, since the original question here has been resolved.
[Sorry for the delay, I’m at a conference right now, and no time to reply earlier.]
Good question. Unfortunately, that wouldn’t quite match the definition of principal type. And indeed it would make the notion somewhat useless for the original reason we cared about it, as a criterion for type annotations. That’s because subtyping constraints are very (too) powerful – you can encode random stuff in them. For example, with a <: b ∧ b <: a you can effectively express equivalence, meaning that it is a free pass to move almost all structural information "to the side”, thereby justifying dropping most type annotations. Instead of “every legal type is a (supertype of a) substitution instance of the principal type” you’ll have “every legal type is a solution to a set of inequations”, which is a rather big qualitative difference and does not constrain much at all anymore. Equality vs inequality constraints also make a huge difference algorithmically, because subtype constraints tend to accumulate. Closely related to the discussion, this not-so-fine line is what made “full” type inference for functional languages practical, while it explodes for languages with subtyping. And the seemingly innocent relaxation we're discussing, which essentially expands inference to subtyping, would be crossing that exact line.
Okay, but then, the aforementioned issues aside, aren't you basically suggesting to tailor the Wasm specification for one algorithm that isn't even correct/complete for Wasm as it stands, and by means of a change that will be moot once it is made so? Would you really argue that is sufficient motivation? I'd be very hesitant to make such a change at the last minute, with such unclear repercussions to the fabric of the type system.
I updated Wizard's code validator to comply with the new test that was added. It was doing the wrong thing according to the spec, preserving types on the stack after …

However, I do get @tlively's point better now. The obvious implementation in an abstract-interpretation-based validator (which all production algorithms are, for efficiency) is to just leave types on the stack; that's most efficient and most precise. It concerns me a bit that we're losing type information for a type-system consistency property like principal types. It also seems to me that no abstract-interpretation-based approach is going to be solving a constraint system; that only seems to come up with bottom types due to unreachable. In practice what they are going to do is try to pop label args, and if that fails because it goes into a polymorphic part of the stack, they'll default to the label's arg types and continue forward with those.

It's a little concerning that we're hitting issues with this now; the spec tests have really poor coverage and yet implementations are relying on them to know if they've gotten things right. We need more effort here to test more cases, maybe even automatically generating many possible combinations or even fuzzing. Implementations will continue to get subtle details like this wrong and rely on tests to find issues.
If I'm not mistaken, the form of subtyping constraints I have in mind, where upper bounds must be closed (i.e. cannot themselves contain type variables), is much less powerful. The system of inequalities it allows you to form is just an independent collection of closed upper bounds for each type variable, so it is trivially solved with GLB calculations. And we already depend on those for typing …
Yes, exactly :)
The biggest problem is that AFAIK no implementation contributes their own tests back upstream as spec tests, so for many of us the spec tests become merely a box to check to finish a proposal. We will be able to start contributing some of Binaryen's tests upstream soon.
@tlively I had a separate offline conversation with Nick and the need for a combined regression suite seems to be universally felt. What I've done in Wizard has often been to slice and dice spec tests into smaller units and write my own …
We are already doing the same for select, for example, in order to avoid the need for lub's. Type systems are always compromises between precision and other factors like simplicity.
Well, no, abstract interpretation uses a fixpoint iteration (because of loops, joins, and such). But that's what @tlively tries to avoid – which is exactly what would get us a constraint system instead (solving which, btw, will also require a fixpoint iteration, over the constraints).
That wouldn't work, unfortunately. Since the same type can flow through multiple such instructions in an instruction sequence, you'd need to be able to accumulate multiple constraints on the same variable – or alternatively, allow constraints relating variables to other variables. Because instruction types have both co- and contra-variance, I'm also 99% sure you'd need both upper and lower bounds. I don't think you can get away with closed bounds either, because of composition (afaict, closed bounds are never enough for such systems). Somebody would have to work through the proofs to figure all this out, and that's no longer trivial. The fact that all this is non-obvious is alone reason to be uncomfortable. The current rules are at least conservative.
@rossberg, have you had a chance to take a look at the PR I uploaded making this concrete? It would be helpful if you could explain the problems you see with the proposed change in particular, rather than with having type bounds generally.
Do you agree that accumulating multiple constraints would be fine if they were all closed upper bounds?
The only sources of nontrivial bounds we need are branch target types for the conditional branch instructions and local types for …
Actually I don't think this is true for Wasm loops. Loops have type annotations, so the only part of the stack that can be polymorphic is due to unreachable code before the loop leaving (potentially constrained) type variables on the stack. But looping back to the start of the loop can only happen within the loop, and the polymorphic part of the stack before the loop cannot be modified (or even accessed; it's a validation error to do that); there are no additional constraints to merge back in when coming back.
The proposed change really has nothing to do with fixed point computations at all, and I regret ever bringing them up. The goal is to eliminate unnecessary casts, not to eliminate the need for fixed point computations. The fixed point algorithm I previously described was a way to eliminate some of the same unnecessary casts.
I looked at the PR, but I'm not sure that helps. The difficulty is the principal type property, which isn't gonna work as currently spelled in the PR.
Composition meaning sequencing. Every type annotation will also contribute constraints. For example, when the result of a br_if flows into a select, you get a new constraint on the prior variable from select's annotation. In some cases these constraints are only partial, i.e., have different structure. For example, when the result of br_if flows into extern.convert_any, the combined constraint is gonna be something like α <: (ref β any), where β abstracts nullability. Similarly for other cases where other variables come into play. This is not closed. (And it probably implies the need for bounds on other variable classes as well.)
Well, we arrived at the current semantics after extensive and controversial discussion, which already considered all this at length. Usually we avoid revisiting such discussions unless there are significant new findings – doubly so this late in the game. It doesn't sound like you think there are?
The annotations on loops are of course just as imprecise as the ones we are discussing. Any analysis that wants to reconstruct "precise" types, as with abstract interpretation, would of course not use them. Unless they do not care about more precise types than whatever the type system specifies, but then the point of discussion is moot as well.
Consider the following example. I've annotated it with the operand stack in between each opcode twice:

- once with the stack that the appendix's validation algorithm maintains, and
- once with `wasm-smith`'s "actual" stack.

The difference is that the appendix algorithm truncates the stack after unreachable code, letting subsequent pops materialize their expected types, while `wasm-smith` maintains the "actual" stack going into the unreachable code. AFAICT, the formal typing rules don't actually mandate the appendix's operand stack truncation. There has never been divergence between the two approaches until now, with the introduction of subtyping.

In the following example, the `br 0` makes the subsequent code unreachable. The appendix will additionally truncate the stack. Next, the `br_on_null` is typed `[anyref anyref] -> [anyref (ref any)]` due to the type of its label. With `wasm-smith`'s actual stack, this type checks fine because `structref <: anyref` combined with the rules for subtyping between instructions. For the appendix algorithm, we have an empty stack in unreachable code, so when the `br_on_null` wants to pop the `anyref` that the label demands, it does, and then when it pushes it back onto the stack, it pushes an `anyref`, not a `structref`. This is the source of the divergence between the two approaches. Finally, the `br_on_cast_fail` observes the "actual" `structref`, triggering a `type mismatch: expected structref, found anyref` error in the appendix algorithm.

The reference interpreter and `wasmparser` report the type mismatch error and consider the example invalid. But despite that, it isn't clear to me that it really is invalid. Firefox and `wasm-smith` consider the example valid.

If this example is valid, then it seems like popping from an empty operand stack in unreachable code would have to, in this case, produce a value that is constrained to be some subtype of `anyref` but not necessarily exactly `anyref`, and then the subsequent `br_on_cast_fail` would add a second constraint that the value is additionally a subtype of `structref`, and then implementations would need to solve these constraints to determine validity. But that kind of constraint solving and "real" type inference is something that we intentionally avoid requiring in Wasm validation. So if this example is currently valid, it seems like we want to do something to avoid the need for this kind of constraint solving.

Concrete Questions

1. Is this statement true or false? "AFAICT, the formal typing rules don't actually mandate the appendix's operand stack truncation."
2. Is the example valid or invalid?

Thanks for your time!