Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot properly unwind stack as a second phase #122

Closed
RossTate opened this issue Jun 23, 2020 · 12 comments
Closed

Cannot properly unwind stack as a second phase #122

RossTate opened this issue Jun 23, 2020 · 12 comments

Comments

@RossTate
Copy link
Contributor

First, before going any further, I want to clarify that this issue is not about extending the current proposal with two-phase exception handling; rather, the issue is about the potential for such an extension.

Motivation for Two-Phase Exception Handling

Exception handling is often implemented in two phases:

  1. Inspect the stack to figure out what to do with the exception.
  2. If appropriate, unwind a portion of the stack that has been discarded (i.e. that control has left).

Many languages are designed to support both single-phase stack unwinding (in which the stack is unwound while inspecting the stack) and two-phase stack unwinding. But many tools and languages specifically require two-phase stack unwinding, such as interactive debuggers wanting the stack to be in tact when intercepting an uncaught exception, and such as C# when clauses requiring (stateful) user-specified exception-filtering code to be run before unwinding code like C++ destructors and C# finally blocks is executed. Furthermore, due to WebAssembly's general-purpose goal, many languages will likely need at least custom filtering/delegating wasm code to be run just to find out if a given handler is actually intended for a given exception. Thus it seems likely that WebAssembly will eventually need to support two-phase exception handling.

Designing for Two-Phase Exception Handling

In general, this first-phase code will need to run arbitrary instructions and inspect the exception at hand in order to decide to do one of the following:

  1. Continue the search for a suitable handler.
  2. Resume the exception without unwinding the stack. That is, provide the thrower of the exception with information it needs to continue executing.
  3. Delegate control to an appropriate handler (e.g. the label for one of the many surface-level catch clauses associated with a given surface-level try block) after unwinding the stack.

Note in particular that the second phase—stack unwinding—only happens in this third case and it has a destination: where on the stack to unwinding up to and where in the code to delegate control to after stack unwinding is finished.

Incompatibility with Two-Phase Exception Handling

Now consider the current proposal and what it would require to extend it with a design for two-phase exception handling. The first phase would have to skip the current catch blocks in order to avoid executing unwinding code associated with them. That means that the second phase, if it happens, would need to somehow execute this unwinding code. The issue is that the unwinding code in catch blocks is not clearly delimited, so there's no easy way for a second phase to know when unwinding associated with a particular catch is done and it can move on to unwinding further up the stack (or to transferring control to the target destination). One might think that the end of the catch block is clearly such a static delimiter, but realize that it's perfectly valid for a catch block to simply br to some label expecting an exnref. In reality, the delimiter is determined dynamically when the exnref that was caught by the catch is passed to rethrow. That's why I'm being careful to use the phrase "unwinding code associated with" rather than "unwinding code within".

Due to this dynamic determination of the end of unwinding code via rethrow, the second phase's only option for interoperating with the current catch blocks seems to be to give a catch that needs to be unwound a special exnref that specifies how to continue the second phase once the exnref is rethrown.

One issue this raises is that during the first phase one would like to keep track of the unwinders on the stack that were encountered while searching for a suitable handler in order to quickly iterate through them in the second phase, but because an exnref can escape the scope of the catch block this list of unwinders can be invalidated, meaning one has to search for the next unwinder each time the special exnref is rethrown.

Another issue is that the special exnref needs to store the information for continuing the second phase once it's rethrown. In particular, it needs to store the destination of the second phase, say as a combination of a stack-frame pointer (how far up the stack to unwind to) and a code pointer (the address of the code of the determined handler). But this special exnref is a first-class value. It can be passed to and rethrown from another thread, or it can be passed to and rethrown from another stack that may or may not be (temporarily) fused with the one the exnref points into. How do we specify the semantics of these unintended interactions of features? Worse yet, the exnref can outlive the validity of the stack frame it points to. Even worse, that stack frame can be replaced with a new stack frame so that it appears valid, but that new stack frame might not be associated with the code pointer in the exnref, making it unsound to redirect to.

Conclusion

Based on my analysis above, the current design seems to make any extension of itself to two-phase exception handling at the least less efficient, likely excessively complicated (at least to formalize and reason about), and possibly unsound or intractable. The primary cause of these problems seems to be the dynamic, rather than static, delimiting of unwinding code, and in particular dynamically delimiting via a value that can escape the stack being unwound or can outlive the validity of the unwinding destination.

@aheejin
Copy link
Member

aheejin commented Jun 23, 2020

You said the current design is not compatible with the second phase of the unwinding, which I don't fully understand. I think it can be more of a problem for the first phase, because it has to search catches, and with control flow escaping, it is hard to determine what the next catch in the EH stack is. (This is my hunch, but I want to confirm this with VM people.) The second phase for us is just running the code as is. So while I now better understand some of your argument that control flow escaping may not be compatible with two-phase unwinding, I think it's more of a problem for the first phase, and not second phase.

In general, this first-phase code will need to run arbitrary instructions and inspect the exception at hand in order to decide to do one of the following:

  1. Continue the search for a suitable handler.
  2. Resume the exception without unwinding the stack. That is, provide the thrower of the exception with information it needs to continue executing.
  3. Delegate control to an appropriate handler (e.g. the label for one of the many surface-level catch clauses associated with a given surface-level try block) after unwinding the stack.

Not sure what 2 means. Are you referring to resumable EH? I'd like to avoid mixing discussions for resumable EH with two-phase unwinding.

@RossTate
Copy link
Contributor Author

I think it can be more of a problem for the first phase, because it has to search catches, and with control flow escaping, it is hard to determine what the next catch in the EH stack is. (This is my hunch, but I want to confirm this with VM people.)

Huh, so this is a second issue for the first phase that I hadn't though of. The issue I had thought of was that the first phase needs to not execute unwinding code associated with catch blocks. (By the way, I realized I should clarify that by "unwinding code" I mean specifically destructors and finally clauses. Not sure if that was clear before.) The issue you've identified is another on top of that one.

Due to either issue, I had figured any language needing filtered (or resumable) exceptions would use a different event type or throw instruction that would skip over catch blocks in the first phase. That then leaves the problem of executing the unwinding code for those catch blocks in the second phase.

Are you referring to resumable EH? I'd like to avoid mixing discussions for resumable EH with two-phase unwinding.

I am, but resumable EH is best implemented with two-phase EH (though there's a common misconception that it requires continuations). This blog post is itself not very informative, but it does link to a bunch of useful discussions on the topic (including addressing the misconception regarding continuations). The following is an excerpt about implementing resumable EH with two-phase EH:

the difference between resumable exceptions and regular exceptions only requires a very minor tweak in implementation, even in languages such as C++ and Java: handle the exception as a new activation at the top of the stack, rather than unwinding first.

Even without adding resumable exceptions to C++, the feature can be useful for implementing C++. In particular, throw; can be conceptually divided into two steps: std::current_exception and std::rethrow_exception. The second step essentially translates to throw __cpp_exception in the current proposal. For the first step, I believe you're essentially maintaining a shadow stack of the exceptions currently being caught, and so std::current_exception either returns the top of this stack or traps if the stack is empty. (At least, this seemed to be what the wasm implementation of the libc++abi was doing or planning on doing.) Alternatively, std::current_exception could be implemented by throwing a resumable (non-C++-level) exception, and compilations of C++-level catch clauses could handle this exception by resuming it with the payload of the __cpp_excpetion they've caught. If std::current_exception were executed outside of any C++-level catch clause, then it would trap. So, just like exceptions can be implemented using a shadow stack of handlers, the current wasm libc++abi seems to be essentially implementing a special resumable exception for std::current_exception via a shadow stack.

That all said, if you would prefer to put resumable exceptions aside for the sake of focusing the discussion, that's fine with me. I just first want it be clear that they are in the scope of two-phase EH, and even within the scope of implementing C++ exception handling (among other languages).

@aheejin
Copy link
Member

aheejin commented Jun 24, 2020

OK, apparently the problem I thought I realized was different from the problem you have been talking about, which I still don't understand.

And it has been hard to tell which situations you are referring to. Are you referring to the actions of hypothetical follow-on proposal only? Or are you talking about the ABI break, or module compatibility between modules compiled with current EH and the hypothetical follow-on EH proposals?

I think it can be more of a problem for the first phase, because it has to search catches, and with control flow escaping, it is hard to determine what the next catch in the EH stack is. (This is my hunch, but I want to confirm this with VM people.)

Huh, so this is a second issue for the first phase that I hadn't though of. The issue I had thought of was that the first phase needs to not execute unwinding code associated with catch blocks. (By the way, I realized I should clarify that by "unwinding code" I mean specifically destructors and finally clauses. Not sure if that was clear before.) The issue you've identified is another on top of that one.

If the first phase does not execute any code in catch blocks, it wouldn't execute any branches out of it, so I'm not sure why it's a problem, but anyway, the problem I thought about is, there is no clear "EH pad stack" in the VM side anymore because the code can escape a catch block and somewhere else.

Also, I still don't understand why you think it's a problem for the second phase. You continuously mentioned continuations in #118, which was confusing. I asked you many times why we need it, but I only got two answers: "@rossberg said we need it (only he didn't; he said it for resumable EH and not two-phase EH)" and "Then how are you gonna make this run without continuations?" As a person who don't even understand why continuations are necessary as you suggest in the first place, I don't know how to answer the second question myself. (I asked other few people who joined discussions in #118, but they don't seem to understand it either.) Are you saying we need continuations when we run two modules, one of which is compiled with the current EH proposal and the other is compiled with the hypothetical follow-on EH proposal, together? I think module compatibility is a completely separate issue which we should talk about separately, but I'm not sure if you are talking about this or not.

Due to either issue, I had figured any language needing filtered (or resumable) exceptions would use a different event type or throw instruction that would skip over catch blocks in the first phase. That then leaves the problem of executing the unwinding code for those catch blocks in the second phase.

Not sure what you mean. Shouldn't we run unwinding code in the second phase? Did you mean first phase?

Even without adding resumable exceptions to C++, the feature can be useful for implementing C++. In particular, throw; can be conceptually divided into two steps: std::current_exception and std::rethrow_exception. The second step essentially translates to throw __cpp_exception in the current proposal. For the first step, I believe you're essentially maintaining a shadow stack of the exceptions currently being caught, and so std::current_exception either returns the top of this stack or traps if the stack is empty. (At least, this seemed to be what the wasm implementation of the libc++abi was doing or planning on doing.) Alternatively, std::current_exception could be implemented by throwing a resumable (non-C++-level) exception, and compilations of C++-level catch clauses could handle this exception by resuming it with the payload of the __cpp_excpetion they've caught. If std::current_exception were executed outside of any C++-level catch clause, then it would trap. So, just like exceptions can be implemented using a shadow stack of handlers, the current wasm libc++abi seems to be essentially implementing a special resumable exception for std::current_exception via a shadow stack.

I'm not sure what you mean in this paragraph. In case you think we did some own implementation to support these features, wasm EH library does not even modify libc++ part, which contains all those functions you mentioned. All modifications to libraries are in libc++abi, and those modifications are also mostly contained in the personality function. We also have our own libunwind, which contains EH instructions to trigger VM to unwinder the stack. (And we don't have much more "plan" to add something much more to our current libc++abi at this point. In case you are referring to the plan related to rethrow in reply to your mail, as you pointed out somewhere else, C++'s rethrow throws out stack trace, so I guess we don't need that. Of course, libc++abi itself manages its own exception stack, but we haven't and don't plan to change that part.)

@dschuff
Copy link
Member

dschuff commented Jun 24, 2020

(Sorry this is a lot of text. In fact it's really 2 posts because I initially wrote the first section to help me understand and clarify, and then I realized what I think is actually our more fundamental mismatch in assumptions. So if this is tl:dr, then at least read the last part)

What should happen when MVP exceptions mix with future 2-phase exceptions?

I wanted to try to clarify what I think the central issue here is (this post is already a big improvement in that direction compared to #118 but I think we can start even narrower).

This basically mirrors what I said in #118 but I think the claim here is not that a 2-phase exnref-using system can't ever exist, but that if we were to add 2-phase unwinding we wouldn't be (reasonably) able to make old code (built with MVP exceptions) work as intended when a 2-phase exception is thrown while old code is on the stack.

To clarify the goal here, what do we want to happen in this case?

Suppose we did have a system with an MVP that had a statically-delimited catch/unwind block (no exnref) and then added 2-phase via a filter clause associated with each unwind block (and suppose you can rethrow from inside the unwind block, leaving aside for now the possibility of splitting handling from unwind code).
Can you clarify what the expected behavior should be when a 2-phase exception is thrown?

Here's my guess: Ross indicated in your most recent comment that we would skip these unwind blocks during the first phase, and then execute them as usual during the second phase. If the exception is unhandled it, gets rethrown before the unwind block is exited and unwinding continues. We expect the exception to be unhandled because it is foreign to the MVP code. But the MVP currently allows for foreign exceptions to be handled too. So if the exception is handled we fall (or branch) out of the catch block, and at this point we know unwinding is completely done. In either case we know whether this frame's unwind code is finished or not based on our static scope, although the exact point at which unwinding continues or ends is dynamic, because a rethrow or a branch could be conditional. We also always know statically where unwinding will continue to, because its the scope that surrounds the catch block (or the callsite of the current function)

Edge cases

(Here I'm just exploring other consequences of this idea; ignore this paragraph if you disagree with the above). In this case we don't know during the first phase whether MVP frames will eventually handle this exception or not. So it could happen that phase 1 finds a handler further up the stack and begins the unwind phase, but the exception is instead handled by the MVP frame. This seems probably ok to me. It could also happen that no 2-phase-aware handler is found and we invoke the debugger or whatever after the first phase (instead of running the second phase), but the exception would have been handled by an MVP frame. This seems like it could be undesirable, but not the worst thing in the world.

The problem with exnref escape

(Here I'm attempting to restate what I think is Ross's characterization of the problem).
Now suppose we do have exnref. You could still skip the MVP catch blocks in the first phase, and now you execute them in the second phase. You still know when unwinding must continue (when the exnref is rethrown) but if the exnref escapes the function you never know for sure that unwinding will stop/has stopped until the function returns. You also can't tell statically from a rethrow location where unwinding will continue to; i.e. which catch block in the function (or to the caller). I think that's the fundamental issue. As Ross said above the exnref itself could carry that information (you do know where unwind will continue to when the exnref is created based on the catch block). The main problem that sticks out to me is the problem of the exnref escaping the stack entirely e.g. to another thread, or how it might interact with another stack-switching mechanism.

The real mismatch in our assumptions

Edit: I wrote all of the above which helped me clarify, but I think I figured out what the fundamental disconnect is:
According to (what I think is) Ross's characterization of the problem the unwind-continuation destination is defined by the scope at the time of exnref creation (i.e. by its catch block). Even if the exnref escapes, if it is ever rethrown then unwinding continues in the catch block that encloses its block of creation. This would make it essentially a first-class continuation.
As Heejin said several times in #118 we have not been thinking of an exnref as any kind of continuation but instead just a package containing a value. This means that when it gets rethrown, unwinding continues based on the scope enclosing the rethrow, not the original throw. (The explainer hints at this, although doesn't say so explicitly: "A rethrow has the same effect as a throw other than an exception is not created"). In other words, the control flow is just the same as if there were a fresh exception.

In that case, it doesn't matter what happens when the exception escapes, other than the fact that it keeps a reference to the thrown object. This behavior is perfectly sufficient for solving the unwind-mismatch issue (since that problem is localized to a particular try block) but might be surprising. In fact we have been imagining that it might be possible to preserve the original JS/wasm stack trace on a rethrow, but I think that if an exnref does escape its function of creation then that becomes complicated/impossible without making the exnref keep much more information than just the thrown object).

Anyway I think we should be clear on what our assumptions are about rethrowing before we think about what extending an exnref-based proposal would mean.

@RossTate
Copy link
Contributor Author

RossTate commented Jun 25, 2020

Thanks, @dschuff, for the thorough digest! Given the fundamental limitations of the format, it's hard to say for certain that we have the same understanding of the issues, but my sense is that we're roughly on the same page except for the details of rethrowing/escaping, but I think that's your point: we should get a mutual understanding of that pinned down.

As Heejin said several times in #118 we have not been thinking of an exnref as any kind of continuation but instead just a package containing a value.

Yes, I understand that that is the intent of exnref. And for a while I thought that that would be workable with 2-phase, even if not what I find to be in an ideal manner, which is why I had let my disagreement with that aspect of the design go, but then recently I realized the issue with destination and escaping.

This means that when it gets rethrown, unwinding continues based on the scope enclosing the rethrow, not the original throw.

Yes, and this causes the problem: how does unwinding know where to continue to (both up to where on the stack, and where in the code to transfer control to when it's done) in the new scope, and how does it know that that destination is even still valid in the new scope?

Sorry if you already understood this; I wasn't entirely sure what parts were questions and what parts were statements.

@dschuff
Copy link
Member

dschuff commented Jun 25, 2020

This means that when it gets rethrown, unwinding continues based on the scope enclosing the rethrow, not the original throw.

Yes, and this causes the problem: how does unwinding know where to continue to (both up to where on the stack, and where in the code to transfer control to when it's done) in the new scope, and how does it know that that destination is even still valid in the new scope?

The point is that the destination is determined by the new scope. It's exactly as if you threw a fresh exception.

@RossTate
Copy link
Contributor Author

But how does it know where to stop?

The problem would be much easier to illustrate with a whiteboard, but I'll try with text another way.

In the first phase, the stack is walked to look for potential handlers. When a potential handler is found, its code is run on the bottom (leaf?) of the stack being walked. That way it can call whatever functions it needs to figure out which of the actual handlers it has is a fit, if any. If it does, then the relevant code pointer and stack pointer are saved in registers and the unwinding process begins. This process pops off frames until it either reaches the saved stack pointer or it encounters an unwinder. Without exnref, the stack and code pointer are saved on the stack and the unwinder is called. When the unwinder finishes, its frame is popped off and the stack and code pointer are popped off the stack into registers and the unwinding process continues.

The important thing to note is that the destination stack and code pointer are saved throughout the entire unwinding process (as are all the values to forward to the given code pointer, but I'm ignoring those for now for simplicity). Critical to this was treating each unwinder on the stack like a function, saving these values on the stack and then popping them off when the unwinder function completes. But unwinding with exnref doesn't work like that; it's a direct transfer of control, which means we can't save these values on the stack. It's impossible to accomplish the goal (i.e. get the destination determined by the filtering code) without this information, so it has to go somewhere. And the only place it can go is in the exnref.

Does that better illustrate the problem? I'm up for meeting over Zoom, where I imagine a (digital) whiteboard would be very helpful.

@aheejin
Copy link
Member

aheejin commented Jun 25, 2020

The important thing to note is that the destination stack and code pointer are saved throughout the entire unwinding process (as are all the values to forward to the given code pointer, but I'm ignoring those for now for simplicity). Critical to this was treating each unwinder on the stack like a function, saving these values on the stack and then popping them off when the unwinder function completes. But unwinding with exnref doesn't work like that; it's a direct transfer of control, which means we can't save these values on the stack.

What are "these values"? Not exactly sure what you mean by "it is critical that they are treated like a function" either. Those frames that contain unwinder code are already on the stack at the point when unwinding starts. We pop those frame off while we unwind, but don't understand why we should create a new stack frame for each unwinder, if that's what you mean. exnref-based EH also pops stack frame as unwinding goes.

It's impossible to accomplish the goal (i.e. get the destination determined by the filtering code) without this information, so it has to go somewhere. And the only place it can go is in the exnref.

The VM can store information on the destination catch. While in the process of unwinding, if the control flow is transferred to that catch, the VM can know that this is the destination. No?

Let me ask you another question: why can the first phase find the destination while the second phase can't? What's the ability the first phase possesses that the second phase doesn't? (Disclaimer: I actually think the first phase can't; I explained this in #122 (comment))


And could you answer my questions in #122 (comment)?

And it has been hard to tell which situations you are referring to. Are you referring to the actions of hypothetical follow-on proposal only? Or are you talking about the ABI break, or module compatibility between modules compiled with current EH and the hypothetical follow-on EH proposals?

@dschuff also wrote a long paragraph about interaction between MVP-compiled module and new-proposal-compiled module, but we are not even sure about whether you are talking about the hypothetical two-phase proposal only or the possible compatibility issue between when modules compiled with first and follow-on proposals run together.

@aheejin
Copy link
Member

aheejin commented Jun 25, 2020

I'm ok with meeting over Zoom. When are you available? I'm basically OK all day today going forward and anytime tomorrow except for 1-2pm PDT.

@dschuff
Copy link
Member

dschuff commented Jun 26, 2020

But how does it know where to stop?

Here you are getting a bit ahead of what I was trying to say before: I was just trying to say that as currently specified, exnref carries no continuation information or pointers into the stack, etc. So escape doesn't matter.
And it doesn't have to know where to stop because there's only one phase.

But adding 2-phase brings up a problem. And I think you, Heejin, and I are mostly just talking about the same problem or class of problems (with a focus on different aspects and using different words).

I would frame the problem as being a problem of mismatch between the unwinding paths taken by the 2 phases; i.e. which catch/unwind blocks (or associated filter blocks) are entered as unwinding continues. I think we all agree that for 2-phase unwinding to work, the paths must match.
In your formulation, you are assuming that we are making them match, and further assuming that the path taken by the first phase is the canonical path. This means that from a given filter block, phase-1 unwinding continues to the scope immediately surrounding the block (which is the same as the one that surrounds the catch/unwind block); and during the second phase we would maintain that path by making rethrow jump directly to that block regardless of where it was rethrown from (i.e. use different behavior from the current proposal). This becomes problematic in several ways when the exnref can escape the scope where it was created.

In Heejin's formulation, the path taken by the second phase is the "canonical" path. And that path is determined not by the scope surrounding the catch block, but dynamically based on the location of the rethrow. In this formulation, the problem is to make the path taken by the first phase match that canonical path, but there's no way to do that statically. That's why she's characterized the problem as being a problem with the first phase (whereas you are talking about the problem being associated with the second phase).
But IMO it's just 2 aspects of the same problem. And I think we are maybe starting to come around to the idea that it might be desirable to think about a system where the catch block can have a statically-declared label where control is transferred outside the scope of the associated try (potentially bypassing 1 or more enclosing try blocks) but still within a declared enclosing block, which would mean that the unwind path is determined statically, and the same path can be used for filters and catch/unwind blocks. (This is what Heejin sketched briefly in #118).

@RossTate
Copy link
Contributor Author

Ah, I think I see what you're saying and how it suggests a convergence of ideas from two different directions. That perspective will help with tomorrow's meeting. Thanks!

@aheejin
Copy link
Member

aheejin commented Oct 12, 2020

Many of concerns here were focused on exnref, which was removed in Sep 15 CG meeting. Closing.

@aheejin aheejin closed this as completed Oct 12, 2020
ioannad pushed a commit to ioannad/exception-handling that referenced this issue Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants