Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding Exceptional CFGs #185

Open
RossTate opened this issue Oct 19, 2021 · 9 comments
Open

Encoding Exceptional CFGs #185

RossTate opened this issue Oct 19, 2021 · 9 comments

Comments

@RossTate
Copy link
Contributor

WebAssembly does not directly support standard low-level representations of control flow, like control-flow graphs, but as part of its design and release there was both a high-level algorithm for and an integrated implementation of CFG-to-wasm conversion. From WebAssembly/design#796 I get the impression that there are many WebAssembly users that heavily rely on this conversion. This proposal similarly does not support standard low-level representations of exceptions, such as LLVM's per-call-site unwind destinations or JVM's exception tables. Is there a high-level algorithm for converting (one of) these standard representations into this proposal? I would have expected LLVM's unwind destinations to be supported, but this C++ example Conrad presented a while ago seems to produce invalid wasm because LLVM's optimizer makes the IR stray from the specific control-flow pattern the current implementation of CFGStackify seems to expect. Is this an oversight, or is it instead expected that some people heavily relying on CFG-to-wasm conversion should not use exceptions?

@aheejin
Copy link
Member

aheejin commented Oct 19, 2021

I am aware of that issue @conrad-watt mentioned, and we haven't fixed it yet, but we have a open bug filed for it and it's on our list of things to fix. I think it can be fixed by either cloning involved BBs or placing a br_table on top of a try block.

@RossTate
Copy link
Contributor Author

I know you are aware of the specific example above, but that example is illustrative of a larger issue, and the fact that the bug has been open for half a year makes me concerned that the implications of the larger issue are not being recognized.

Let's consider an EH representations like exception tables. With an exception table, every entry specifies a range of code and the relevant handler to be used if an exception occurs within that range (and often a condition, e.g. an exception type, or a classification, e.g. "cleanup" or "catch"). These closely correspond to how exceptions are often implemented (i.e. the runtime checks whether the return address is within a range specified in the exception table). LLVM lowers to exception tables, and its own representation is essentially a "floating" take on exception tables, meaning instructions are free to move around in order to make code transformations easier.

An important aspect of exception tables is that the exception handling code can itself be included in the ranges covered by the exception table. This is important for handling nested exception handling, like in Java or C++.
But the exception table does not necessarily need to have a nested structure. The JVM even explicitly allows recursive exception tables (where exceptions in the exception handling code is handled by that same exception handling code), even though it's not required by Java, because it's easy to implement and can be helpful for supporting various features.

We saw one such feature when the Go team presented defer, which allows Go programmers to dynamically create deconstructor-like function calls that always gets executed regardless of how the function exits (e.g. "panic" or return). The most general way to implement this functionality is to have a loop that executes the remaining deferred function calls (in reverse order from when they were deferred, i.e. FIFO). The challenge is that if one of these deferred function calls panics (i.e. throws an exception), then the remaining deferred function calls are still supposed to be executed, though using this more recent panic. After all the deferred function calls, the most recent panic gets propagated up (i.e. rethrown). I wrote up this example to illustrate the semantics.

This behavior is easy to represent with exception tables. You simply have the last (i.e. lowest priority) entry in the table indicate that the entire function body should use the loop described above as its exception-cleanup code. In particular, even the loop itself is included in that range, which means that exceptions thrown by any deferred function call will simply jump to the start of the loop, which will get you the desired behavior (provided you pop a deferred function off the deferred-function stack before calling it).

I cannot figure out how to encode this example cleanup feature using catch_all/rethrow, despite the fact that its expressible in common EH representations, which makes me skeptical that there's a general-purpose algorithm for converting from exception tables or unwind destinations into this proposal. If you have suggestions, I'd be interested to hear them. But, given that this example uses only semi-structured control flow, your suggestions of br_table or duplicating code seem unlikely to help.

@conrad-watt
Copy link
Contributor

conrad-watt commented Oct 21, 2021

This seems too strong an objection. From a distance, I'd expect that the FixIrreducibleControlFlow LLVM pass could be extended to transform exceptional edges of the CFG just as it already transforms non-exceptional ones. This also seems to be what @aheejin is suggesting (and she has far more experience with this pass than I do). This approach would potentially incur additional indirections, as is already the case with non-exceptional irreducible control flow.

An important aspect of exception tables is that the exception handling code can itself be included in the ranges covered by the exception table. This is important for handling nested exception handling, like in Java or C++.
But the exception table does not necessarily need to have a nested structure.

My understanding (and @aheejin please feel free to correct me) is that the LLVM exception-handling approach used when targeting Wasm does guarantee a nested structure (see here and here), and that Wasm isn't the only target which requires/enforces this structure (see here).

@RossTate
Copy link
Contributor Author

RossTate commented Oct 21, 2021

Ah, I had missed the last paragraph of LLVM's funclets. That's in line with a common complaint language implementers express about LLVM's exception-handling design: it's overspecialized to C++. This same complaint has been made (by myself and others) about this proposal.

So how would you support Go's defer semantics? It would be trivial to express in a simpler design, but it's unclear how to support it with the current proposal.

@tlively
Copy link
Member

tlively commented Oct 21, 2021

I expect that in the context of running the defer functions, Go would want to conservatively catch any exceptions and move them into userspace so it could implement its exceptional control flow using non-exceptional mechanisms. When it has finished running the defer functions, it would re-throw whatever the final exception was. Although this is not a precise description of a transformation, I don't think the problem as described sufficiently motivates spending more time to figure out the details. Is there reason to believe this general strategy wouldn't work?

@RossTate
Copy link
Contributor Author

Try to do that in a way that accommodates the same use cases as the proposal provides for C++:

  • the unwinding code (i.e. the defer functions) gets called for JavaScript exceptions
  • uncaught exceptions/panics get debugging support, e.g. stack traces

I can't get the current proposal to support either of these due to the combination of the restrictions on rethrow and the inability for a catch_all to handle the exceptions within its body. That is, the issue is that the proposal makes it impossible for a re-throw to recreate the functionality of a rethrow.

@RossTate
Copy link
Contributor Author

Here's why I can't get the current proposal to support this:

Suppose the main body of the function throws a JS exception or panics. In this case, you went the deferred functions to run, so you at least need to put the main body inside a try/catch_all. If none of the deferred functions recover the panic, then you need to rethrow the exception, so the entire loop for the deferred functions needs to be inside the body the of the catch_all. But if one of those deferred functions throws a JS exception or panics, then you need the rest of the deferred functions to run, so you need to at least put the call to the deferred function inside a try/catch_all, and then the loop for the remaining deferred functions needs to be inside the catch_all so that you can rethrow that exception if none of the remaining deferred functions. But then if a remaining deferred function throws....

So with the current proposal, you seem to need an infinite function body to support this. This is even the case if you forgo handling JS exceptions and just handle panics but still want debugging support for uncaught panics.

@tlively
Copy link
Member

tlively commented Oct 22, 2021

Oh yeah, good point that unknown exceptions can’t be moved into user space because catch_all doesn’t give you a user space value. I guess exnref would have solved this problem, but that ship has sailed. This seems like a good candidate problem to solve in a follow-on proposal.

@RossTate
Copy link
Contributor Author

Unfortunately exnref had its own problems. catch_all is actually the easier problem to deal with. You just do what the JVM and single-phase C++ implementations do: convert the foreign exception into a Java/C++ exception at the boundary (often by just wrapping it). The real problem is that re-throw is weaker than rethrow with respect to debugging support, even for exceptions you know the tag of. Any variation that makes it possible to explicitly construct and propagate debug information solves this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants