fix: AOT interop with managed .NET runtimes #1392
base: master
Thanks for finding the diff that makes the other implementations work, @jpnurmi. Since this deviates significantly, we should either document the change as clearly as possible (for our internal use), so it is understood that we have shifted the focus to the current AOT signal/exception interface, or adapt the tests to cover the relevant area.
```cs
Console.WriteLine("dereference another NULL object from managed code");
var s = default(string);
var c = s.Length;
if (args is ["managed-exception"])
```
Why is this `if` necessary? I am not trying to be a stickler, but we should minimize the test code. If I see an `if` for `"managed-exception"` inside the `catch`, I would assume that this code would be reached if we pass any other argument on the command line, which shouldn't be the case, right?
Currently, the test re-triggers a `NullReferenceException` from within the managed exception handler and leaks it into native code:
sentry-native/tests/fixtures/dotnet_signal/Program.cs, lines 56 to 61 in 695f4a4:
```cs
catch (NullReferenceException exception)
{
    Console.WriteLine("dereference another NULL object from managed code");
    var s = default(string);
    var c = s.Length;
}
```
I put it behind the `"managed-exception"` argument to be able to test the scenario where a managed exception is handled without leaking into native code, so that execution continues normally.
However, this highlights a divergence similar to the one I previously raised with the test assertion. The initial reason for doing this was to have two managed exceptions, one caught in managed code and one not, both of which would end up in our signal handler.
Neither should create a native event (at least that was my assumption), since neither has anything to do with an actual native crash: you would get a stack trace of the runtime (or typically much less than that, because whichever stackwalker is in effect will not have sufficient information to walk the runtime's stack).
In the AOT case, though (and apparently with Mono too, since CLR AOT seems to be based on Mono AOT), you observe not only two `SIGSEGV`s (as with CLR JIT) but also a final `SIGABRT` coming from the unhandled exception. From my point of view, this `SIGABRT` shouldn't trigger a native event either, for the same reason.
To clarify: I think it is acceptable for Mono to raise a `SIGABRT` at the end, but I wouldn't bypass this with an `if` statement inside the `catch`.
It is okay because Sentry installs a top-level handler for .NET anyway, right? So the chance of triggering the Native SDK's `SIGABRT` handler is relatively low. And if it triggers anyway, that might highlight another issue.
But instead of preventing it from happening, I would let the C# code run as initially intended and then show in the test that, in the AOT/Mono case where two managed exceptions are raised, the serialized envelope exists, but that it was produced by a `SIGABRT` and not by the `SIGSEGV` that triggered the managed code exception. This way, you don't hide that behavior and you explicitly show the difference between the two implementations in the test assertions.
Does that make sense?
I think developing a heuristic for ignoring that particular `SIGABRT` is an unnecessary investment at this point (unless you already know that it will be a problem downstream).
Co-authored-by: Mischan Toosarani-Hausberger <[email protected]>
```c
SENTRY_WARN("get_stack_pointer is not implemented for this architecture. "
            "Signal chaining may not work as expected.");
return NULL;
```
Is `#error` or a runtime warning a better choice here? I ended up with a warning because I thought it would be annoying to potentially break builds for irrelevant platforms that Sentry .NET doesn't even support.
It is fine either way, but I would also err on giving a runtime response rather than failing at build time, as the entire execution path is optional.
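For illustration, here is a minimal sketch of such accessors with the runtime-warning fallback, assuming Linux/glibc `ucontext_t` layouts. The names `sketch_get_stack_pointer` and `sketch_get_instruction_pointer` are hypothetical, not the PR's code, and `fprintf` stands in for `SENTRY_WARN`. Unsupported targets still compile and only degrade at runtime.

```c
#define _GNU_SOURCE /* glibc: exposes REG_RSP, REG_RIP, REG_ESP, REG_EIP */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <ucontext.h>

/* Hypothetical helper: read the interrupted stack pointer (Linux only). */
static void *
sketch_get_stack_pointer(const ucontext_t *uc)
{
    assert(uc != NULL);
#if defined(__linux__) && defined(__x86_64__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_RSP];
#elif defined(__linux__) && defined(__i386__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_ESP];
#elif defined(__linux__) && defined(__aarch64__)
    return (void *)(uintptr_t)uc->uc_mcontext.sp;
#elif defined(__linux__) && defined(__arm__)
    return (void *)(uintptr_t)uc->uc_mcontext.arm_sp;
#else
    /* Runtime fallback instead of #error: the build keeps working. */
    (void)uc;
    fprintf(stderr, "get_stack_pointer is not implemented for this "
                    "architecture. Signal chaining may not work as expected.\n");
    return NULL;
#endif
}

/* Hypothetical helper: read the interrupted instruction pointer. */
static void *
sketch_get_instruction_pointer(const ucontext_t *uc)
{
    assert(uc != NULL);
#if defined(__linux__) && defined(__x86_64__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__i386__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__linux__) && defined(__aarch64__)
    return (void *)(uintptr_t)uc->uc_mcontext.pc;
#elif defined(__linux__) && defined(__arm__)
    return (void *)(uintptr_t)uc->uc_mcontext.arm_pc;
#else
    (void)uc;
    fprintf(stderr, "get_instruction_pointer is not implemented for this "
                    "architecture. Signal chaining may not work as expected.\n");
    return NULL;
#endif
}
```

Outside a signal handler, `getcontext(&uc)` fills the same structure, which is a convenient way to check that the accessors return plausible values on a given target.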
I wonder if we can isolate the `SIGABRT` in case of an unhandled exception on AOT/Mono as well.
```c
    SENTRY_DEBUG("runtime converted the signal to a managed "
                 "exception, we do not handle the signal");
    return;
}
```
This is absolutely correct, but the only side effect currently visible is on the logging toggle. Similar to how we "leave" the signal handler before chaining, we must also re-enable logging immediately after "leaving" and disable it again before re-entering, because if it was a managed code exception, we want logging to remain enabled.
We can also move the entire `sig_slot` assignment below the `CHAIN_AT_START` code to make the path dependencies more obvious.
However, I think both have a lower priority than figuring out the signaling sequence of both runtimes and how we can align them.
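The logging point above could look roughly like this. All names here (`sketch_enter_handler`, `sketch_leave_handler`, `sketch_chain_at_start`, and the two `sig_atomic_t` flags) are hypothetical stand-ins rather than sentry-native internals, and how the runtime's conversion is detected is abstracted behind a callback. The handler "leaves" (logging back on) before chaining and only re-enters (logging off) if native crash handling continues, so a managed exception resumes with logging enabled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "inside the crash handler" and the log toggle. */
static volatile sig_atomic_t g_in_crash_handler = 0;
static volatile sig_atomic_t g_logging_enabled = 1;

/* Entering native crash handling: silence logging. */
static void
sketch_enter_handler(void)
{
    g_in_crash_handler = 1;
    g_logging_enabled = 0;
}

/* "Leaving" before chaining: logging is on again, as in normal execution. */
static void
sketch_leave_handler(void)
{
    g_in_crash_handler = 0;
    g_logging_enabled = 1;
}

typedef void (*prev_handler_fn)(int, siginfo_t *, void *);
typedef bool (*converted_fn)(void *uctx);

/*
 * Chain to the previous handler first. Returns true if native crash handling
 * should continue. If the runtime converted the signal into a managed
 * exception, we stay "left", so logging remains enabled for managed code.
 */
static bool
sketch_chain_at_start(prev_handler_fn prev, converted_fn converted, int sig,
    siginfo_t *info, void *uctx)
{
    sketch_leave_handler();
    prev(sig, info, uctx);
    if (converted(uctx)) {
        return false;
    }
    sketch_enter_handler(); /* re-enter only when we keep handling */
    return true;
}

/* Test doubles (hypothetical) for the previous handler and the detection. */
static int g_prev_calls = 0;

static void
fake_prev(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)info;
    (void)uctx;
    assert(g_logging_enabled == 1); /* previous handler runs with logging on */
    g_prev_calls++;
}

static bool
always_converted(void *uctx)
{
    (void)uctx;
    return true;
}

static bool
never_converted(void *uctx)
{
    (void)uctx;
    return false;
}
```

Keeping the re-entry after the decision makes the early `return` path the one that needs no cleanup, which is also why moving the `sig_slot` assignment below the chaining code would make the dependencies easier to follow.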
I'm trying to fix the scenario where Mono's signal handler detects a managed exception and modifies the context to transfer execution to a managed exception handler, after which execution returns to Sentry Native's signal handler. In this case, Sentry Native needs to detect that Mono wanted execution to continue and abort crash handling. In case of a real native crash, though, if we invoke Mono's signal handler first and Mono's native crash handling decides to call
Isn't that what you're trying to do here all along? Or is there yet another difference when you use pure Mono?
Were you able to observe this? Because this only happens when
Btw, if it is the latter, then this is also the reason why I suggested that CLR JIT support can be dropped altogether. When I started this project (over a year ago), the primary goal was to determine how far the handler interaction of the various runtime implementations converges. I started with CLR JIT as a baseline. However, if our downstream usage is primarily for another implementation whose signal handling diverges entirely, then we can either drop the current implementation or add another handler strategy.
I tried creating a test case using
No wait, it's the newly added IP/SP check that prevents native crash handling, too. 🤦 How the heck do we distinguish between these...
I was wary of checking ucontext modifications along the signal chain. I didn't have the time to review the implementation, but I can.
Try, as a first step, to switch the order of the handler chain for Mono (and drop your current IP/SP check or even the CHAIN_AT_START strategy entirely). The way it seems to operate makes more sense if their handlers get installed last. In "old" Mono, there were managed-side functions that could (un)install signal handlers at specific points (which could be controlled from sentry-dotnet around the native SDK initialization) to control the chain being:
rather than what we have now:
Not sure if they are still exposed in the dotnet/runtime Mono fork, but we can certainly try to achieve something similar. Then we would have their handler first and might not need an alternative strategy inside our handler; maybe not even for CLR (but one step at a time).
Swapping the order of the signal handlers would work. I was able to confirm the theory on Linux, even though I had to patch Mono to either
to make it possible to swap the order in either managed or native code, respectively. However, that's just Linux, which is not relevant for Sentry .NET on Android or iOS. The problem is that there's no such type as
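The ordering argument can be demonstrated with plain `sigaction`: when each handler saves the previous disposition at install time and chains to it, the handler installed last runs first. This sketch uses `SIGUSR1` so it can run safely, and `sentry_like`/`runtime_like` are hypothetical stand-ins; it does not model Mono's actual (un)install hooks, which may no longer be exposed. A real chain would also reset to the default disposition and re-raise when the previous handler is `SIG_DFL`; here the chain simply stops.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Invocation order, e.g. "SR" = Sentry-like first, then runtime-like. */
static char g_order[8];
static volatile sig_atomic_t g_len = 0;

/* The disposition each handler replaced when it was installed. */
static struct sigaction g_prev_sentry;
static struct sigaction g_prev_runtime;

static void
record(char c)
{
    if (g_len < (sig_atomic_t)(sizeof g_order - 1)) {
        g_order[g_len++] = c;
    }
}

static void
call_previous(const struct sigaction *prev, int sig, siginfo_t *info, void *uctx)
{
    if (prev->sa_flags & SA_SIGINFO) {
        if (prev->sa_sigaction != NULL) {
            prev->sa_sigaction(sig, info, uctx);
        }
    } else if (prev->sa_handler != SIG_DFL && prev->sa_handler != SIG_IGN) {
        prev->sa_handler(sig);
    }
    /* SIG_DFL / SIG_IGN: end of the chain in this demo. */
}

static void
sentry_like(int sig, siginfo_t *info, void *uctx)
{
    record('S');
    call_previous(&g_prev_sentry, sig, info, uctx);
}

static void
runtime_like(int sig, siginfo_t *info, void *uctx)
{
    record('R');
    call_previous(&g_prev_runtime, sig, info, uctx);
}

static void
install(void (*fn)(int, siginfo_t *, void *), struct sigaction *prev_out)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = fn;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGUSR1, &sa, prev_out); /* saves whatever was there before */
}

/* Installs both handlers in the given order and raises SIGUSR1 once. */
static const char *
run_chain_demo(bool runtime_installed_last)
{
    signal(SIGUSR1, SIG_DFL);
    memset(g_order, 0, sizeof g_order);
    g_len = 0;
    if (runtime_installed_last) {
        install(sentry_like, &g_prev_sentry);
        install(runtime_like, &g_prev_runtime);
    } else {
        install(runtime_like, &g_prev_runtime);
        install(sentry_like, &g_prev_sentry);
    }
    raise(SIGUSR1);
    return g_order;
}
```

With the runtime installed last (`run_chain_demo(true)`), its handler sees the signal first and Sentry's handler only runs if the runtime chains onward, which is the order suggested above.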
When chaining signal handlers in AOT mode, detect whether the .NET runtime converts a signal to a managed exception and transfers execution to the managed exception handler. In this case, Sentry Native should abort crash handling because the exception is caught and handled in managed code.
See also:
Note
Detect .NET runtime converting signals to managed exceptions and skip native crash handling; add JIT/AOT tests and changelog entry.
- Add `get_stack_pointer` and `get_instruction_pointer` for multiple architectures to read from `ucontext_t`.
- In `CHAIN_AT_START`, compare IP/SP before/after invoking the prior handler; if changed, treat as a managed exception and abort native handling.
- … (`run_jit_*`) and add AOT runners (`run_aot_*`), including AOT publish and execution.
- Update `Program.cs` to use null-forgiving `s!` and conditionally rethrow via `args` (`"managed-exception"`).
- Add `test_aot_signals_inproc` and rename JIT test; adjust skip reasons/messages.
Written by Cursor Bugbot for commit 060b18b.
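The IP/SP check from the summary can be sketched as follows, assuming Linux x86_64 or aarch64; `snapshot`, `prev_handler_redirected`, and the two fake runtime handlers are hypothetical names. As noted in the thread, a before/after comparison alone cannot tell a runtime that redirects to a managed exception handler apart from a runtime whose own native crash handling also rewrites the context.

```c
#define _GNU_SOURCE /* glibc: exposes REG_RIP / REG_RSP */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

/* Hypothetical snapshot of the two registers the check compares. */
typedef struct {
    uintptr_t ip;
    uintptr_t sp;
} regs_snapshot_t;

/* Linux x86_64/aarch64 only; other targets yield zeros (check never fires). */
static regs_snapshot_t
snapshot(const ucontext_t *uc)
{
    regs_snapshot_t r = { 0, 0 };
#if defined(__linux__) && defined(__x86_64__)
    r.ip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    r.sp = (uintptr_t)uc->uc_mcontext.gregs[REG_RSP];
#elif defined(__linux__) && defined(__aarch64__)
    r.ip = (uintptr_t)uc->uc_mcontext.pc;
    r.sp = (uintptr_t)uc->uc_mcontext.sp;
#else
    (void)uc;
#endif
    return r;
}

typedef void (*prev_handler_fn)(int, siginfo_t *, void *);

/*
 * Invoke the previous handler with the same context and report whether it
 * rewrote IP or SP, i.e. whether it redirected execution elsewhere.
 */
static bool
prev_handler_redirected(prev_handler_fn prev, int sig, siginfo_t *info,
    ucontext_t *uc)
{
    regs_snapshot_t before = snapshot(uc);
    prev(sig, info, uc);
    regs_snapshot_t after = snapshot(uc);
    return before.ip != after.ip || before.sp != after.sp;
}

/* Test double: a runtime handler that leaves the context untouched. */
static void
runtime_ignores(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    (void)ctx;
}

/* Test double: pretends to redirect; the context is never resumed here. */
static void
runtime_redirects(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)sig;
    (void)info;
    (void)uc;
#if defined(__linux__) && defined(__x86_64__)
    uc->uc_mcontext.gregs[REG_RIP] += 16;
#elif defined(__linux__) && defined(__aarch64__)
    uc->uc_mcontext.pc += 16;
#endif
}
```

Inside the actual handler, a `true` result would correspond to the early `return` after the `SENTRY_DEBUG("runtime converted the signal to a managed exception ...")` call quoted in the review.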