jpnurmi
Collaborator

@jpnurmi jpnurmi commented Sep 29, 2025

When chaining signal handlers in AOT mode, detect whether the .NET runtime converts a signal to a managed exception and transfers execution to the managed exception handler. In this case, Sentry Native should abort crash handling because the exception is caught and handled in managed code.

try
{
    var s = default(string);
    var c = s.Length;
}
catch (NullReferenceException exception)
{
    // the exception is caught and handled in managed code. in AOT mode, execution
    // should continue normally without sentry-native's crash handling kicking in 
}

See also:


Note

Detect .NET runtime converting signals to managed exceptions and skip native crash handling; add JIT/AOT tests and changelog entry.

  • Inproc backend (Linux):
    • Add get_stack_pointer and get_instruction_pointer for multiple architectures to read from ucontext_t.
    • When CHAIN_AT_START, compare IP/SP before/after invoking prior handler; if changed, treat as managed exception and abort native handling.
  • Tests:
    • Refactor JIT runners (run_jit_*) and add AOT runners (run_aot_*), including AOT publish and execution.
    • Update fixture Program.cs to use null-forgiving s! and conditionally rethrow via args ("managed-exception").
    • Add separate test_aot_signals_inproc and rename JIT test; adjust skip reasons/messages.
  • Changelog:
    • Add Unreleased note: fix AOT interop with managed .NET runtimes.
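
The IP/SP comparison summarized above could look roughly like the following. This is a minimal sketch and not the code from the PR: `reg_snapshot_t`, `snapshot_registers`, and `registers_changed` are hypothetical names, and only Linux on x86_64 and aarch64 is covered. Other targets fall through and report zeros.

```c
#ifndef _GNU_SOURCE
#    define _GNU_SOURCE /* glibc exposes REG_RIP/REG_RSP only with this set */
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && !defined(REG_RIP)
/* Fallback if _GNU_SOURCE was defined too late: gregs indices on Linux. */
#    define REG_RSP 15
#    define REG_RIP 16
#endif

/* Hypothetical snapshot of the two registers that a runtime rewrites when it
 * redirects execution to a managed exception handler. */
typedef struct {
    uintptr_t ip;
    uintptr_t sp;
} reg_snapshot_t;

/* Read instruction and stack pointer from the signal context. */
static reg_snapshot_t
snapshot_registers(const ucontext_t *uctx)
{
    assert(uctx != NULL);
    reg_snapshot_t snap = { 0, 0 };
#if defined(__linux__) && defined(__x86_64__)
    snap.ip = (uintptr_t)uctx->uc_mcontext.gregs[REG_RIP];
    snap.sp = (uintptr_t)uctx->uc_mcontext.gregs[REG_RSP];
#elif defined(__linux__) && defined(__aarch64__)
    snap.ip = (uintptr_t)uctx->uc_mcontext.pc;
    snap.sp = (uintptr_t)uctx->uc_mcontext.sp;
#endif
    /* On any other target both fields stay 0. */
    return snap;
}

/* True if the prior handler changed where execution will resume. */
static bool
registers_changed(reg_snapshot_t before, reg_snapshot_t after)
{
    return before.ip != after.ip || before.sp != after.sp;
}
```

A handler using this would take one snapshot before invoking the prior handler and one after, and skip native crash handling when `registers_changed` returns true.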

Written by Cursor Bugbot for commit 060b18b.

@jpnurmi jpnurmi marked this pull request as ready for review September 29, 2025 12:10
cursor[bot]

This comment was marked as outdated.

@jpnurmi jpnurmi requested a review from supervacuus September 29, 2025 13:53
Collaborator

@supervacuus supervacuus left a comment

Thanks for finding the diff to make the other implementations work, @jpnurmi. Since this deviates significantly, we should either document the change as clearly as possible (for our internal use), so it is understood that we have shifted the focus to the current AOT signal/exception interface, or adapt the tests to cover the relevant area.

Console.WriteLine("dereference another NULL object from managed code");
var s = default(string);
var c = s.Length;
if (args is ["managed-exception"])
Collaborator

Why is this if necessary? I am not trying to be a stickler, but we should minimize the test code. If I see an if for the "managed-exception" inside the catch, I would assume that this code would be reached if we pass any other argument at the command line, which shouldn't be the case, right?

Collaborator Author

@jpnurmi jpnurmi Sep 30, 2025

Currently, the test re-triggers a NullReferenceException from within the managed exception handler and leaks it into native code:

catch (NullReferenceException exception)
{
    Console.WriteLine("dereference another NULL object from managed code");
    var s = default(string);
    var c = s.Length;
}

I put it behind the "managed-exception" argument to be able to test the scenario where a managed exception is handled without leaking it into native code, to let execution continue normally.

Collaborator

However, this highlights a divergence similar to the one I previously raised with the test assertion. The initial reason for doing this was to have two managed exceptions, one caught in managed code and one not, both of which end up in our signal handler.

Neither should create a native event (at least that was my assumption), since neither has anything to do with an actual native crash: you would only get a stack trace of the runtime, or typically much less, because whichever stackwalker is in effect will not have sufficient information to walk the runtime's stack.

In the AOT case, though, and apparently Mono too, since CLR AOT seems to be based on Mono AOT, you not only observe two SIGSEGVs (as in the CLR JIT), but also a final SIGABRT coming from the unhandled exception. From my point of view, this SIGABRT shouldn't trigger a native event either (for the same reason as above).

Collaborator

To clarify: I think it is acceptable for Mono to raise a SIGABRT at the end, but I wouldn't bypass this with an if statement inside the catch.

It is okay, because Sentry does install a top-level handler for .NET anyway, right? So the chance of triggering the SIGABRT handler of the Native SDK is relatively low. And if it triggers anyway, then that might highlight another issue.

But instead of preventing it from happening, I would let the C# code run as initially intended and then show inside the test that the serialized envelope exists in the AOT/Mono case where two managed exceptions are raised, but that it is a SIGABRT and not the SIGSEGV that triggered the managed code exception. This way, you don't hide that behavior and explicitly show the difference between the two implementations in the test assertions.

Does that make sense?

I think developing a heuristic for ignoring that particular SIGABRT is rather an unnecessary investment at this point (except if you already know that it will be a problem downstream).

Co-authored-by: Mischan Toosarani-Hausberger <[email protected]>
@jpnurmi jpnurmi changed the title from "fix: interop with managed .NET runtimes" to "fix: AOT interop with managed .NET runtimes" Sep 30, 2025
cursor[bot]

This comment was marked as outdated.

Comment on lines +475 to +477
SENTRY_WARN("get_stack_pointer is not implemented for this architecture. "
"Signal chaining may not work as expected.");
return NULL;
Collaborator Author

@jpnurmi jpnurmi Sep 30, 2025

Is #error or a runtime warning the better choice here? I ended up with a warning because I thought it would be annoying to potentially break builds for irrelevant platforms that Sentry .NET doesn't even support.

Collaborator

It is fine either way, but I would also err on the side of a runtime response rather than failing at build time, as the entire execution path is optional.

Collaborator

@supervacuus supervacuus left a comment

I wonder if we can isolate the SIGABRT in case of an unhandled exception on AOT/Mono as well.

SENTRY_DEBUG("runtime converted the signal to a managed "
"exception, we do not handle the signal");
return;
}
Collaborator

This is absolutely correct, but the only side effect currently visible is the logging toggle. Similarly to how we "leave" the signal handler before chaining, we must also re-enable logging immediately after "leaving" and disable it again before re-entering, because if it was a managed code exception, we want logging to remain enabled.

We can also move the entire sig_slot assignment down below the CHAIN_AT_START code, to make the path dependencies more obvious.

However, I think both have a lower priority than figuring out the signaling sequence of both runtimes and how we can align them.
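
The logging toggle ordering described in this comment can be sketched as follows. This is a simplified model under stated assumptions, not the SDK's handler: `g_logging_enabled` stands in for the real logging toggle, `regs_t` replaces `ucontext_t`, and `chain_at_start`, `prior_redirects`, and `prior_returns_untouched` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the SDK's logging toggle (hypothetical). */
static bool g_logging_enabled = true;

/* Simplified register view; a real handler would read ucontext_t. */
typedef struct {
    uintptr_t ip;
    uintptr_t sp;
} regs_t;

typedef void (*prior_handler_t)(regs_t *regs);

/* Returns true when native crash handling should continue, false when the
 * prior handler redirected execution (treated as a managed exception). */
static bool
chain_at_start(prior_handler_t prior, regs_t *regs)
{
    assert(prior != NULL && regs != NULL);

    g_logging_enabled = false; /* logging is off while inside our handler */
    const regs_t before = *regs;

    g_logging_enabled = true; /* "leave" our handler before chaining */
    prior(regs);

    if (regs->ip != before.ip || regs->sp != before.sp) {
        /* Managed exception: bail out and keep logging enabled. */
        return false;
    }

    g_logging_enabled = false; /* re-enter and continue crash handling */
    return true;
}

/* Example prior handlers that exercise both paths. */
static void
prior_redirects(regs_t *regs)
{
    regs->ip += 0x40; /* pretend the runtime jumped to a catch block */
}

static void
prior_returns_untouched(regs_t *regs)
{
    (void)regs;
}
```

With `prior_redirects` the function returns false and leaves logging enabled. With `prior_returns_untouched` it returns true with logging disabled again, ready for the rest of the crash handling.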

@jpnurmi
Collaborator Author

jpnurmi commented Oct 1, 2025

I'm trying to fix the scenario where Mono's signal handler detects a managed exception, it modifies the context to transfer execution to a managed exception handler, and then execution returns to Sentry Native's signal handler. In this case, Sentry Native needs to detect that Mono wanted execution to continue, and abort crash handling.

In case of a real native crash, though, if we invoke Mono's signal handler first and Mono's native crash handling decides to call _exit(), then Sentry Native misses the crash. 🙁

@supervacuus
Collaborator

supervacuus commented Oct 1, 2025

> I'm trying to fix the scenario where Mono's signal handler detects a managed exception, it modifies the context to transfer execution to a managed exception handler, and then execution returns to Sentry Native's signal handler. In this case, Sentry Native needs to detect that Mono wanted execution to continue, and abort crash handling.

Isn't that what you're trying to do here all along? Or is there yet another difference when you use pure Mono?

> In case of a real native crash, though, if we invoke Mono's signal handler first and Mono's native crash handling decides to call _exit(), then Sentry Native misses the crash. 🙁

Were you able to observe this? Because this only happens when crash_chaining is disabled. I cannot imagine that crash or signal chaining is off by default (especially not on Android or Linux).

@supervacuus
Collaborator

> Isn't that what you're trying to do here all along? Or is there yet another difference when you use pure Mono?

Btw, if it is the latter, then this is also the reason why I suggested that CLR JIT support can be dropped altogether. When I started this project (which was over a year ago), the primary goal was to determine how much the handler interaction between the various runtime implementations converges. I started with CLR JIT as a baseline. However, if we primarily have downstream usage for another implementation that diverges entirely in signal handling, then we can either drop the current implementation or add another handler strategy.

@jpnurmi
Collaborator Author

jpnurmi commented Oct 1, 2025

> Were you able to observe this? Because this only happens when crash_chaining is disabled. I cannot imagine that crash or signal chaining is off by default (especially not on Android or Linux).

I tried creating a test case using mcs + mono --aot on Linux. Mono's native crash reporter kicks in when we call Mono's signal handler for a native crash, and execution ends there...

=================================================================
        Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

=================================================================
        Native stacktrace:
=================================================================
        0x62c7d67295fa - mono :
        0x62c7d66c7e8a - mono :
        0x62c7d671cad0 - mono :
        0x728ba68491a6 - /tmp/pytest-of-jpnurmi/pytest-55/cmake0/libcrash.so : native_crash
        0x40961618 - Unknown

=================================================================
        Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

=================================================================
        Native stacktrace:
=================================================================
        0x62c7d67295fa - mono :
        0x62c7d66c7e8a - mono :
        0x62c7d671cad0 - mono :
        0x728ba68491a6 - /tmp/pytest-of-jpnurmi/pytest-55/cmake0/libcrash.so : native_crash
        0x40961618 - Unknown

=================================================================
        Telemetry Dumper:
=================================================================
Pkilling 0x125944111036096x from 0x125944120812288x
Entering thread summarizer pause from 0x125944120812288x
Finished thread summarizer pause from 0x125944120812288x.
Failed to create breadcrumb file (null)/crash_hash_0x3652010b5

Waiting for dumping threads to resume

=================================================================
        Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x728ba68491a6):0x728ba6849196  ff ff ff f3 0f 1e fa 55 48 89 e5 b8 0a 00 00 00  .......UH.......
0x728ba68491a6  c7 00 64 00 00 00 90 5d c3 f3 0f 1e fa 55 48 89  ..d....].....UH.
0x728ba68491b6  e5 48 83 ec 30 64 48 8b 04 25 28 00 00 00 48 89  .H..0dH..%(...H.
0x728ba68491c6  45 f8 31 c0 48 c7 45 d8 00 40 00 00 48 8b 45 d8  E.1.H.E..@..H.E.

=================================================================
        Managed Stacktrace:
=================================================================
          at <unknown> <0xffffffff>
          at dotnet_signal.Program:native_crash <0x000a7>
          at dotnet_signal.Program:Main <0x000e8>
          at <Module>:runtime_invoke_void_object <0x00091>
=================================================================

@jpnurmi
Collaborator Author

jpnurmi commented Oct 1, 2025

> Were you able to observe this? Because this only happens when crash_chaining is disabled. I cannot imagine that crash or signal chaining is off by default (especially not on Android or Linux).

> I tried creating a test case using mcs + mono --aot on Linux. Mono's native crash reporter kicks in when we call Mono's signal handler for a native crash, and execution ends there...

No wait, it's the newly added IP/SP check that prevents native crash handling, too. 🤦 How the heck do we distinguish between these.......

@supervacuus
Collaborator

> No wait, it's the newly added IP/SP check that prevents native crash handling, too. 🤦 How the heck do we distinguish between these.......

I was wary of checking ucontext modifications along the signal chain. I didn't have the time to review the implementation, but I can.

@supervacuus
Collaborator

supervacuus commented Oct 1, 2025

> No wait, it's the newly added IP/SP check that prevents native crash handling, too. 🤦 How the heck do we distinguish between these.......

Try, as a first step, to switch the order of the handler chain for Mono (and drop your current IP/SP check or even the CHAIN_AT_START strategy entirely). The way it seems to be operating makes more sense if their handlers get installed last. In "old" Mono, there were managed-language side functions that could (un)install signal handlers at specific points (which could be controlled from sentry-dotnet around the native SDK initialization) to control the chain being:

DFL <- Native SDK <- mono handler

rather than what we have now:

DFL <- mono handler <- Native SDK

Not sure if they are still exposed in the dotnet/runtime mono fork, but we can certainly try to achieve something similar. Then we would have their handler first and might not need an alternative strategy inside our handler; maybe not even for CLR (but one step at a time).
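
To illustrate the reordered chain, here is a minimal sketch using plain `sigaction`. It assumes nothing about Mono or the Native SDK: `sdk_handler` and `runtime_handler` are hypothetical stand-ins, and `SIGUSR1` is used instead of `SIGSEGV` to keep the demonstration harmless. The handler installed last is the one the kernel invokes, and it chains back to the one installed before it.

```c
#ifndef _POSIX_C_SOURCE
#    define _POSIX_C_SOURCE 200809L /* for sigaction and siginfo_t */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

enum { RUNTIME = 1, SDK = 2 };

static volatile sig_atomic_t g_order[2]; /* who ran, in call order */
static volatile sig_atomic_t g_calls;
static struct sigaction g_runtime_prev; /* handler the runtime chains to */

static void
record(int who)
{
    if (g_calls < 2) {
        g_order[g_calls] = who;
    }
    g_calls++;
}

/* Stand-in for the Native SDK handler: installed first, runs last. */
static void
sdk_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)info;
    (void)uctx;
    record(SDK);
}

/* Stand-in for the runtime handler: installed last, runs first, then
 * chains to whatever was installed before it. */
static void
runtime_handler(int sig, siginfo_t *info, void *uctx)
{
    record(RUNTIME);
    if ((g_runtime_prev.sa_flags & SA_SIGINFO)
        && g_runtime_prev.sa_sigaction != NULL) {
        g_runtime_prev.sa_sigaction(sig, info, uctx);
    }
}

/* Install a SA_SIGINFO handler and optionally return the previous action. */
static int
install_handler(int signum, void (*handler)(int, siginfo_t *, void *),
    struct sigaction *prev)
{
    assert(handler != NULL);
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = handler;
    return sigaction(signum, &sa, prev);
}
```

Installing `sdk_handler` first and `runtime_handler` second, then raising `SIGUSR1`, runs the runtime stand-in before the SDK stand-in, which matches the `DFL <- Native SDK <- mono handler` arrangement.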

@jpnurmi jpnurmi marked this pull request as draft October 1, 2025 12:36
@jpnurmi
Collaborator Author

jpnurmi commented Oct 2, 2025

Swapping the order of the signal handlers would work. I was able to confirm the theory on Linux, even though I had to patch Mono to either

to make it possible to swap the order in either managed or native code, respectively. However, that's just Linux, which is not relevant for Sentry .NET on Android or iOS. The problem is that there's no such type as Mono.Runtime on either Android or iOS... 🤔
