Skip to content

Conversation

@noahsmartin
Copy link
Contributor

@noahsmartin noahsmartin commented Nov 25, 2025

Fixes #4040

The problem was that for some events such as app hangs and CPUDiagnostics the metric kit payload contains an entire flamegraph. This change parses the flame graph into samples, and then picks the longest duration sample as the one we use for a stack trace. It also fixes over-counting the CPUDiagnostics which was emitting more than one event per exception. This is also fixed by identifying the longest duration sample.

Eventually we should support showing the full flame graph on the frontend, but with this PR the bug is fixed and we can consider that a future improvement.

I tested this on test flight by triggering crashes and verifying they are still correct, as well as manually replaying hang reports received in production

@noahsmartin noahsmartin added the ready-to-merge Use this label to trigger all PR workflows label Nov 25, 2025
@codecov
Copy link

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.017%. Comparing base (8c1cafc) to head (db09d78).
✅ All tests successful. No failed tests found.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##              main     #6908       +/-   ##
=============================================
- Coverage   85.057%   85.017%   -0.040%     
=============================================
  Files          454       455        +1     
  Lines        27739     27692       -47     
  Branches     12162     12157        -5     
=============================================
- Hits         23594     23543       -51     
- Misses        4100      4105        +5     
+ Partials        45        44        -1     
Files with missing lines Coverage Δ
Sources/Sentry/SentryFormatterSwift.m 100.000% <100.000%> (ø)
Sources/Sentry/SentryMetricKitIntegration.m 100.000% <100.000%> (ø)
...s/Swift/Core/MetricKit/SentryMXCallStackTree.swift 100.000% <100.000%> (+25.000%) ⬆️
Sources/Swift/Core/MetricKit/SentryMXManager.swift 100.000% <100.000%> (ø)

... and 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c1cafc...db09d78. Read the comment docs.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 25, 2025

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1229.79 ms 1264.65 ms 34.86 ms
Size 24.14 KiB 1.03 MiB 1.00 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
762a701 1225.95 ms 1244.49 ms 18.54 ms
61414e8 1225.49 ms 1254.28 ms 28.79 ms
588dd7c 1235.11 ms 1241.76 ms 6.65 ms
f5666e7 1227.08 ms 1260.18 ms 33.10 ms
b9aacb6 1230.42 ms 1251.00 ms 20.58 ms
16f6edc 1234.02 ms 1269.67 ms 35.65 ms
b709887 1193.52 ms 1216.74 ms 23.22 ms
5fc3364 1222.36 ms 1252.33 ms 29.96 ms
1fe932f 1231.92 ms 1253.44 ms 21.52 ms
80538ca 1216.70 ms 1253.92 ms 37.22 ms

App size

Revision Plain With Sentry Diff
762a701 23.75 KiB 1022.78 KiB 999.03 KiB
61414e8 23.75 KiB 867.69 KiB 843.94 KiB
588dd7c 23.75 KiB 938.33 KiB 914.58 KiB
f5666e7 23.75 KiB 963.18 KiB 939.43 KiB
b9aacb6 23.75 KiB 913.64 KiB 889.89 KiB
16f6edc 23.74 KiB 1022.94 KiB 999.19 KiB
b709887 23.75 KiB 1.01 MiB 1016.14 KiB
5fc3364 23.75 KiB 1.00 MiB 1006.00 KiB
1fe932f 23.75 KiB 913.63 KiB 889.88 KiB
80538ca 23.75 KiB 989.99 KiB 966.24 KiB

Previous results on branch: mxRewriting

Startup times

Revision Plain With Sentry Diff
bf3e245 1222.17 ms 1257.09 ms 34.92 ms
44da6c5 1234.22 ms 1266.02 ms 31.80 ms
c5b3f7a 1220.18 ms 1259.44 ms 39.25 ms
8824c94 1226.98 ms 1262.65 ms 35.67 ms

App size

Revision Plain With Sentry Diff
bf3e245 24.14 KiB 1.03 MiB 1.00 MiB
44da6c5 24.14 KiB 1.03 MiB 1.00 MiB
c5b3f7a 24.14 KiB 1.01 MiB 1013.45 KiB
8824c94 24.14 KiB 1.01 MiB 1014.88 KiB

@noahsmartin noahsmartin force-pushed the mxRewriting branch 2 times, most recently from d978079 to 169cf94 Compare December 1, 2025 20:59
@noahsmartin noahsmartin force-pushed the mxRewriting branch 2 times, most recently from 432f4ad to a7949ab Compare December 10, 2025 22:32
@noahsmartin noahsmartin marked this pull request as ready for review December 11, 2025 00:31
[self captureMXEvent:callStackTree
params:params
diagnosticJSON:[diagnostic JSONRepresentation]];
[self captureMXEvent:callStackTree params:params diagnosticJSON:[NSData data]];

This comment was marked as outdated.

// that just contains the most common callstack.
func sentryMXBacktrace(inAppLogic: SentryInAppLogic?, handled: Bool) -> [SentryThread] {
callStacks.map { callStack in
let thread = SentryThread(threadId: 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: All threads incorrectly assigned thread ID 0

In sentryMXBacktrace(), all threads are assigned threadId: 0 regardless of their position in the callStacks array. The old Objective-C code used an incrementing counter to assign unique thread IDs (@(i)). When callStackPerThread is true (e.g., for crash diagnostics), multiple call stacks represent different threads and need distinct IDs. Using callStacks.enumerated().map would provide the index needed for unique thread IDs.

Fix in Cursor Fix in Web

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Yes that's valid feedback. I think the backend and frontend might get confused when having multiple threads with the threadId 0. If this is intendended, we should add a comment explaining why we have 0 for all threads.

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @noahsmartin, I added a few comments.

Comment on lines +264 to +266
// let firstEvent = try XCTUnwrap(invocations.first).event
// let secondEvent = try XCTUnwrap(invocations.element(at: 1)).event
// let thirdEvent = try XCTUnwrap(invocations.element(at: 2)).event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: There is still some test code commented out. Please either fixe the test or remove them if they aren't relevant anymore.

Comment on lines +45 to +47
// it represents data that is sampled across many threads and aggregated, we do not
// know which samples came from which thread. Instead we create just one fake thread
// that just contains the most common callstack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: Maybe worth adding info that we plan to change this by adding a flamegraph to app hang events. If we don't have GH issue, I think we could create one and link to the Android implementation and/or the frontend work that makes this possible.

// that just contains the most common callstack.
func sentryMXBacktrace(inAppLogic: SentryInAppLogic?, handled: Bool) -> [SentryThread] {
callStacks.map { callStack in
let thread = SentryThread(threadId: 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Yes that's valid feedback. I think the backend and frontend might get confused when having multiple threads with the threadId 0. If this is intendended, we should add a comment explaining why we have 0 for all threads.

@@ -1,3 +1,4 @@
@_implementationOnly import _SentryPrivate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: I feels a bit strange to me now that we have a dependency on _SentryPrivate here. This is now a bit mixed. We have both MetricKitCallStack JSON parsing with the JSONDecoder and also toDebugMeta and public func prepare(event: Event, inAppLogic: SentryInAppLogic?, handled: Bool) .

I would prefer keeping this file focused on only parsing JSON data. I would move the conversion logic to a different file. It can be an extension, but doing this in a different file feels a bit cleaner to me.

public let callStackRootFrames: [SentryMXFrame]
// A Sample is the standard data format for a flamegraph taken from https://github.com/brendangregg/FlameGraph
// It is less compact than Apple's MetricKit format, but contains the same data and is easier to work with
struct Sample {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: I think plenty of the structs and methods here could be fileprivate. We should hide that complexity and only make public what's required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Use this label to trigger all PR workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SDK wrongly parses MetricKit MXHangDiagnostic stack trace

3 participants