This repository has been archived by the owner on Jun 17, 2024. It is now read-only.

Bug 1807324 - Add macrobenchmarks for startup metrics #842

Conversation

rahulsainani
Contributor

@rahulsainani rahulsainani commented Feb 16, 2023

What

This PR introduces the Jetpack Macrobenchmark library to measure startup metrics with different startup modes: Cold, Warm & Hot.

How

– Add dependencies and create benchmark module
– Fight with Glean and fix dependency conflict
– Add startupBenchmark to capture startup metrics.
– Run from Android Studio just like any other test, or with Gradle: ./gradlew benchmark:connectedBenchmarkAndroidTest
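For context, a startup macrobenchmark in the benchmark module typically looks like the sketch below. This is illustrative only: the class name, target package, and iteration count are assumptions, not necessarily the exact code in this PR.

```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// Illustrative sketch of a startup macrobenchmark; the target package and
// iteration count are assumptions, not the exact values in this PR.
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupCold() = benchmarkRule.measureRepeated(
        packageName = "org.mozilla.fenix",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD, // also StartupMode.WARM / StartupMode.HOT
    ) {
        pressHome()
        startActivityAndWait() // launches the default activity, waits for first frame
    }
}
```

The benchmark runs against the separately installed target APK, so the target app must be built as a profileable release-like build.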

How do the results look?

Screenshot 2023-02-16 at 15 47 45

Possible Next Steps

– Run this on CI so we can measure consistently and graph the metrics
– Identify critical app flows and integrate FrameMetrics and TraceMetrics for them
– Since this is a prerequisite for adding Baseline Profiles: generate and add baseline profiles to possibly improve app startup performance, especially during the first week after a new version release (when cloud profiles have not yet been loaded on the device).

Issues

– Running with Gradle (./gradlew benchmark:connectedBenchmarkAndroidTest) has an issue installing the APK for the correct ABI. Is that something we've observed before?

Exception thrown during onBeforeAll invocation of plugin com.google.testing.platform.plugin.android.AndroidDevicePlugin.
Failed to install APK(s): /Users/rahulsainani/StudioProjects/firefox-android/fenix/app/build/outputs/apk/fenix/benchmark/app-fenix-x86_64-benchmark.apk
INSTALL_FAILED_NO_MATCHING_ABIS: INSTALL_FAILED_NO_MATCHING_ABIS: Failed to extract native libraries, res=-113

Pull Request checklist

  • Quality: This PR builds and passes detekt/ktlint checks (A pre-push hook is recommended)
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or does not need one
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features

After merge

  • Milestone: Make sure issues closed by this pull request are added to the milestone of the version currently in development.
  • Breaking Changes: If this is a breaking change, please push a draft PR on Reference Browser to address the breaking issues.

GitHub Automation

https://bugzilla.mozilla.org/show_bug.cgi?id=1807324

@rahulsainani rahulsainani added the 🕵️‍♀️ needs review PRs that need to be reviewed label Feb 16, 2023
@rahulsainani rahulsainani marked this pull request as ready for review February 16, 2023 10:53
@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from a1d7fa5 to d84b4c1 Compare February 16, 2023 18:26
Contributor

@MatthewTighe MatthewTighe left a comment


Couple small drive-by comments. I am very excited by this though, thanks for bringing it in! Would love to see follow-ups around scrolling the home screen and the tabs tray. Is there a plan to add this to our pipelines as well?

Comment on lines +52 to +53
<profileable
android:shell="true"
tools:targetApi="29" />
Contributor


Without looking too deeply, I'm curious whether adding this to our normal manifest could have any impact on production performance, and if so, whether there is an easy way to create a "copy" app that we could use under test.

Contributor Author


Good question. Looking into the docs, the profileable tag makes non-debuggable builds profileable on API 29+. It doesn't look like it has any impact on performance, since it's designed to benchmark release builds for more accurate perf metrics.

@MarcLeclair
Contributor

Couple small drive-by comments. I am very excited by this though, thanks for bringing it in! Would love to see follow-ups around scrolling the home screen and the tabs tray. Is there a plan to add this to our pipelines as well?

So, macrobenchmark is mostly used for something that happens once in a while (i.e. start up). So this test shouldn't include that, in my opinion, since we want a clear view of start up. However, we do have microbenchmarks that do just that! I have a patch ready to land that I never landed, since there were issues with dependencies a few months ago that seem resolved now. I think if we landed that, we would have a good understanding of the entire Home Screen :)

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 6 times, most recently from d7f94fd to 0314845 Compare February 22, 2023 10:32
Contributor

@MatthewTighe MatthewTighe left a comment


Had just a couple more things I wanted to touch on before approving, but this is looking good and I was able to verify the workflow locally.

Running with Gradle (./gradlew benchmark:connectedBenchmarkAndroidTest) has an issue installing the APK for the correct ABI. Is that something we've observed before?

I was able to resolve this by adding autosignReleaseWithDebugKey to my local.properties. Let me know if that helps!
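For anyone else hitting this: the flag is a local.properties entry that the build checks via gradle.hasProperty("localProperties.autosignReleaseWithDebugKey"), so its presence is what matters (the file location shown in the comment is an assumption):

```
# local.properties (in the project directory)
autosignReleaseWithDebugKey=true
```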

Comment on lines 30 to 33
debuggable = true
if (gradle.hasProperty("localProperties.autosignReleaseWithDebugKey")) {
signingConfig signingConfigs.debug
}
Contributor


Curious about 2 things here:

  1. Does this module need to be debuggable for the profiling to work correctly? Just a little surprising given the emphasis on using release builds, but I didn't actually try the setup wizard
  2. I think this signingConfig property will be added by the releaseTemplate closure used in the top-level build file. Is it necessary to have it in both places?

Contributor Author


  1. Good question: on my Samsung A50 (Android 11) the metrics can't be read/extracted when this is set to false. On Pixel 5 & 7 (Android 13) it works regardless of this setting, so I set it to true. The samples seem to do the same.
  2. This is the benchmark/build.gradle; the closure is defined in app/build.gradle, so that won't work, right? Although I'm not sure this condition is even required. This is something I'd like to get feedback on, based on how we sign APKs on CI and how we would like to sign the benchmark APK as well.

Comment on lines +42 to +57
/**
* This fixes the dependency resolution issue with Glean Native. The glean gradle plugin does this
* and that's applied to the app module. Since there are no other uses of the glean plugin in the
* benchmark module, we do this manually here.
*/
configurations.all {
resolutionStrategy.capabilitiesResolution.withCapability("org.mozilla.telemetry:glean-native") {
def toBeSelected = candidates.find { it.id instanceof ModuleComponentIdentifier && it.id.module.contains('geckoview') }
if (toBeSelected != null) {
select(toBeSelected)
}
because 'use GeckoView Glean instead of standalone Glean'
}
}
Contributor


I don't really understand what's happening here, would you mind explaining it to me?

Contributor Author


There was a build error because of a Glean dependency resolution conflict, since Glean was also coming in via GeckoView. The Glean plugin applied to the app module handles this case, along with other functionality. Since the benchmark module doesn't need all of the Glean Gradle plugin's functionality, we added this to fix the dependency resolution conflict. Does that help? 😅 This was an issue that took some time to figure out. Jan Erik was a big help here!

Collaborator


Writing it out helps me remember better, so this is for myself really!

In a regular GeckoView build we have Glean baked in because there is a client within Gecko already so we really want to use that. So when build dependency resolution happens, we want to tell gradle "choose this one, don't worry":

graph TD
A[GeckoView + Glean] --> B[Fenix APK]
C[Rust Components] --> B

In a GeckoView Lite build, which third-party consumers may use because they don't want Mozilla telemetry in it, they could still use a separately compiled component for Glean. But that would mean either a large amount of duplication, or they would have to figure out how to deliver telemetry to and from the engine to this separate component (which is the path we didn't take because it's quite a bit more complicated):

graph TD
A[GeckoView] --> B[Fenix APK]
C[Rust Components] --> B
D["Glean Component (optional)"] --> B

@MatthewTighe
Contributor

MatthewTighe commented Feb 22, 2023

So, macrobenchmark is mostly used for something that happens once in a while (i.e. start up). So this test shouldn't include that, in my opinion, since we want a clear view of start up. However, we do have microbenchmarks that do just that! I have a patch ready to land that I never landed, since there were issues with dependencies a few months ago that seem resolved now.

Oh cool! I would be interested in reading through that patch as well.

@rahulsainani another thing to consider for follow-ups:

IIRC, we have historically treated time-to-interactivity as a more valuable metric than time-to-display since that captures a more realistic picture of a user experience. I believe this patch only measures the latter, so it would be great to either add that metric to these existing tests or create new ones to measure that separately.

I would also be interested in understanding better the results of the different startup flows. These results show a pretty massive difference between hot and cold startup, though I don't know what kind of values to expect. My results on a Pixel 4a, Android 13:

StartupBenchmark_startupHot
timeToInitialDisplayMs   min 64.7,   median 71.6,   max 87.4
Traces: Iteration 0 1 2 3 4
StartupBenchmark_startupCold
timeToInitialDisplayMs   min 755.1,   median 812.1,   max 850.2
Traces: Iteration 0 1 2 3 4
StartupBenchmark_startupWarm
timeToInitialDisplayMs   min 161.2,   median 188.1,   max 200.6
Traces: Iteration 0 1 2 3 4

@mergify

This comment was marked as resolved.

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from 0314845 to 610c40d Compare February 23, 2023 08:48
@rahulsainani
Contributor Author

@MatthewTighe Good to see it working in your local env 👌

Yes, you're right that this PR only adds Time to initial display (TTID). There's still value in measuring TTID, as a lot happens up to that point and it can give a good signal about app performance. I'd like to add tests for time-to-interactivity, aka Time to full display (TTFD). For that, Activity.reportFullyDrawn() is used by the benchmark as the signal. However, for this PR the goal is to try out macrobenchmarks, explore their benefits, see where they fit in our perf metrics observability setup, and show how this is a prerequisite for adding baseline profiles. Ref
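As an aside, a minimal sketch of the TTFD signal: the app's activity calls reportFullyDrawn() once its content is actually ready. The activity name and the point where content becomes ready are illustrative, not taken from this PR.

```kotlin
import android.app.Activity
import android.os.Bundle

// Illustrative only: the activity name and load-completion hook are assumptions.
class ExampleActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // inflate UI, kick off async loading ...
    }

    // Invoked once the first meaningful content is displayed.
    private fun onContentReady() {
        // Signals time-to-full-display (TTFD); a macrobenchmark can use this
        // as the measurement endpoint instead of the first frame (TTID).
        reportFullyDrawn()
    }
}
```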

We can also benchmark other macro flows like Login/Changing language and capture TraceMetrics for those, or FrameMetrics for scroll performance. But I'll stop myself from going into tangents here 😄

The numbers look consistent to me compared to Google Play Vitals.

Cold startup is what we're interested in, when the app is not in memory. Hot is when the app is in memory but the activity is in the background; all it does is bring it to the foreground.

@rahulsainani
Contributor Author

I was able to resolve this by adding autosignReleaseWithDebugKey to my local.properties. Let me know if that helps!

Somehow that's not working on my machine. The local.properties flag is needed to run from Android Studio as well, so I had it enabled. But it's good to know that it works for you 😄

Contributor

@MatthewTighe MatthewTighe left a comment


This looks good from my perspective, so adding an approval in case it unblocks you. It does sound like maybe you were looking for additional feedback about our CI systems which I am not confident in answering, but it seems like those may be resolved in follow-ups.

@rahulsainani
Contributor Author

This looks good from my perspective, so adding an approval in case it unblocks you. It does sound like maybe you were looking for additional feedback about our CI systems which I am not confident in answering, but it seems like those may be resolved in follow-ups.

Thanks @MatthewTighe for taking the time to review and dig into macrobenchmarks 🙌
Yes, I'll wait for some more feedback and still need the solution to the signing issue on the build. 🚧

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from e97874f to 2e81c59 Compare March 1, 2023 11:40
@jonalmeida
Collaborator

For this unrelated CI failure, please rebase to the latest main to include the fix in your branch:

[task 2023-03-01T17:36:01.392Z] FAILURE: Build failed with an exception.
[task 2023-03-01T17:36:01.392Z] 
[task 2023-03-01T17:36:01.392Z] * What went wrong:
[task 2023-03-01T17:36:01.392Z] Execution failed for task ':support-rusthttp:generateDebugUnitTestStubRFile'.
[task 2023-03-01T17:36:01.392Z] > Could not resolve all files for configuration ':support-rusthttp:debugUnitTestRuntimeClasspath'.
[task 2023-03-01T17:36:01.392Z]    > Failed to transform monitor-1.4.0.aar (androidx.test:monitor:1.4.0) to match attributes {artifactType=android-symbol-with-package-name, org.gradle.category=library, org.gradle.libraryelements=jar, org.gradle.status=release, org.gradle.usage=java-runtime}.
[task 2023-03-01T17:36:01.392Z]       > Could not find monitor-1.4.0.jar (androidx.test:monitor:1.4.0).
[task 2023-03-01T17:36:01.392Z]         Searched in the following locations:
[task 2023-03-01T17:36:01.393Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.aar
[task 2023-03-01T17:36:01.393Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.jar
[task 2023-03-01T17:36:01.394Z]    > Failed to transform monitor-1.4.0.aar (androidx.test:monitor:1.4.0) to match attributes {artifactType=android-symbol-with-package-name, org.gradle.status=release}.
[task 2023-03-01T17:36:01.394Z]       > Could not find monitor-1.4.0.aar (androidx.test:monitor:1.4.0).
[task 2023-03-01T17:36:01.394Z]         Searched in the following locations:
[task 2023-03-01T17:36:01.394Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.aar
[task 2023-03-01T17:36:01.394Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.jar
[task 2023-03-01T17:36:01.394Z] 
[task 2023-03-01T17:36:01.394Z] * Try:
[task 2023-03-01T17:36:01.394Z] > Run with --stacktrace option to get the stack trace.
[task 2023-03-01T17:36:01.394Z] > Run with --info or --debug option to get more log output.
[task 2023-03-01T17:36:01.394Z] > Run with --scan to get full insights.

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from d470cb9 to 1211794 Compare March 2, 2023 08:35
@rahulsainani
Contributor Author

Thanks for looking into it and fixing it @jonalmeida 🙌

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from 330d90b to d930990 Compare March 3, 2023 14:52
@rahulsainani
Contributor Author

Although this PR doesn't change anything in the main app, after discussing with @csadilek I ran the local perf tests on main and this branch:

Sharing some test numbers below:

main branch

'max': 390.0,
 'mean': 358.68,
 'median': 358.0,
 'min': 332.0,
 'replicate_count': 25,
 'replicates': [365.0, 358.0, 390.0, 338.0, 332.0, 337.0, 353.0, 360.0, 378.0,
                347.0, 378.0, 342.0, 355.0, 353.0, 367.0, 352.0, 364.0, 365.0,
                356.0, 341.0, 371.0, 369.0, 368.0, 347.0, 381.0],
 'stdev': 14.935193336545732

benchmarks branch (this)

'max': 403.0,
 'mean': 367.44,
 'median': 367.0,
 'min': 336.0,
 'replicate_count': 25,
 'replicates': [361.0, 368.0, 389.0, 354.0, 359.0, 372.0, 376.0, 365.0, 350.0,
                373.0, 383.0, 363.0, 353.0, 353.0, 367.0, 403.0, 356.0, 371.0,
                371.0, 390.0, 356.0, 336.0, 374.0, 358.0, 385.0],
 'stdev': 14.947352050892938

nightly-play (2023-03-03)

'max': 420.0,
 'mean': 374.2,
 'median': 380.0,
 'min': 311.0,
 'replicate_count': 25,
 'replicates': [338.0, 311.0, 365.0, 376.0, 343.0, 351.0, 390.0, 398.0, 361.0,
                391.0, 365.0, 401.0, 392.0, 388.0, 394.0, 365.0, 380.0, 382.0,
                354.0, 420.0, 372.0, 384.0, 361.0, 380.0, 393.0],
 'stdev': 23.475164181179508

@rahulsainani rahulsainani requested a review from jonalmeida March 6, 2023 17:03
Collaborator

@jonalmeida jonalmeida left a comment


LGTM!

Left one nit about a doc comment but, being a nit, it's not blocking. 🚀

buildTypes {
// This benchmark buildType is used for benchmarking, and should function like your
// release build (for example, with minification on). It's signed with a debug key
// for easy local/CI testing.
Collaborator


Suggested change
// for easy local/CI testing.
// for easy local testing.

Signing with a debug key is only something that a developer has set up locally. Taskcluster does have a throwaway signing key for CI builds, but I don't think it is the same as the debug one.

Collaborator


If we're looking to iterate this to run on CI builds then we'd need to change this line. For now, it might be prudent to correct docs? I'll leave that up to you to consider.

Contributor Author


Yes, the idea would be to run it on CI builds, with the next steps.
Just mentioning: this build.gradle is for the benchmark module, which generates its own apk that runs the tests against the Fenix apk. Not sure if Taskcluster already has a signing key for that? I'll remove it for now and we can add it again as/if we run this on CI 👍


@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from d930990 to e09b5d4 Compare March 14, 2023 09:37
@rahulsainani rahulsainani added approved PR that has been approved and removed 🕵️‍♀️ needs review PRs that need to be reviewed labels Mar 14, 2023
@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from e09b5d4 to 2638432 Compare March 14, 2023 16:54
@rahulsainani rahulsainani added 🛬 needs landing PRs that are ready to land and removed 🛬 needs landing PRs that are ready to land labels Mar 15, 2023
@JohanLorenzo
Collaborator

Chatted with @rahulsainani. At the moment, the Mergify merge queue doesn't work anymore because of Mergifyio/mergify#5075. I'm merging this PR manually while we figure out a way to put the merge queue back in a working state.

@JohanLorenzo JohanLorenzo merged commit 17eb261 into mozilla-mobile:main Mar 15, 2023
@JohanLorenzo
Collaborator

Permanent fix will be tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1822491

@rahulsainani rahulsainani deleted the 1807324-macrobenchmarks-for-startup-metrics branch March 16, 2023 08:07