This repository has been archived by the owner on Jun 17, 2024. It is now read-only.

Bug 1807324 - Add macrobenchmarks for startup metrics #842

Conversation

rahulsainani
Contributor

@rahulsainani rahulsainani commented Feb 16, 2023

What

This PR introduces the Jetpack Macrobenchmark library to measure startup metrics with different startup modes: Cold, Warm & Hot.

How

– Add dependencies and create benchmark module
– Fight with Glean and fix dependency conflict
– Add startupBenchmark to capture startup metrics.
– Run from Android Studio just like any other test, or with Gradle: ./gradlew benchmark:connectedBenchmarkAndroidTest
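For context, a startup macrobenchmark in the benchmark module typically looks like the sketch below. This is illustrative only: the class name, target package, and iteration count are assumptions, not necessarily the exact code in this PR.

```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// Illustrative sketch of a startup macrobenchmark; the target package and
// iteration count are assumptions, not the exact values in this PR.
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupCold() = benchmarkRule.measureRepeated(
        packageName = "org.mozilla.fenix",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD, // also StartupMode.WARM / StartupMode.HOT
    ) {
        pressHome()
        startActivityAndWait() // launches the default activity, waits for first frame
    }
}
```

The benchmark runs against the separately installed target APK, so the target app must be built as a profileable release-like build.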

How do the results look?

Screenshot 2023-02-16 at 15 47 45

Possible Next Steps

– Run this on CI so we can measure consistently and graph the metrics
– Identify critical app flows and integrate FrameMetrics and TraceMetrics for them
– Since this is a prerequisite for adding Baseline Profiles: generate and add baseline profiles to possibly improve app startup performance, especially during the first week after a new version release (when cloud profiles have not yet been loaded on the device).

Issues

– Running with Gradle (./gradlew benchmark:connectedBenchmarkAndroidTest) has an issue installing the APK for the correct ABI. Is that something we've observed before?

Exception thrown during onBeforeAll invocation of plugin com.google.testing.platform.plugin.android.AndroidDevicePlugin.
Failed to install APK(s): /Users/rahulsainani/StudioProjects/firefox-android/fenix/app/build/outputs/apk/fenix/benchmark/app-fenix-x86_64-benchmark.apk
INSTALL_FAILED_NO_MATCHING_ABIS: INSTALL_FAILED_NO_MATCHING_ABIS: Failed to extract native libraries, res=-113

Pull Request checklist

  • Quality: This PR builds and passes detekt/ktlint checks (A pre-push hook is recommended)
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or does not need one
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features

After merge

  • Milestone: Make sure issues closed by this pull request are added to the milestone of the version currently in development.
  • Breaking Changes: If this is a breaking change, please push a draft PR on Reference Browser to address the breaking issues.

GitHub Automation

https://bugzilla.mozilla.org/show_bug.cgi?id=1807324

@rahulsainani rahulsainani added the 🕵️‍♀️ needs review PRs that need to be reviewed label Feb 16, 2023
@rahulsainani rahulsainani marked this pull request as ready for review February 16, 2023 10:53
@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from a1d7fa5 to d84b4c1 Compare February 16, 2023 18:26
Contributor

@MatthewTighe MatthewTighe left a comment


Couple small drive-by comments. I am very excited by this though, thanks for bringing it in! Would love to see follow-ups around scrolling the home screen and the tabs tray. Is there a plan to add this to our pipelines as well?

Comment on lines +52 to +53
<profileable
android:shell="true"
tools:targetApi="29" />
Contributor


Without looking too deeply, I'm curious whether adding this to our normal manifest could have any impact on production performance, and if so, whether there is an easy way to create a "copy" app that we could use under test.

Contributor Author


Good question. Looking into the docs, the profileable tag makes non-debuggable builds profileable on API 29+. It doesn't look like it has any impact on performance, since it's designed to benchmark release builds for more accurate perf metrics.

@MarcLeclair
Contributor

Couple small drive-by comments. I am very excited by this though, thanks for bringing it in! Would love to see follow-ups around scrolling the home screen and the tabs tray. Is there a plan to add this to our pipelines as well?

So, macrobenchmark is mostly used for something that happens once in a while (i.e. start up). So this test shouldn't include that, in my opinion, since we want a clear view of start up. However, we do have microbenchmarks that do just that! I have a patch ready to land that I never landed, since there were issues with dependencies a few months ago that seem resolved now. I think if we landed that, we would have a good understanding of the entire Home Screen :)

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 6 times, most recently from d7f94fd to 0314845 Compare February 22, 2023 10:32
Contributor

@MatthewTighe MatthewTighe left a comment


Had just a couple more things I wanted to touch on before approving, but this is looking good and I was able to verify the workflow locally.

Running with Gradle (./gradlew benchmark:connectedBenchmarkAndroidTest) has an issue installing the APK for the correct ABI. Is that something we've observed before?

I was able to resolve this by adding autosignReleaseWithDebugKey to my local.properties. Let me know if that helps!
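For anyone else hitting this: the flag is a local.properties entry that the build checks via gradle.hasProperty("localProperties.autosignReleaseWithDebugKey"), so its presence is what matters (the file location shown in the comment is an assumption):

```
# local.properties (in the project directory)
autosignReleaseWithDebugKey=true
```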

Comment on lines 30 to 33
debuggable = true
if (gradle.hasProperty("localProperties.autosignReleaseWithDebugKey")) {
signingConfig signingConfigs.debug
}
Contributor


Curious about 2 things here:

  1. Does this module need to be debuggable for the profiling to work correctly? Just a little surprising given the emphasis on using release builds, but I didn't actually try the setup wizard
  2. I think this signingConfig property will be added by the releaseTemplate closure used in the top-level build file. Is it necessary to have it in both places?

Contributor Author


  1. Good question: on my Samsung A50 (Android 11) the metrics can't be read/extracted when this is set to false. On Pixel 5 & 7 (Android 13) it works regardless of this setting, so I set it to true. The samples seem to do the same.
  2. This is the benchmark/build.gradle; the closure is defined in app/build.gradle, so that won't work, right? Although I'm not sure this condition is even required. This is something I'd like to get feedback on, based on how we sign APKs on CI and how we would like to sign the benchmark APK as well.

Comment on lines +42 to +57
/**
* This fixes the dependency resolution issue with Glean Native. The glean gradle plugin does this
* and that's applied to the app module. Since there are no other uses of the glean plugin in the
* benchmark module, we do this manually here.
*/
configurations.all {
resolutionStrategy.capabilitiesResolution.withCapability("org.mozilla.telemetry:glean-native") {
def toBeSelected = candidates.find { it.id instanceof ModuleComponentIdentifier && it.id.module.contains('geckoview') }
if (toBeSelected != null) {
select(toBeSelected)
}
because 'use GeckoView Glean instead of standalone Glean'
}
}
Contributor


I don't really understand what's happening here, would you mind explaining it to me?

Contributor Author


There was a build error because of a Glean dependency resolution conflict, since Glean was also coming in via GeckoView. The Glean plugin applied to the app module handles this case, along with other functionality. Since the benchmark module doesn't need all of the Glean Gradle plugin's functionality, we added this to fix the dependency resolution conflict. Does that help? 😅 This was an issue that took some time to figure out. Jan Erik was a big help here!

Collaborator


Writing it out helps me remember better, so this is for myself really!

In a regular GeckoView build we have Glean baked in because there is a client within Gecko already so we really want to use that. So when build dependency resolution happens, we want to tell gradle "choose this one, don't worry":

graph TD
A[GeckoView + Glean] --> B[Fenix APK]
C[Rust Components] --> B

In a GeckoView Lite build, which third-party consumers may use because they don't want Mozilla telemetry in it, they could still use a separately compiled component for Glean. But that would mean either a large amount of duplication, or they would have to figure out how to deliver telemetry to and from the engine to this separate component (which is the path we didn't take because it's quite a bit more complicated):

graph TD
A[GeckoView] --> B[Fenix APK]
C[Rust Components] --> B
D["Glean Component (optional)"] --> B

@MatthewTighe
Contributor

MatthewTighe commented Feb 22, 2023

So, macrobenchmark is mostly used for something that happens once in a while (i.e. start up). So this test shouldn't include that, in my opinion, since we want a clear view of start up. However, we do have microbenchmarks that do just that! I have a patch ready to land that I never landed, since there were issues with dependencies a few months ago that seem resolved now.

Oh cool! I would be interested in reading through that patch as well.

@rahulsainani another thing to consider for follow-ups:

IIRC, we have historically treated time-to-interactivity as a more valuable metric than time-to-display since that captures a more realistic picture of a user experience. I believe this patch only measures the latter, so it would be great to either add that metric to these existing tests or create new ones to measure that separately.

I would also be interested in understanding better the results of the different startup flows. These results show a pretty massive difference between hot and cold startup, though I don't know what kind of values to expect. My results on a Pixel 4a, Android 13:

StartupBenchmark_startupHot
timeToInitialDisplayMs   min 64.7,   median 71.6,   max 87.4
Traces: Iteration 0 1 2 3 4
StartupBenchmark_startupCold
timeToInitialDisplayMs   min 755.1,   median 812.1,   max 850.2
Traces: Iteration 0 1 2 3 4
StartupBenchmark_startupWarm
timeToInitialDisplayMs   min 161.2,   median 188.1,   max 200.6
Traces: Iteration 0 1 2 3 4

@mergify

This comment was marked as resolved.

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from 0314845 to 610c40d Compare February 23, 2023 08:48
@rahulsainani
Contributor Author

@MatthewTighe Good to see it working in your local env 👌

Yes, you're right that this PR only adds Time to initial display (TTID). There's still value in measuring TTID, as a lot happens up to that point and it can give a good signal about app performance. I'd like to add tests for time-to-interactivity, aka Time to full display (TTFD). For that, Activity.reportFullyDrawn() is used by the benchmark as the signal. However, for this PR the goal is to try out macrobenchmarks, explore their benefits, see where they fit in our perf metrics observability setup, and show how this is a prerequisite for adding baseline profiles. Ref
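As an aside, a minimal sketch of the TTFD signal: the app's activity calls reportFullyDrawn() once its content is actually ready. The activity name and the point where content becomes ready are illustrative, not taken from this PR.

```kotlin
import android.app.Activity
import android.os.Bundle

// Illustrative only: the activity name and load-completion hook are assumptions.
class ExampleActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // inflate UI, kick off async loading ...
    }

    // Invoked once the first meaningful content is displayed.
    private fun onContentReady() {
        // Signals time-to-full-display (TTFD); a macrobenchmark can use this
        // as the measurement endpoint instead of the first frame (TTID).
        reportFullyDrawn()
    }
}
```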

We can also benchmark other macro flows like Login/Changing language and capture TraceMetrics for those, or FrameMetrics for scroll performance. But I'll stop myself from going into tangents here 😄

The numbers look consistent to me compared to Google Play Vitals.

Cold startup is what we're interested in, when the app is not in memory. Hot is when the app is in memory but the activity is in the background; all it does is bring it to the foreground.

@rahulsainani
Contributor Author

I was able to resolve this by adding autosignReleaseWithDebugKey to my local.properties. Let me know if that helps!

Somehow that's not working on my machine. The local.properties flag is needed to run from Android Studio as well, so I had it enabled. But it's good to know that it works for you 😄

Contributor

@MatthewTighe MatthewTighe left a comment


This looks good from my perspective, so adding an approval in case it unblocks you. It does sound like maybe you were looking for additional feedback about our CI systems which I am not confident in answering, but it seems like those may be resolved in follow-ups.

@rahulsainani
Contributor Author

This looks good from my perspective, so adding an approval in case it unblocks you. It does sound like maybe you were looking for additional feedback about our CI systems which I am not confident in answering, but it seems like those may be resolved in follow-ups.

Thanks @MatthewTighe for taking the time to review and dig into macrobenchmarks 🙌
Yes, I'll wait for some more feedback and still need the solution to the signing issue on the build. 🚧

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from e97874f to 2e81c59 Compare March 1, 2023 11:40
@jonalmeida
Collaborator

For this unrelated CI failure, please rebase to the latest main to include the fix in your branch:

[task 2023-03-01T17:36:01.392Z] FAILURE: Build failed with an exception.
[task 2023-03-01T17:36:01.392Z] 
[task 2023-03-01T17:36:01.392Z] * What went wrong:
[task 2023-03-01T17:36:01.392Z] Execution failed for task ':support-rusthttp:generateDebugUnitTestStubRFile'.
[task 2023-03-01T17:36:01.392Z] > Could not resolve all files for configuration ':support-rusthttp:debugUnitTestRuntimeClasspath'.
[task 2023-03-01T17:36:01.392Z]    > Failed to transform monitor-1.4.0.aar (androidx.test:monitor:1.4.0) to match attributes {artifactType=android-symbol-with-package-name, org.gradle.category=library, org.gradle.libraryelements=jar, org.gradle.status=release, org.gradle.usage=java-runtime}.
[task 2023-03-01T17:36:01.392Z]       > Could not find monitor-1.4.0.jar (androidx.test:monitor:1.4.0).
[task 2023-03-01T17:36:01.392Z]         Searched in the following locations:
[task 2023-03-01T17:36:01.393Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.aar
[task 2023-03-01T17:36:01.393Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.jar
[task 2023-03-01T17:36:01.394Z]    > Failed to transform monitor-1.4.0.aar (androidx.test:monitor:1.4.0) to match attributes {artifactType=android-symbol-with-package-name, org.gradle.status=release}.
[task 2023-03-01T17:36:01.394Z]       > Could not find monitor-1.4.0.aar (androidx.test:monitor:1.4.0).
[task 2023-03-01T17:36:01.394Z]         Searched in the following locations:
[task 2023-03-01T17:36:01.394Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.aar
[task 2023-03-01T17:36:01.394Z]             file:/builds/worker/fetches/external-gradle-dependencies/google/androidx/test/monitor/1.4.0/monitor-1.4.0.jar
[task 2023-03-01T17:36:01.394Z] 
[task 2023-03-01T17:36:01.394Z] * Try:
[task 2023-03-01T17:36:01.394Z] > Run with --stacktrace option to get the stack trace.
[task 2023-03-01T17:36:01.394Z] > Run with --info or --debug option to get more log output.
[task 2023-03-01T17:36:01.394Z] > Run with --scan to get full insights.

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from d470cb9 to 1211794 Compare March 2, 2023 08:35
@rahulsainani
Contributor Author

Thanks for looking into it and fixing it @jonalmeida 🙌

@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch 4 times, most recently from 330d90b to d930990 Compare March 3, 2023 14:52
@rahulsainani
Contributor Author

Although this PR doesn't change anything in the main app, after discussing with @csadilek I ran the local perf tests on main and this branch:

Sharing some test numbers below:

main branch

'max': 390.0,
 'mean': 358.68,
 'median': 358.0,
 'min': 332.0,
 'replicate_count': 25,
 'replicates': [365.0, 358.0, 390.0, 338.0, 332.0, 337.0, 353.0, 360.0, 378.0,
                347.0, 378.0, 342.0, 355.0, 353.0, 367.0, 352.0, 364.0, 365.0,
                356.0, 341.0, 371.0, 369.0, 368.0, 347.0, 381.0],
 'stdev': 14.935193336545732

benchmarks branch (this)

'max': 403.0,
 'mean': 367.44,
 'median': 367.0,
 'min': 336.0,
 'replicate_count': 25,
 'replicates': [361.0, 368.0, 389.0, 354.0, 359.0, 372.0, 376.0, 365.0, 350.0,
                373.0, 383.0, 363.0, 353.0, 353.0, 367.0, 403.0, 356.0, 371.0,
                371.0, 390.0, 356.0, 336.0, 374.0, 358.0, 385.0],
 'stdev': 14.947352050892938

nightly-play (2023-03-03)

'max': 420.0,
 'mean': 374.2,
 'median': 380.0,
 'min': 311.0,
 'replicate_count': 25,
 'replicates': [338.0, 311.0, 365.0, 376.0, 343.0, 351.0, 390.0, 398.0, 361.0,
                391.0, 365.0, 401.0, 392.0, 388.0, 394.0, 365.0, 380.0, 382.0,
                354.0, 420.0, 372.0, 384.0, 361.0, 380.0, 393.0],
 'stdev': 23.475164181179508

@rahulsainani rahulsainani requested a review from jonalmeida March 6, 2023 17:03
Collaborator

@jonalmeida jonalmeida left a comment


LGTM!

Left one nit about a doc comment but, being a nit, it's not blocking. 🚀

buildTypes {
// This benchmark buildType is used for benchmarking, and should function like your
// release build (for example, with minification on). It's signed with a debug key
// for easy local/CI testing.
Collaborator


Suggested change
// for easy local/CI testing.
// for easy local testing.

Signing with a debug key is only something that a developer has set up locally. Taskcluster does have a throwaway signing key for CI builds, but I don't think it is the same as the debug one.

Collaborator


If we're looking to iterate this to run on CI builds then we'd need to change this line. For now, it might be prudent to correct docs? I'll leave that up to you to consider.

Contributor Author


Yes, the idea would be to run it on CI builds, with the next steps.
Just mentioning: this build.gradle is for the benchmark module, which generates its own apk that runs the tests against the Fenix apk. Not sure if Taskcluster already has a signing key for that? I'll remove it for now and we can add it again as/if we run this on CI 👍


@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from d930990 to e09b5d4 Compare March 14, 2023 09:37
@rahulsainani rahulsainani added approved PR that has been approved and removed 🕵️‍♀️ needs review PRs that need to be reviewed labels Mar 14, 2023
@rahulsainani rahulsainani force-pushed the 1807324-macrobenchmarks-for-startup-metrics branch from e09b5d4 to 2638432 Compare March 14, 2023 16:54
@rahulsainani rahulsainani added 🛬 needs landing PRs that are ready to land and removed 🛬 needs landing PRs that are ready to land labels Mar 15, 2023
@JohanLorenzo
Collaborator

Chatted with @rahulsainani. At the moment, the Mergify merge queue doesn't work anymore because of Mergifyio/mergify#5075. I'm merging this PR manually while we figure out a way to put the merge queue back in a working state.

@JohanLorenzo JohanLorenzo merged commit 17eb261 into mozilla-mobile:main Mar 15, 2023
@JohanLorenzo
Collaborator

Permanent fix will be tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1822491

@rahulsainani rahulsainani deleted the 1807324-macrobenchmarks-for-startup-metrics branch March 16, 2023 08:07