Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNTOR-3096 - add monitor glean backend #4544

Merged
merged 33 commits into from
Jun 27, 2024

Conversation

rhelmer
Copy link
Collaborator

@rhelmer rhelmer commented May 16, 2024

References:

Jira: MNTOR-3096

Description

Screenshot (if applicable)

Not applicable.

How to test

Checklist (Definition of Done)

  • Localization strings (if needed) have been added.
  • Commits in this PR are minimal and have descriptive commit messages.
  • I've added or updated the relevant sections in readme and/or code comments
  • I've added a unit test to test for potential regressions of this bug.
  • Product Owner accepted the User Story (demo of functionality completed) or waived the privilege.
  • All acceptance criteria are met.
  • Jira ticket has been updated (if needed) to match changes made during the development process.
  • Jira ticket has been updated (if needed) with suggestions for QA when this PR is deployed to stage.

@rhelmer rhelmer self-assigned this May 16, 2024
@rhelmer rhelmer marked this pull request as draft May 16, 2024 14:03
@rhelmer rhelmer requested a review from sean-rose May 21, 2024 23:22
Copy link

Copy link
Contributor

@sean-rose sean-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc

In bug 1896992 I also asked @akkomar to take a look at this, since he has more expertise on Glean backend telemetry.

src/telemetry/backend-metrics.yaml Outdated Show resolved Hide resolved
@rhelmer rhelmer requested a review from sean-rose May 22, 2024 00:23
@rhelmer rhelmer marked this pull request as ready for review May 22, 2024 00:23
Copy link

@akkomar akkomar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics definitions look good to me, apart from missing send_in_pings declaration.

We'll now want to integrate glean_parser in the build and generate logging code.
We have some docs now at https://mozilla.github.io/glean/book/user/adding-glean-to-your-project/server.html#how-to-add-glean-server-side-collection-to-your-service and you can also compare how FxA did this.

src/telemetry/backend-metrics.yaml Show resolved Hide resolved
Copy link
Contributor

@sean-rose sean-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc

The updated subscription.cancel event description looks good. Just the one remaining send_in_pings issue that should be fixed.

src/telemetry/backend-metrics.yaml Show resolved Hide resolved
@rhelmer rhelmer marked this pull request as draft June 3, 2024 16:08
@rhelmer
Copy link
Collaborator Author

rhelmer commented Jun 10, 2024

Metrics definitions look good to me, apart from missing send_in_pings declaration.

We'll now want to integrate glean_parser in the build and generate logging code. We have some docs now at https://mozilla.github.io/glean/book/user/adding-glean-to-your-project/server.html#how-to-add-glean-server-side-collection-to-your-service and you can also compare how FxA did this.

Thanks! I will review the docs and example, just FYI we split this part of the work to a separate ticket to fit better into our sprints.

@rhelmer rhelmer marked this pull request as ready for review June 10, 2024 23:16
@rhelmer
Copy link
Collaborator Author

rhelmer commented Jun 14, 2024

After taking another look at this I would strongly recommend moving user_id, session_id, monitor_user_id, and plan_tier out of extras and making them string metrics.
This way it would be easier to work with them - they would have specific columns in BigQuery and would be easier to discover in Glean Dictionary. We don't have any written guidelines for this, but it seems like measures that are sent with all the events and generally stay unchanged across a session (@rhelmer this seems to be the case here?) should be defined as ping level metrics. cc @quiiver with whom I talked about this yesterday.
For comparison, here's how FxA defined user_id metric.

OK I went ahead and made this change - it used extra_keys because that's the way the front-end metrics work, which was at the request of DS. As long as we won't have a problem piecing together these things in analysis I think it's fine.

@rhelmer, after discussing this with @akkomar some more, we'd recommend going back to the previous extra_keys approach to keep things consistent with the frontend events. Sorry for the runaround!

No problem, thanks! I'll revert that commit.

@rhelmer rhelmer requested review from akkomar and sean-rose June 14, 2024 23:20
@sean-rose
Copy link
Contributor

@rhelmer, after discussing this with @akkomar some more, we'd recommend going back to the previous extra_keys approach to keep things consistent with the frontend events. Sorry for the runaround!

No problem, thanks! I'll revert that commit.

I hate to flip-flop on this again, but I just realized there's another factor that probably tips the scales back toward having those IDs as string metrics: data access/deletion requests. It'd be much more straightforward to handle those if the IDs are simple columns rather than being in the array structure the extra keys values end up in.

So now I'm thinking we should use string metrics for the user/session-level properties in both the backend and the frontend (which would involve changing frontend telemetry, with LookML overrides to fall back to the previous location for those values in earlier frontend events), but with the following naming/hierarchy so the final metric names/columns make sense and aren't repeated with different categories/prefixes:

  • mozilla
    • account_id (metric column name metrics.string.mozilla_account_id)
  • monitor
    • user_id (metric column name metrics.string.monitor_user_id)
    • session_id (metric column name metrics.string.monitor_session_id)
    • plan_tier (metric column name metrics.string.monitor_plan_tier)

@akkomar what do you think?

@akkomar
Copy link

akkomar commented Jun 17, 2024

I hate to flip-flop on this again, but I just realized there's another factor that probably tips the scales back toward having those IDs as string metrics: data access/deletion requests. It'd be much more straightforward to handle those if the IDs are simple columns rather than being in the array structure the extra keys values end up in.

@sean-rose Good point, these request (DSR) would be easier to handle with ping-level metrics.

So now I'm thinking we should use string metrics for the user/session-level properties in both the backend and the frontend (which would involve changing frontend telemetry, with LookML overrides to fall back to the previous location for those values in earlier frontend events) (...)

Naming/hierarchy looks good to me. Ideally, to make this even easier for DSAR we could have some standardized naming conventions for these identifiers. But I wouldn't split hairs over this as we'll already need to support customizations because of https://bugzilla.mozilla.org/show_bug.cgi?id=1889123.

As for switching the frontend part too - do you know how much effort this requires? It would be great to have these fields consistently in string metrics, but I'm not sure a migration in frontend is worth it. Also it seems to me that unless we rewrite the old data, we'll need to support identfiers in extras for DSRs anyway until this data expires.

FYI @ksiegler1

@sean-rose
Copy link
Contributor

sean-rose commented Jun 18, 2024

As for switching the frontend part too - do you know how much effort this requires? It would be great to have these fields consistently in string metrics, but I'm not sure a migration in frontend is worth it.

I'd hope most of the additional complexity could be contained in centralized code similar to how FxA is doing it, in which case it probably wouldn't be a ton of effort, but that's optimistic speculation on my part. @rhelmer what do you think?

Also it seems to me that unless we rewrite the old data, we'll need to support identfiers in extras for DSRs anyway until this data expires.

It should be pretty easy to populate the new string metric columns for historical records using a simple UPDATE query.

IMO it'd be worth some reasonable effort now to put things in a more supportable state going forward.

@rhelmer
Copy link
Collaborator Author

rhelmer commented Jun 25, 2024

As for switching the frontend part too - do you know how much effort this requires? It would be great to have these fields consistently in string metrics, but I'm not sure a migration in frontend is worth it.

I'd hope most of the additional complexity could be contained in centralized code similar to how FxA is doing it, in which case it probably wouldn't be a ton of effort, but that's optimistic speculation on my part. @rhelmer what do you think?

Also it seems to me that unless we rewrite the old data, we'll need to support identfiers in extras for DSRs anyway until this data expires.

It should be pretty easy to populate the new string metric columns for historical records using a simple UPDATE query.

IMO it'd be worth some reasonable effort now to put things in a more supportable state going forward.

I think migrating the front-end is doable, I'd need to do it in a separate ticket though. I can convert the metrics.yaml back again.

@rhelmer
Copy link
Collaborator Author

rhelmer commented Jun 25, 2024

So now I'm thinking we should use string metrics for the user/session-level properties in both the backend and the frontend (which would involve changing frontend telemetry, with LookML overrides to fall back to the previous location for those values in earlier frontend events), but with the following naming/hierarchy so the final metric names/columns make sense and aren't repeated with different categories/prefixes:

* `mozilla`
  
  * `account_id` (metric column name `metrics.string.mozilla_account_id`)

* `monitor`
  
  * `user_id` (metric column name `metrics.string.monitor_user_id`)
  * `session_id` (metric column name `metrics.string.monitor_session_id`)
  * `plan_tier` (metric column name `metrics.string.monitor_plan_tier`)

@sean-rose I reverted back to the previous version of the backend-metrics.yaml. Do I need to modify it to account for this too or is that something you're proposing handling on the Glean side?

@sean-rose
Copy link
Contributor

So now I'm thinking we should use string metrics for the user/session-level properties in both the backend and the frontend (which would involve changing frontend telemetry, with LookML overrides to fall back to the previous location for those values in earlier frontend events), but with the following naming/hierarchy so the final metric names/columns make sense and aren't repeated with different categories/prefixes:

  • mozilla
    • account_id (metric column name metrics.string.mozilla_account_id)
  • monitor
    • user_id (metric column name metrics.string.monitor_user_id)
    • session_id (metric column name metrics.string.monitor_session_id)
    • plan_tier (metric column name metrics.string.monitor_plan_tier)

@sean-rose I reverted back to the previous version of the backend-metrics.yaml. Do I need to modify it to account for this too or is that something you're proposing handling on the Glean side?

Yes, you'll need to modify backend-metrics.yaml to do that. With the current backend-metrics.yaml the resulting Monitor backend Glean events table in BigQuery would have the following set of duplicative metric columns:

  • metrics.string.account_monitor_user_id
  • metrics.string.account_plan_tier
  • metrics.string.account_session_id
  • metrics.string.account_user_id
  • metrics.string.page_monitor_user_id
  • metrics.string.page_plan_tier
  • metrics.string.page_session_id
  • metrics.string.page_user_id
  • metrics.string.subscription_monitor_user_id
  • metrics.string.subscription_plan_tier
  • metrics.string.subscription_session_id
  • metrics.string.subscription_user_id

@rhelmer
Copy link
Collaborator Author

rhelmer commented Jun 26, 2024

OK thanks @sean-rose lmk if this is what you were expecting.

@sean-rose
Copy link
Contributor

OK thanks @sean-rose lmk if this is what you were expecting.

Not exactly:

  • The string metrics should be specified once directly under top-level mozilla and monitor category keys.
  • The event metrics should be specified directly under the appropriate top-level category keys (multiple levels of nesting are not supported).

I tried pushing a commit to correct those issues myself, but it turns out I don't have access (which makes sense), so I'll add a separate comment with the changes I'd recommend as one big code suggestion.

@rhelmer rhelmer requested review from akkomar and removed request for akkomar June 26, 2024 22:39
@rhelmer rhelmer merged commit 589bf8c into main Jun 27, 2024
16 checks passed
@rhelmer rhelmer deleted the MNTOR-3096/add-monitor-glean-backend branch June 27, 2024 17:37
Copy link

Cleanup completed - database 'blurts-server-pr-4544' destroyed, cloud run service 'blurts-server-pr-4544' destroyed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants