From a5a9b43851bc6c69114decf8eeae7ff32848b1bc Mon Sep 17 00:00:00 2001
From: Travis Long
Date: Thu, 24 Oct 2024 07:22:01 -0500
Subject: [PATCH] Bug 1926095 - Add how-to with tips on investigating data anomalies

---
 .dictionary                      |  3 +-
 docs/user/SUMMARY.md             |  1 +
 docs/user/user/howto/index.md    | 14 +++-
 .../investigating-data-issues.md | 67 +++++++++++++++++++
 4 files changed, 82 insertions(+), 3 deletions(-)
 create mode 100644 docs/user/user/howto/investigating-data-issues/investigating-data-issues.md

diff --git a/.dictionary b/.dictionary
index cd56c3d518..3cf7b16fc8 100644
--- a/.dictionary
+++ b/.dictionary
@@ -1,4 +1,4 @@
-personal_ws-1.1 en 290 utf-8
+personal_ws-1.1 en 291 utf-8
 AAR
 AARs
 ABI
@@ -41,6 +41,7 @@ Gradle
 Grapheme
 Hotfix
 Howtos
+ISPs
 JDK
 JNA
 JNI
diff --git a/docs/user/SUMMARY.md b/docs/user/SUMMARY.md
index 94c7de3483..722d5db656 100644
--- a/docs/user/SUMMARY.md
+++ b/docs/user/SUMMARY.md
@@ -47,6 +47,7 @@
 - [Walkthroughs and How-tos](user/howto/index.md)
     - [Server Knobs Walkthrough](user/howto/server-knobs-walkthrough/server-knobs-walkthrough.md)
     - ["Real-Time" Events](user/howto/real-time-events/real-time-events.md)
+    - [Telemetry/Data Bug Investigation Recommendations](user/howto/investigating-data-issues/investigating-data-issues.md)
 
 # API Reference
 
diff --git a/docs/user/user/howto/index.md b/docs/user/user/howto/index.md
index 6791411fec..9bc41edbbc 100644
--- a/docs/user/user/howto/index.md
+++ b/docs/user/user/howto/index.md
@@ -4,6 +4,16 @@ This chapter contains various how-tos and walkthroughs to help aid you in using
 
 ### [Server Knobs Walkthrough]
 
-A step-by-step guide in setting up and launching a Server Knobs Experiment
+A step-by-step guide in setting up and launching a Server Knobs Experiment.
 
-[Server Knobs Walkthrough]: ./server-knobs-walkthrough/server-knobs-walkthrough.md
\ No newline at end of file
+### ["Real-Time" Events]
+
+A guide describing the different methods to collect and transmit data in a "real-time" fashion using Glean.
+
+### [Telemetry/Data Bug Investigation Recommendations]
+
+Recommendations and tips on investigating data anomalies.
+
+[Server Knobs Walkthrough]: ./server-knobs-walkthrough/server-knobs-walkthrough.md
+["Real-Time" Events]: ./real-time-events/real-time-events.md
+[Telemetry/Data Bug Investigation Recommendations]: ./investigating-data-issues/investigating-data-issues.md
diff --git a/docs/user/user/howto/investigating-data-issues/investigating-data-issues.md b/docs/user/user/howto/investigating-data-issues/investigating-data-issues.md
new file mode 100644
index 0000000000..a62898fe03
--- /dev/null
+++ b/docs/user/user/howto/investigating-data-issues/investigating-data-issues.md
@@ -0,0 +1,67 @@
+# Telemetry/Data Bug Investigation Recommendations
+
+This document outlines several diagnostic categories and the insights they may offer when investigating unusual telemetry patterns or data anomalies.
+
+### 1\. Countries
+
+* Purpose: Identify geographical patterns that could explain anomalies.
+* Column Name: `metadata.geo.country`
+* Considerations:
+  * Are there ongoing national holidays or similar events that could affect data?
+  * Is the region known for bot activity or unusual behavior?
+
+### 2\. ISP (Internet Service Provider)
+
+* Purpose: Analyze data at a more granular level than countries to identify potential automation or bot activity.
+* Column Name: `metadata.isp.name`
+* Considerations:
+  * Could the anomaly be traced back to a single ISP, potentially indicating automation?
+  * Be mindful of the large number of ISPs; consider applying filters (e.g., `HAVING` clause) to exclude smaller ISPs.
+
+### 3\. Product Version / Build ID
+
+* Purpose: Check if issues began with a specific product version or build.
+* Column Names: `client_info.app_display_version`, `client_info.app_build`
+* Considerations:
+  * Did the issue arise after a particular version update? If so, collaborate with the product team to identify changes.
+  * Ensure that the build ID matches a known Mozilla build. If not, it could be a clone, fork, or side-load build.
+
+### 4\. Glean SDK Version
+
+* Purpose: Determine whether the issue is tied to a specific Glean SDK version.
+* Column Name: `client_info.telemetry_sdk_build`
+* Considerations:
+  * Did the anomaly start after an update to Glean? Work with the Glean team to verify version changes.
+
+### 5\. Other Library Version Changes
+
+* Purpose: Identify possible regressions due to library updates.
+* Considerations:
+  * Review updates to Application Services, Gecko, and other dependencies (e.g., Viaduct, rkv) that could affect telemetry collection.
+
+### 6\. OS SDK Version (Android, iOS)
+
+* Purpose: Check if platform SDK changes are impacting data collection.
+* Column Names: `client_info.os_version` (Android only: `client_info.android_sdk_version`)
+* Considerations:
+  * Have there been changes to platform lifecycle events or background task behaviors (e.g., 0-duration pings, or ping submission issues)?
+
+### 7\. Time Differences: start/end\_time vs. submission\_timestamp
+
+* Purpose: Assess the delay between telemetry collection and submission.
+* Column Names: `ping_info.parsed_start_time`, `ping_info.parsed_end_time`, `submission_timestamp`
+* Considerations:
+  * Are the recorded timestamps reasonable, both in terms of the ping time window and the delay from collection to submission?
+
+### 8\. Glean Errors
+
+* Purpose: Identify [telemetry or network errors](../../metrics/error-reporting.md) related to data collection.
+* Considerations:
+  * Are there networking errors, ingestion issues, or other telemetry failures that could be related to the anomaly?
+
+### 9\. Hardware Details (Manufacturer/Version)
+
+* Purpose: Determine if the issue is hardware-specific.
+* Column Names: `client_info.device_manufacturer`, `client_info.device_model`
+* Considerations:
+  * Does the anomaly occur primarily on older or newer hardware models?
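
Reviewer-side sketch (not part of the patch): the ISP check from section 2 and the timestamp check from section 7 of the new how-to can be tried on a handful of rows before writing a full query. The Python below is a minimal illustration under stated assumptions. The rows in `PINGS` are invented, the dictionary keys `isp`, `end_time`, and `submission_timestamp` are simplified stand-ins for `metadata.isp.name`, `ping_info.parsed_end_time`, and `submission_timestamp`, and the helper names and thresholds are hypothetical rather than anything Glean provides.

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented sample rows for illustration only. The keys are simplified
# stand-ins for the column names listed in the how-to.
PINGS = [
    {"isp": "ExampleNet", "end_time": "2024-10-01T10:00:00+00:00",
     "submission_timestamp": "2024-10-01T10:00:05+00:00"},
    {"isp": "ExampleNet", "end_time": "2024-10-01T11:00:00+00:00",
     "submission_timestamp": "2024-10-01T11:00:10+00:00"},
    {"isp": "ExampleNet", "end_time": "2024-10-01T12:00:00+00:00",
     "submission_timestamp": "2024-10-03T12:00:00+00:00"},  # two days late
    {"isp": "ExampleNet", "end_time": "2024-10-01T13:00:00+00:00",
     "submission_timestamp": "2024-10-01T12:59:00+00:00"},  # submitted "before" it ended
    {"isp": "SmallISP", "end_time": "2024-10-01T14:00:00+00:00",
     "submission_timestamp": "2024-10-01T14:00:30+00:00"},
]


def isp_share(pings, min_pings):
    """Return each ISP's share of all pings, keeping only ISPs with at
    least `min_pings` pings (the in-memory analogue of a HAVING filter)."""
    counts = Counter(p["isp"] for p in pings)
    total = sum(counts.values())
    return {isp: n / total for isp, n in counts.items() if n >= min_pings}


def submission_delay(ping):
    """Time between the end of the ping window and its submission."""
    end = datetime.fromisoformat(ping["end_time"])
    submitted = datetime.fromisoformat(ping["submission_timestamp"])
    return submitted - end


def suspicious_delays(pings, max_delay):
    """Pings whose delay is negative (clock skew) or longer than `max_delay`."""
    return [p for p in pings
            if not timedelta(0) <= submission_delay(p) <= max_delay]


if __name__ == "__main__":
    print(isp_share(PINGS, min_pings=2))
    for ping in suspicious_delays(PINGS, max_delay=timedelta(hours=1)):
        print(ping["isp"], submission_delay(ping))
```

A single ISP holding most of the volume after the small ISPs are filtered out is the pattern section 2 suggests looking for, and negative or very long delays are the kind of unreasonable timestamps section 7 asks about. The one-hour cutoff and the minimum of two pings are arbitrary values chosen to fit the toy data.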