Bug 1926095 - Add how-to with tips on investigating data anomalies #2979

Merged
merged 1 commit into from
Oct 24, 2024
3 changes: 2 additions & 1 deletion .dictionary
@@ -1,4 +1,4 @@
personal_ws-1.1 en 290 utf-8
personal_ws-1.1 en 291 utf-8
AAR
AARs
ABI
@@ -41,6 +41,7 @@ Gradle
Grapheme
Hotfix
Howtos
ISPs
JDK
JNA
JNI
1 change: 1 addition & 0 deletions docs/user/SUMMARY.md
@@ -47,6 +47,7 @@
- [Walkthroughs and How-tos](user/howto/index.md)
- [Server Knobs Walkthrough](user/howto/server-knobs-walkthrough/server-knobs-walkthrough.md)
- ["Real-Time" Events](user/howto/real-time-events/real-time-events.md)
- [Telemetry/Data Bug Investigation Recommendations](user/howto/investigating-data-issues/investigating-data-issues.md)

# API Reference

14 changes: 12 additions & 2 deletions docs/user/user/howto/index.md
@@ -4,6 +4,16 @@ This chapter contains various how-tos and walkthroughs to help aid you in using

### [Server Knobs Walkthrough]

A step-by-step guide in setting up and launching a Server Knobs Experiment
A step-by-step guide in setting up and launching a Server Knobs Experiment.

[Server Knobs Walkthrough]: ./server-knobs-walkthrough/server-knobs-walkthrough.md
### ["Real-Time" Events]

A guide describing the different methods to collect and transmit data in a "real-time" fashion using Glean.

### [Telemetry/Data Bug Investigation Recommendations]

Recommendations and tips on investigating data anomalies.

[Server Knobs Walkthrough]: ./server-knobs-walkthrough/server-knobs-walkthrough.md
["Real-Time" Events]: ./real-time-events/real-time-events.md
[Telemetry/Data Bug Investigation Recommendations]: ./investigating-data-issues/investigating-data-issues.md
@@ -0,0 +1,67 @@
# Telemetry/Data Bug Investigation Recommendations

This document outlines several diagnostic categories and the insights they may offer when investigating unusual telemetry patterns or data anomalies.

### 1\. Countries

* Purpose: Identify geographical patterns that could explain anomalies.
* Column Name: `metadata.geo.country`
* Considerations:
  * Are there ongoing national holidays or similar events that could affect data?
  * Is the region known for bot activity or unusual behavior?
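
One way to look for a geographical pattern is to compare each country's share of pings before and during the anomaly. The sketch below is a minimal illustration in plain Python, assuming the `metadata.geo.country` values for the two time windows have already been pulled into lists. The function names and the sample data are made up for the example.

```python
from collections import Counter


def country_shares(countries):
    """Map each country code to its fraction of the given pings."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def share_shift(baseline, anomaly):
    """Return (country, change in share) pairs, largest absolute change first."""
    before = country_shares(baseline)
    after = country_shares(anomaly)
    shifts = {
        country: after.get(country, 0.0) - before.get(country, 0.0)
        for country in set(before) | set(after)
    }
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up `metadata.geo.country` values for a baseline and an anomaly window.
baseline = ["US"] * 5 + ["DE"] * 3 + ["BR"] * 2
anomaly = ["US"] * 5 + ["DE"] * 3 + ["BR"] * 12

shifts = share_shift(baseline, anomaly)
for country, change in shifts:
    print(f"{country}: {change:+.2f}")
```

A country whose share moves much more than the others is a candidate for a closer look, for instance against a holiday calendar.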

### 2\. ISP (Internet Service Provider)

* Purpose: Analyze data at a more granular level than countries to identify potential automation or bot activity.
* Column Name: `metadata.isp.name`
* Considerations:
  * Could the anomaly be traced back to a single ISP, potentially indicating automation?
  * Be mindful of the large number of ISPs; consider applying filters (e.g., a `HAVING` clause) to exclude smaller ISPs.
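
The `HAVING` filter mentioned above can be sketched with an in-memory SQLite table driven from Python. This is only an illustration: the table `pings`, the flattened column `isp_name` (standing in for the nested `metadata.isp.name`), the sample rows, and the `MIN_PINGS` cut-off are all assumptions, and the query would need adapting to the SQL dialect of the real data store.

```python
import sqlite3

# Toy stand-in for a ping table; each tuple holds one ping's ISP name.
rows = [("BigNet",)] * 6 + [("MidNet",)] * 3 + [("TinyNet",)] * 1

MIN_PINGS = 3  # hypothetical cut-off; tune it to the size of your data


def pings_by_isp(rows, min_pings=MIN_PINGS):
    """Return (isp_name, ping_count) for ISPs with at least min_pings pings."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pings (isp_name TEXT)")
        conn.executemany("INSERT INTO pings VALUES (?)", rows)
        return conn.execute(
            """
            SELECT isp_name, COUNT(*) AS ping_count
            FROM pings
            GROUP BY isp_name
            HAVING COUNT(*) >= ?
            ORDER BY ping_count DESC
            """,
            (min_pings,),
        ).fetchall()
    finally:
        conn.close()


result = pings_by_isp(rows)
print(result)
```

Dropping the long tail of small ISPs keeps the result readable, at the cost of hiding anomalies that are spread thinly across many small providers.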

### 3\. Product Version / Build ID

* Purpose: Check if issues began with a specific product version or build.
* Column Names: `client_info.app_display_version`, `client_info.app_build`
* Considerations:
  * Did the issue arise after a particular version update? If so, collaborate with the product team to identify changes.
  * Ensure that the build ID matches a known Mozilla build. If not, it could be a clone, fork, or side-load build.
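
The build ID check can be sketched as a lookup against a list of known builds. In the example below the allow-list `KNOWN_BUILD_IDS`, the dictionary keys, and all values are hypothetical; in practice the list of official builds would have to come from release records.

```python
from collections import Counter

# Hypothetical allow-list of official build IDs (made-up values).
KNOWN_BUILD_IDS = {"20241001000000", "20241015000000"}


def unknown_builds(pings, known=KNOWN_BUILD_IDS):
    """Count pings per (version, build) pair whose build ID is not in `known`."""
    return Counter(
        (ping["app_display_version"], ping["app_build"])
        for ping in pings
        if ping["app_build"] not in known
    )


# Made-up rows mirroring `client_info.app_display_version` and `client_info.app_build`.
pings = [
    {"app_display_version": "131.0", "app_build": "20241001000000"},
    {"app_display_version": "131.0", "app_build": "20241001000000"},
    {"app_display_version": "131.0", "app_build": "19700101000000"},
    {"app_display_version": "131.0", "app_build": "19700101000000"},
    {"app_display_version": "132.0", "app_build": "20241015000000"},
]

suspicious = unknown_builds(pings)
for (version, build), count in suspicious.most_common():
    print(version, build, count)
```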

### 4\. Glean SDK Version

* Purpose: Determine whether the issue is tied to a specific Glean SDK version.
* Column Name: `client_info.telemetry_sdk_build`
* Considerations:
  * Did the anomaly start after an update to Glean? Work with the Glean team to verify version changes.
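
To see whether an anomaly lines up with an SDK update, it can help to compute the fraction of affected pings per `client_info.telemetry_sdk_build` value. The following is a rough sketch: how a ping gets flagged as anomalous depends on the investigation, so the boolean flag, the function name, and the version strings are placeholders.

```python
from collections import defaultdict


def anomaly_rate_by_version(pings):
    """Return {sdk_version: fraction of pings flagged as anomalous}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for version, is_anomalous in pings:
        totals[version] += 1
        if is_anomalous:
            flagged[version] += 1
    return {version: flagged[version] / totals[version] for version in totals}


# Made-up (sdk_version, is_anomalous) pairs.
pings = (
    [("61.0.0", False)] * 4
    + [("61.1.0", False)] * 1
    + [("61.1.0", True)] * 3
)

rates = anomaly_rate_by_version(pings)
for version, rate in sorted(rates.items()):
    print(f"{version}: {rate:.0%}")
```

A rate that jumps between two adjacent versions narrows down where to look, but it does not prove the SDK is at fault, since product and SDK updates often ship together.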

### 5\. Other Library Version Changes

* Purpose: Identify possible regressions due to library updates.
* Considerations:
  * Review updates to Application Services, Gecko, and other dependencies (e.g., Viaduct, rkv) that could affect telemetry collection.

### 6\. OS SDK Version (Android, iOS)

* Purpose: Check if platform SDK changes are impacting data collection.
* Column Names: `client_info.os_version` (Android only: `client_info.android_sdk_version`)
* Considerations:
  * Have there been changes to platform lifecycle events or background task behaviors that could lead to symptoms such as 0-duration pings or ping submission issues?

### 7\. Time Differences: start/end\_time vs. submission\_timestamp

* Purpose: Assess the delay between telemetry collection and submission.
* Column Names: `ping_info.parsed_start_time`, `ping_info.parsed_end_time`, `submission_timestamp`
* Considerations:
  * Are the recorded timestamps reasonable, both in terms of the ping time window and the delay from collection to submission?
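
Both checks, the ping time window and the delay to submission, can be expressed as simple comparisons on the three timestamps. The sketch below assumes the values of `ping_info.parsed_start_time`, `ping_info.parsed_end_time`, and `submission_timestamp` are available as timezone-aware datetimes. The flag names and the two-day `MAX_DELAY` threshold are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for a "late" ping; pick one that suits the ping type.
MAX_DELAY = timedelta(days=2)


def timing_flags(start, end, submitted, max_delay=MAX_DELAY):
    """Return labels for timestamp combinations that look suspicious."""
    flags = []
    if end < start:
        flags.append("end_before_start")
    elif end == start:
        flags.append("zero_duration")
    if submitted < end:
        # The ping claims to end after it was received; client clock skew
        # is one possible explanation.
        flags.append("submitted_before_end")
    elif submitted - end > max_delay:
        flags.append("late_submission")
    return flags


def ts(text):
    """Parse an ISO 8601 timestamp with a UTC offset."""
    return datetime.fromisoformat(text)


ok = timing_flags(
    ts("2024-10-01T10:00:00+00:00"),
    ts("2024-10-01T10:05:00+00:00"),
    ts("2024-10-01T10:06:00+00:00"),
)
odd = timing_flags(
    ts("2024-10-01T10:00:00+00:00"),
    ts("2024-10-01T10:00:00+00:00"),
    ts("2024-10-05T10:00:00+00:00"),
)
print(ok, odd)
```

Counting how often each flag occurs per day, version, or country ties this check back to the other categories in this document.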

### 8\. Glean Errors

* Purpose: Identify [telemetry or network errors](../../metrics/error-reporting.md) related to data collection.
* Considerations:
  * Are there networking errors, ingestion issues, or other telemetry failures that could be related to the anomaly?

### 9\. Hardware Details (Manufacturer/Version)

* Purpose: Determine if the issue is hardware-specific.
* Column Names: `client_info.device_manufacturer`, `client_info.device_model`
* Considerations:
  * Does the anomaly occur primarily on older or newer hardware models?