fix(native): Fix OS metrics to report cumulative values for AVG type #26517
base: master
All 6 OS-related metrics were defined as **AVG** type but reported as **delta values**, causing incorrect averaging and potential data loss in Prometheus monitoring.

Changed metrics to report **cumulative values** since process start:

- `presto_cpp.os_user_cpu_time_micros`
- `presto_cpp.os_system_cpu_time_micros`
- `presto_cpp.os_num_soft_page_faults`
- `presto_cpp.os_num_hard_page_faults`
- `presto_cpp.os_num_voluntary_context_switches`
- `presto_cpp.os_num_forced_context_switches`

This ensures:

1. Alignment with other AVG metrics in the system (task counts, cache sizes, etc.)
2. Proper rate calculations in monitoring systems and no data loss regardless of scraping intervals
Reviewer's Guide

This PR converts six OS-related metrics from reporting delta values to reporting cumulative values by eliminating subtraction of previous readings and removing obsolete state variables used for delta calculations.

Class diagram for updated PeriodicTaskManager OS metrics logic

```mermaid
classDiagram
class PeriodicTaskManager {
  -lastHttpClientNumConnectionsCreated_: int64_t
  +updateOperatingSystemStats()
  +addOperatingSystemStatsUpdateTask()
}
%% Removed attributes for OS metric deltas
%% lastUserCpuTimeUs_, lastSystemCpuTimeUs_, lastSoftPageFaults_, lastHardPageFaults_, lastVoluntaryContextSwitches_, lastForcedContextSwitches_ are no longer present
```

Flow diagram for OS metrics reporting change (delta to cumulative)

```mermaid
flowchart TD
A["Collect OS metric (e.g., user CPU time)"] --> B["Report cumulative value since process start"]
B --> C["RECORD_METRIC_VALUE(metric, cumulative_value)"]
%% Previously: A --> D["Subtract previous value (delta)"] --> C
%% Now: direct cumulative reporting
```
@majetideepak Could you help review this PR? Thanks.

@majetideepak @karteekmurthys @aditi-pandit Kindly ping. Could you please review this PR? This issue affects the accuracy of Prometheus monitoring metrics.
Fixes #26516