You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#430 switched process_cpu_seconds_total from an (f64) Counter to an IntCounter, for performance reasons.
Issue is that processes that don't use a lot of CPU will now have 1 second CPU usage spikes every so often instead of a more or less flat usage with spikes occurring when they actually do, not when 1 second of CPU usage has accumulated. I.e. this has turned the metric from a smooth growing floating point value into a step function
E.g. a process using 1% CPU on average, with a 10% spike every minute will now look as if it is using 0% CPU most of the time and spike every 2 minutes out of 3.
Considering that most processes export hundreds if not thousands of metrics, I really don't think the marginal performance improvement is worth this regression. Please keep using Counter for cpu_total.
Happy to make the changes myself, if they're likely to be accepted.
The text was updated successfully, but these errors were encountered:
Oh, almost forgot: #430 also switched process_start_time_seconds from a Gauge to an IntGauge. While all other changes in that PR were reasonable (file descriptors, memory usage), process_start_time_seconds is also not an integer. Linux (at least) provides the process start time as a floating point value and truncating it to an integer discards useful information (e.g. if I want to know the order in which two processes on the same machine got started; or how long it took to start one vs. the other).
This is more of a nice to have as compared to process_cpu_seconds_total above, but still useful.
#430 switched
process_cpu_seconds_total
from an (f64
)Counter
to anIntCounter
, for performance reasons.Issue is that processes that don't use a lot of CPU will now have 1 second CPU usage spikes every so often instead of a more or less flat usage with spikes occurring when they actually do, not when 1 second of CPU usage has accumulated. I.e. this has turned the metric from a smooth growing floating point value into a step function
E.g. a process using 1% CPU on average, with a 10% spike every minute will now look as if it is using 0% CPU most of the time and spike every 2 minutes out of 3.
Considering that most processes export hundreds if not thousands of metrics, I really don't think the marginal performance improvement is worth this regression. Please keep using Counter for cpu_total.
Happy to make the changes myself, if they're likely to be accepted.
The text was updated successfully, but these errors were encountered: