[BUG] Error getting metrics profile: period between metrics collection is too small #2863

SvenGasterstaedt · 2024-09-13T12:17:30Z

When our service is terminating, the corresponding dd profiler is terminated as well via profiler.Stop()

This results in an interupt of the sleep period of the profiler in (which may on some occasions be smaller then 1 sec)

Version of dd-trace-go
v1.67.1

Describe what happened:
Datadog Tracer v1.67.1 ERROR: Error getting metrics profile: period between metrics collection is too small min=1s observed=573.458102ms; skipping. (occurred: 13 Sep 24 09:49 UTC)

Describe what you expected:
No error when the service is shutting down and the profiler is terminating (even if it just reported its metrics)

Steps to reproduce the issue:

Additional environment details (Version of Go, Operating System, etc.):
go 1.23.1

The text was updated successfully, but these errors were encountered:

nsrip-dd · 2024-09-13T12:22:00Z

Thanks for the report. I'll see if we can handle this better.

The metrics profiler wants to have at least one second between collections to get meaningful numbers. It will return an error if it ran less than a second from the last time. However, it's normal to stop it early when shutting down the profiler. The error in this case isn't useful. Don't even bother enabling the metrics profiler if the period is shorter than one second. And if the profiler is stopped, don't log that error. Fixes #2863

The metrics profiler insisted on at least one second between collections for two reasons: - To avoid a division by zero because it was doing integer division to convert a time.Duration to seconds, which will truncate to 0, as a ratio in a subsequent computation - In case "a system clock issue causes time to run backwards" The profiler would report an error if less than one second elapsed between collections. In practice, this resulted in misleading error logs because it's entirely likely for profiling to be stopped less than a second after the last profile collection. The restriction was not really even needed. For one, we can just do floating-point division rather than integer division to avoid the truncation problem. Also, Go has had monotonic time support by default since 2017, added in Go 1.9, and time comparison operations including time.Time.Sub, work with respect to monotonic time. We shouldn't have any issues with negative periods. We can ensure the period is positive just as a defensive measure, and fail if it's negative since this may indicate a bug in the Go runtime if it's violating the monotonicity guarantees. Fixes #2863

…taDog#2865) The metrics profiler insisted on at least one second between collections for two reasons: - To avoid a division by zero because it was doing integer division to convert a time.Duration to seconds, which will truncate to 0, as a ratio in a subsequent computation - In case "a system clock issue causes time to run backwards" The profiler would report an error if less than one second elapsed between collections. In practice, this resulted in misleading error logs because it's entirely likely for profiling to be stopped less than a second after the last profile collection. The restriction was not really even needed. For one, we can just do floating-point division rather than integer division to avoid the truncation problem. Also, Go has had monotonic time support by default since 2017, added in Go 1.9, and time comparison operations including time.Time.Sub, work with respect to monotonic time. We shouldn't have any issues with negative periods. We can ensure the period is positive just as a defensive measure, and fail if it's negative since this may indicate a bug in the Go runtime if it's violating the monotonicity guarantees. Fixes DataDog#2863

SvenGasterstaedt added the bug unintended behavior that has to be fixed label Sep 13, 2024

github-actions bot added the needs-triage New issues that have not yet been triaged label Sep 13, 2024

nsrip-dd added profiler and removed needs-triage New issues that have not yet been triaged labels Sep 13, 2024

nsrip-dd self-assigned this Sep 13, 2024

nsrip-dd mentioned this issue Sep 13, 2024

profiler: avoid metrics profile log noise when stopping profiling #2865

Merged

nsrip-dd closed this as completed in #2865 Sep 24, 2024

nsrip-dd closed this as completed in 101d4da Sep 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Error getting metrics profile: period between metrics collection is too small #2863

[BUG] Error getting metrics profile: period between metrics collection is too small #2863

SvenGasterstaedt commented Sep 13, 2024 •

edited

Loading

nsrip-dd commented Sep 13, 2024

[BUG] Error getting metrics profile: period between metrics collection is too small #2863

[BUG] Error getting metrics profile: period between metrics collection is too small #2863

Comments

SvenGasterstaedt commented Sep 13, 2024 • edited Loading

Steps to reproduce the issue:

nsrip-dd commented Sep 13, 2024

SvenGasterstaedt commented Sep 13, 2024 •

edited

Loading