Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Error getting metrics profile: period between metrics collection is too small #2863

Closed
SvenGasterstaedt opened this issue Sep 13, 2024 · 1 comment · Fixed by #2865
Closed
Assignees
Labels
bug unintended behavior that has to be fixed profiler

Comments

@SvenGasterstaedt
Copy link
Contributor

SvenGasterstaedt commented Sep 13, 2024

When our service is terminating, the corresponding dd profiler is terminated as well via profiler.Stop()

This results in an interupt of the sleep period of the profiler in (which may on some occasions be smaller then 1 sec)
image
image
image

Version of dd-trace-go
v1.67.1

Describe what happened:
Datadog Tracer v1.67.1 ERROR: Error getting metrics profile: period between metrics collection is too small min=1s observed=573.458102ms; skipping. (occurred: 13 Sep 24 09:49 UTC)

Describe what you expected:
No error when the service is shutting down and the profiler is terminating (even if it just reported its metrics)

Steps to reproduce the issue:

Additional environment details (Version of Go, Operating System, etc.):
go 1.23.1

@SvenGasterstaedt SvenGasterstaedt added the bug unintended behavior that has to be fixed label Sep 13, 2024
@github-actions github-actions bot added the needs-triage New issues that have not yet been triaged label Sep 13, 2024
@nsrip-dd nsrip-dd added profiler and removed needs-triage New issues that have not yet been triaged labels Sep 13, 2024
@nsrip-dd
Copy link
Contributor

Thanks for the report. I'll see if we can handle this better.

@nsrip-dd nsrip-dd self-assigned this Sep 13, 2024
nsrip-dd added a commit that referenced this issue Sep 13, 2024
The metrics profiler wants to have at least one second between
collections to get meaningful numbers. It will return an error if it ran
less than a second from the last time. However, it's normal to stop it
early when shutting down the profiler. The error in this case isn't
useful.

Don't even bother enabling the metrics profiler if the period is shorter
than one second. And if the profiler is stopped, don't log that error.

Fixes #2863
nsrip-dd added a commit that referenced this issue Sep 19, 2024
The metrics profiler insisted on at least one second between collections
for two reasons:

- To avoid a division by zero because it was doing integer division to
  convert a time.Duration to seconds, which will truncate to 0, as a
  ratio in a subsequent computation
- In case "a system clock issue causes time to run backwards"

The profiler would report an error if less than one second elapsed
between collections. In practice, this resulted in misleading error logs
because it's entirely likely for profiling to be stopped less than a
second after the last profile collection.

The restriction was not really even needed. For one, we can just do
floating-point division rather than integer division to avoid the
truncation problem.

Also, Go has had monotonic time support by default since 2017, added in
Go 1.9, and time comparison operations including time.Time.Sub, work
with respect to monotonic time. We shouldn't have any issues with
negative periods. We can ensure the period is positive just as a
defensive measure, and fail if it's negative since this may indicate a
bug in the Go runtime if it's violating the monotonicity guarantees.

Fixes #2863
MNThomson pushed a commit to agilebits/dd-trace-go that referenced this issue Oct 21, 2024
…taDog#2865)

The metrics profiler insisted on at least one second between collections
for two reasons:

- To avoid a division by zero because it was doing integer division to
  convert a time.Duration to seconds, which will truncate to 0, as a
  ratio in a subsequent computation
- In case "a system clock issue causes time to run backwards"

The profiler would report an error if less than one second elapsed
between collections. In practice, this resulted in misleading error logs
because it's entirely likely for profiling to be stopped less than a
second after the last profile collection.

The restriction was not really even needed. For one, we can just do
floating-point division rather than integer division to avoid the
truncation problem.

Also, Go has had monotonic time support by default since 2017, added in
Go 1.9, and time comparison operations including time.Time.Sub, work
with respect to monotonic time. We shouldn't have any issues with
negative periods. We can ensure the period is positive just as a
defensive measure, and fail if it's negative since this may indicate a
bug in the Go runtime if it's violating the monotonicity guarantees.

Fixes DataDog#2863
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unintended behavior that has to be fixed profiler
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants