
Histogram metrics are much larger in v1.23.0 #3959

Open
colincadams opened this issue Jun 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@colincadams

Describe your environment

We noticed a very large increase in our GCM cost due to an increase in metric bytes ingested for our base histogram metrics (e.g. http.client.duration). This coincided with an upgrade to v1.23.0. A subsequent downgrade to v1.22.0 decreased the bytes ingested and brought costs back to their prior levels.

[Screenshot, 2024-06-06: GCM metric bytes ingested increasing after the upgrade and dropping after the downgrade]

This commit is the revert: Recidiviz/pulse-data@d321a4e

Steps to reproduce

Upgrade to v1.23.0 or later (we only tested up to v1.24.0, so it is possible this has since been fixed).
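
A minimal sketch of the kind of setup involved (not our exact production code; the console exporter stands in here for the Google Cloud Monitoring exporter, and names and values are illustrative):

```python
# Hedged reproduction sketch: a low-traffic service that records one histogram
# observation and then sits idle while the reader keeps exporting every 60s.
# Compare the exported volume on opentelemetry-sdk 1.22.0 vs 1.23.0+.
import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=60_000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("repro")

histogram = meter.create_histogram(
    "http.client.duration", unit="ms", description="HTTP client request duration"
)

# Record a single observation, then go idle. On 1.22.0 the idle collection
# cycles produced no histogram points; on 1.23.0+ every cycle appears to export
# the full set of bucket counts again, which would explain the extra bytes
# ingested for a low-QPS service.
histogram.record(123.4, {"http.method": "GET", "http.status_code": 200})
time.sleep(300)
```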

What is the expected behavior?

No increase in bytes ingested by GCM for histogram metrics.

What is the actual behavior?

Order of magnitude increase in cost.

Additional context

I haven't taken the time to fully understand the changes here, but if #3429 caused all of the buckets to always be created, where previously they were not, that could be the culprit.
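
If that turns out to be the cause, one possible mitigation (a sketch only, not something we have tried; it assumes the SDK's View API with custom explicit bucket boundaries, and again uses the console exporter as a stand-in for the GCM exporter) would be to shrink the bucket set for the affected instruments:

```python
# Sketch: collapse http.client.duration into a few coarse buckets so each
# exported data point carries fewer bucket counts. Boundary values (ms) are
# illustrative only.
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

coarse_duration_view = View(
    instrument_name="http.client.duration",
    aggregation=ExplicitBucketHistogramAggregation(boundaries=[50, 250, 1000, 5000]),
)

provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())],
    views=[coarse_duration_view],
)
```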

colincadams added the bug label Jun 7, 2024
@aabmass (Member) commented Jun 10, 2024

The fix in #3429 might be the culprit. IIRC the previous behavior (see #3407) was that histograms would not be sent from the SDK to the exporter if there had been no observations since the last export.

Does your app have low QPS overall, or low QPS for certain routes?

@colincadams (Author)

@aabmass Yes, this is a fairly low-traffic application, so that does seem likely to be the root cause.

@aabmass (Member) commented Jun 10, 2024

@colincadams what is your export interval? You may be able to achieve similar cost savings by exporting less often.
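
For example (a sketch, assuming you construct the reader yourself; the console exporter again stands in for the Cloud Monitoring exporter):

```python
# Sketch: export every 5 minutes instead of every 60s. If you don't construct
# the reader in code, the interval can also be set via the
# OTEL_METRIC_EXPORT_INTERVAL environment variable (milliseconds).
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(),
    export_interval_millis=300_000,  # 5 minutes
)
provider = MeterProvider(metric_readers=[reader])
```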

@colincadams (Author)

Our export interval is 60s; we could certainly export less often, and that would help with cost savings. Did anything about bucket creation change? It seems like a pretty large increase just from reporting frequency, especially given that the cardinality of these metrics should be relatively low, but it's possible that's it.
