
Histogram metrics are much larger in v1.23.0 #3959

Open
colincadams opened this issue Jun 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@colincadams

Describe your environment

We noticed a very large increase in our GCM cost due to an increase in metric bytes ingested for our base histogram metrics (e.g. http.client.duration). This coincided with an upgrade to v1.23.0. A subsequent downgrade to v1.22.0 decreased the bytes ingested and brought costs back to their prior levels.

[Screenshot, 2024-06-06: GCM metric bytes ingested increasing after the upgrade and dropping after the downgrade]

This commit is the revert: Recidiviz/pulse-data@d321a4e

Steps to reproduce

Upgrade to v1.23.0 or later (we only tested up to v1.24.0, so it is possible this has since been fixed).
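
A minimal sketch of the kind of setup involved (not our exact production code; the console exporter stands in here for the Google Cloud Monitoring exporter, and names and values are illustrative):

```python
# Hedged reproduction sketch: a low-traffic service that records one histogram
# observation and then sits idle while the reader keeps exporting every 60s.
# Compare the exported volume on opentelemetry-sdk 1.22.0 vs 1.23.0+.
import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=60_000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("repro")

histogram = meter.create_histogram(
    "http.client.duration", unit="ms", description="HTTP client request duration"
)

# Record a single observation, then go idle. On 1.22.0 the idle collection
# cycles produced no histogram points; on 1.23.0+ every cycle appears to export
# the full set of bucket counts again, which would explain the extra bytes
# ingested for a low-QPS service.
histogram.record(123.4, {"http.method": "GET", "http.status_code": 200})
time.sleep(300)
```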

What is the expected behavior?

No increase in bytes ingested by GCM for histogram metrics.

What is the actual behavior?

Order of magnitude increase in cost.

Additional context

I haven't taken the time to fully understand the changes here, but if #3429 caused all of the buckets to always be created, where previously they were not, that could be the culprit.
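
If that turns out to be the cause, one possible mitigation (a sketch only, not something we have tried; it assumes the SDK's View API with custom explicit bucket boundaries, and again uses the console exporter as a stand-in for the GCM exporter) would be to shrink the bucket set for the affected instruments:

```python
# Sketch: collapse http.client.duration into a few coarse buckets so each
# exported data point carries fewer bucket counts. Boundary values (ms) are
# illustrative only.
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

coarse_duration_view = View(
    instrument_name="http.client.duration",
    aggregation=ExplicitBucketHistogramAggregation(boundaries=[50, 250, 1000, 5000]),
)

provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())],
    views=[coarse_duration_view],
)
```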

colincadams added the bug label Jun 7, 2024
@aabmass (Member) commented Jun 10, 2024

The fix in #3429 might be the culprit. IIRC the previous behavior (see #3407) was that histograms would not be sent from the SDK to the exporter if there had been no observations since the last export.

Does your app have low QPS overall, or low QPS for certain routes?

@colincadams (Author)

@aabmass Yes, this is a fairly low-traffic application, so that does seem likely to be the root cause.

@aabmass (Member) commented Jun 10, 2024

@colincadams what is your export interval? You may be able to achieve similar cost savings by exporting less often.
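
For example (a sketch, assuming you construct the reader yourself; the console exporter again stands in for the Cloud Monitoring exporter):

```python
# Sketch: export every 5 minutes instead of every 60s. If you don't construct
# the reader in code, the interval can also be set via the
# OTEL_METRIC_EXPORT_INTERVAL environment variable (milliseconds).
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(),
    export_interval_millis=300_000,  # 5 minutes
)
provider = MeterProvider(metric_readers=[reader])
```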

@colincadams (Author)

Our export interval is 60s; we could certainly export less often, and that would help with cost savings. Did anything about bucket creation change? It seems like a pretty large increase just from reporting frequency, especially given that the cardinality of these metrics should be relatively low, but it's possible that's it.
