Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal - HTTP request count semantic convention #1362

Open
nerdondon opened this issue Nov 23, 2022 · 11 comments
Open

Proposal - HTTP request count semantic convention #1362

nerdondon opened this issue Nov 23, 2022 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@nerdondon
Copy link

nerdondon commented Nov 23, 2022

What are you trying to achieve?

Add instruments for HTTP client and server request counts.

Additional context.

The current HTTP semantic conventions only has an instrument for active requests (http.server.active_requests). This proposal is to add counters like http.server.request.count and http.client.request.count. This seemed to be part of the original PR for HTTP semantic conventions as http.{type}.requests but was lost somehow. I think this is a prime candidate to be codified as a semantic convention because of very similar instrumentation across service meshes:

This metric is also important for capturing QPS and deriving error rate.

If this is desirable, I think I would be willing to take a crack at adding it.

@mateuszrzeszutek
Copy link
Member

The http.server.duration metric is a histogram, and histograms capture the total count of observations. Does that not work for you?

@tsloughter
Copy link
Member

An count aggregator would be useful for instances like this.

I've been meaning to send a PR to suggest removing this from the metrics SDK spec:

Customize the aggregation - if the default aggregation associated with the Instrument does not meet the needs of the user. For example, an HTTP client library might expose HTTP client request duration as Histogram by default, but the application developer might only want the total count of outgoing requests.

Because it is not currently supported to aggregate to just a count -- I guess except for a 1 bucket histogram, which isn't really intuitive or what this is really saying.

@arminru arminru added the enhancement New feature or request label Nov 28, 2022
@nerdondon
Copy link
Author

nerdondon commented Nov 28, 2022

The http.server.duration metric is a histogram, and histograms capture the total count of observations. Does that not work for you?

@mateuszrzeszutek that seems be fine if not somewhat non-ergonomic as @tsloughter is pointing out. A developer in this case would have to report histogram measurements exactly or override and use a 1 bucket histogram.

@tsloughter I'm not sure what the exact issue is regarding supporting aggregation to a count though. Can you elaborate? I thought the histogram should just expose a property count that represents the total population of points.

Edit: Also, wanted to ask what it would look like for a client that wanted to report different sets of attributes between duration and count. Would they have to keep two different histogram measures of duration? It still seems more ergonomic that there is a dedicated count instrument.

Additionally, from the telemetry consumption side, it seems that there should be a defined count instrument. That way, clients claiming compliance with HTTP semantic conventions would need to provide a request count instrument.

cc: @jsuereth

@tsloughter
Copy link
Member

@nerdondon no issue, I think a count aggregator should be proposed. I guess my side note about the spec muddied that :)

@nerdondon
Copy link
Author

@SergeyKanzhelev or @jsuereth sorry for the bump but i just wanted to see ask if there was something I could do to get more movement on this?

@chameleon82
Copy link

In my preference to name that metric as rate: http.server.rate, http.client.rate

As user I'm interested in current successful rate and response rate. Means every metric should send both values one with tag status=success and with tag status=fail corresponding to both Success Rate and Error Rate metrics I can visualize with tools like Grafana.

as an alternative naming can follow http.{server|client}.rate.{success|error}

However definition of the error may be vary ( timeouts / 5xx errors or status >= 400 ) and must be specified as well.

Example of metrics with grafana tooling: https://grafana.com/grafana/plugins/novatec-sdg-panel/

Response Time            | in_timesum
Request Rate             | in_count
Error Rate               | error_in
Response Time (Outgoing) | out_timesum
Request Rate (Outgoing)  | out_count
Error Rate (Outgoing)    | error_out

@nerdondon
Copy link
Author

In my understanding of the current structure of semantic conventions, a proposal would involve the standardization of a particular instrument. In the case of my proposed the count, this is just a counter from the OTel API. I'm not sure what that would look like with a rate. In any case, rate is an aggregation over the instrument. Having the base instrument would allow other aggregations as desired. Also, note the prior art that I linked with regard to this proposal. Using a count, would enable an easier path to adoption of the conventions.

I don't want to muddy the waters here by bringing in a discussion on an attribute for status class (there's already another issue regarding that IIRC). This is specifically about an operations instrument count instrument that can be used to fulfill part of the use case you mentioned and others.

@tsloughter
Copy link
Member

I don't think an instrument is needed, only a Count aggregation.

@RangelReale
Copy link

+1 for this, I'm using Datadog and I'm seeing no way of extracting the count of the duration to do this metric.

@trask
Copy link
Member

trask commented Feb 9, 2023

+1 for this, I'm using Datadog and I'm seeing no way of extracting the count of the duration to do this metric.

it may be worth asking Datadog about this, since I believe other backends are getting the request count from the http.server.duration metric #1362

@SergeyKanzhelev SergeyKanzhelev removed their assignment Feb 18, 2023
@lmolkova lmolkova transferred this issue from open-telemetry/opentelemetry-specification Aug 22, 2024
@chameleon82
Copy link

With current conventions most of our concerns are solved with:

Metric PromQL example
Latency, P99 histogram_quantile(0.99, http.server.request.duration{})
Request Rate count(http.server.request.duration{})
Rate Increase rate(http.server.request.duration{})
Error Rate (count(http.server.request.duration{ http.response.status_code =~ "5.*"}) or vector(0)) / count(http.server.request.duration})
Inflight requests http.server.active_requests{}

I think it worth to have some documentation on how OTEL metric can be converted to operational metrics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

9 participants