Replies: 2 comments
-
Thanks @gagbo for kicking off this exploration and discussion!
Makes sense -- and I think that's fine, honestly. We can just wait until support lands in the OTel versions and then update the autometrics libraries accordingly.
Those sound good to me!
For the context stuff, I think that'll depend a lot on the language. My general approach would be to figure out what the most popular tracing library is, add that as an (ideally optional) dependency, and pick up the trace and span details if they are present. For Rust, I would imagine having an
We'll need to update the docs on configuring Prometheus for this. Once we start working on this in earnest, we should probably have a whole section in the docs dedicated to exemplars. cc @keturiosakys
I opened an issue to track this here: #21
That's great!
-
The status of exemplar support across the libraries is being tracked in #18
-
Hello,
I hope this post can be the start of a discussion around supporting exemplars in the data produced by autometrics, since I discovered a few "gotchas" when trying my hand at it in Go.
Feature availability
At least in Golang, the OpenTelemetry exporters do not support adding exemplars to the metrics that get saved, so for the time being only the Prometheus client implementation can make use of exemplars. Nothing to act on really; it's just the current state of affairs regarding exemplar integration.
Standard naming for the exemplars
Exemplars are arbitrary key-value pairs, so we need to add some specification for exemplars in Autometrics. I just used something simple for the time being: `trace_id`, `span_id`, and `parent_id` for the labels (the exemplar will be absent if that information is not available).
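To make it concrete, here is a rough sketch of how those labels could be attached with the Prometheus Go client. The `addCallWithExemplar` helper and the label values are purely illustrative; this is not the code from the branch:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

var functionCalls = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "function_calls_count_total",
		Help: "Number of instrumented function calls",
	},
	[]string{"function", "module"},
)

// addCallWithExemplar is a sketch, not the autometrics code: it bumps the
// counter and, when trace information is around, attaches it as an exemplar.
func addCallWithExemplar(function, module, traceID, spanID, parentID string) {
	counter := functionCalls.WithLabelValues(function, module)
	if traceID == "" {
		// No active trace: just increment, the exemplar is simply absent.
		counter.Inc()
		return
	}
	// The concrete counter returned by client_golang also implements
	// prometheus.ExemplarAdder.
	counter.(prometheus.ExemplarAdder).AddWithExemplar(1, prometheus.Labels{
		"trace_id":  traceID,
		"span_id":   spanID,
		"parent_id": parentID,
	})
}

func main() {
	prometheus.MustRegister(functionCalls)
	addCallWithExemplar("FetchUser", "main",
		"0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331", "00f067aa0ba902b7")
}
```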
The need for middleware libraries
If we want to use exemplars to add trace IDs to the metrics we save, that means we probably want the metrics to transparently reuse tracing information that is already present if detected (so that, for example, the `trace_id` in an exemplar can be used as a filter in a log search). I think this means we need to look into also shipping middlewares for a few popular libraries, to try and get (or set) a trace identifier in the context of a function being executed.

Of course this is only possible if the context of the request is available as a function argument or some runtime-specific object. Currently for Golang I'm thinking about reusing `context.Context` to read/write the data, but that means 2 things: autometrics needs to know which variable holds the `context.Context`, and that context actually needs to be available to the instrumented function.

A solution for those 2 points would be to have developers rewrite part of their code so that autometrics can reliably detect those "context variables".
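Imagine, purely as an illustration (nothing like this exists in autometrics today), a convention where the request context is always the first parameter with a well-known name, so a code generator could reliably find it:

```go
package main

import "context"

// Purely hypothetical convention, only to illustrate the kind of rewrite:
// the request context is always the first parameter and is always named
// "ctx", so a code generator could reliably find it and read or write the
// trace information that ends up in the exemplar.
func FetchUser(ctx context.Context, id string) error {
	// A generated block would go here, using ctx to look up (or create)
	// the trace_id / span_id / parent_id values for the exemplar.
	_ = ctx
	return nil
}

func main() {
	_ = FetchUser(context.Background(), "user-42")
}
```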
I thought it wasn't a great solution because it seems to hurt the development experience too much, but if there's any other idea I'm all ears (and eyes).
Setting up the Prometheus server and metric names
Having the Prometheus server work with exemplars is done by simply adding a flag to the instance being launched: `--enable-feature=exemplar-storage`.

The feature is called experimental, and I already encountered a tricky issue: if the metric names for counters do not end with `_total` (case sensitive), then Prometheus will fail scraping the target with an unclear error message. The solution for this problem was to change the name of the metric (even when using the Prometheus client library) to `function_calls_count_total`. This issue must be taken into account when thinking about the naming of the metrics we (autometrics) produce.

I found a few similar issues like prometheus-net/prometheus-net#407 that seemed to find the naming to be the culprit.
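On the exposition side, exemplars only exist in the OpenMetrics format, so the Go handler also has to be allowed to negotiate that format. A minimal sketch with the plain Prometheus Go client (not the autometrics wrapper):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()

	// Counter name ends in _total (case sensitive) so that the scrape does
	// not fail once --enable-feature=exemplar-storage is turned on.
	reg.MustRegister(prometheus.NewCounter(prometheus.CounterOpts{
		Name: "function_calls_count_total",
		Help: "Number of instrumented function calls",
	}))

	// EnableOpenMetrics is required: exemplars only appear in the OpenMetrics
	// exposition format, so the handler must be allowed to negotiate it.
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		EnableOpenMetrics: true,
	}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```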
Conclusion
So, the main points I want to raise/discuss are:
- The counter names will have to end with `_total` once exemplar support is added. Should we scrap the `_count` then and just rename the metric `function_calls_total`?

To finish on a good note: it can work, though. This is the current state of the WIP branch in autometrics-dev/autometrics-go#47