Proposal to Adopt Tokio Tracing as the OTel Tracing API #1689
Comments
Using span kinds is a somewhat niche use case that pretty much only middlewares (be they server or client) really care about. What I've been thinking about lately is to create otel::info_span!(kind: server, otel_parent: &extracted_context, target: "target", "span_name", ?tracing.field, %another.field); which would get translated to
This would guarantee that people don't mistype the names, and the macro can also check for correct types. It can also be used to provide the parent context at span creation, so that a span either has the same parent for its whole lifetime or is a root span, and the parent can no longer change afterwards as it can now. I think that caused some issues with pairing contexts to logs in the appender, right? |
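The "macro checks the span kind" idea can be sketched with the standard library alone. This is a hypothetical model, not the `tracing` or `tracing-opentelemetry` API: `otel_span!`, `SpanKind`, and `SpanSketch` are names invented for illustration, and parent-context handling is left out entirely. The `otel.kind` key mirrors the specially named field that the existing bridge recognizes.

```rust
// Hypothetical sketch: none of these names come from the `tracing` or OpenTelemetry crates.

/// Span kinds as defined by the OpenTelemetry specification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Internal,
    Server,
    Client,
    Producer,
    Consumer,
}

impl SpanKind {
    /// String form that a bridge could store in a specially named field.
    pub fn as_str(self) -> &'static str {
        match self {
            SpanKind::Internal => "internal",
            SpanKind::Server => "server",
            SpanKind::Client => "client",
            SpanKind::Producer => "producer",
            SpanKind::Consumer => "consumer",
        }
    }
}

/// Minimal stand-in for a span: a name plus key/value fields.
#[derive(Debug, PartialEq)]
pub struct SpanSketch {
    pub name: &'static str,
    pub fields: Vec<(&'static str, String)>,
}

macro_rules! otel_span {
    // Internal rules: only these identifiers are accepted after `kind:`,
    // so a typo such as `kind: sever` is rejected at compile time.
    (@kind internal) => { SpanKind::Internal };
    (@kind server) => { SpanKind::Server };
    (@kind client) => { SpanKind::Client };
    (@kind producer) => { SpanKind::Producer };
    (@kind consumer) => { SpanKind::Consumer };
    // Public rule: a kind, a span name, then optional `"key" = value` fields.
    (kind: $kind:ident, $name:literal $(, $key:literal = $value:expr)*) => {{
        let kind: SpanKind = otel_span!(@kind $kind);
        SpanSketch {
            name: $name,
            fields: vec![
                ("otel.kind", kind.as_str().to_string())
                $(, ($key, $value.to_string()))*
            ],
        }
    }};
}
```

A call such as `otel_span!(kind: server, "handle_request", "http.method" = "GET")` expands to a `SpanSketch` whose first field is the kind. A real implementation would presumably forward to the underlying span macro instead of building a struct.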
Not sure if that makes it niche! Queueing/messaging scenarios also need to convey the SpanKind. A lot of users would be just fine with the "internal" (default) Span Kind, so the existing macros can continue to work as-is, but new ones on top can be written for those users who need a Span Kind other than the default. Very similar to how |
Tagging @open-telemetry/rust-approvers Please share your comments/thoughts. |
I've added this to an upcoming GC meeting agenda. |
How is this expected to work regarding feature development? Is the Tokio community willing to join the OTel spec going forwards? |
I would also point out that OpenTelemetry isn't just a tracing API; How is this going to work with metrics, logs, profiles, events, and any other future API surface area the spec defines? |
@austinlparker This is strictly limited to the tracing API (tracer/span one). |
As I have mentioned in the issue itself, none of this would work without tokio-tracing willing to make the changes required. (not just now, but in the future). The prototype is to explore if we really need changes in tokio-tracing, and if yes, what are the exact changes, and then tokio-tracing can make a decision.
I am not so sure if I understand this part of the question? Could you clarify? Did you mean if tokio-tracing maintainers should join OTel spec as approver and/or attend spec meetings? From what I can tell, that is not required. If there is a new requirement in OTel Tracing API, the OTel Rust community can propose the same for tokio-tracing. |
Thanks. I've briefly mentioned this informally to Trask and Jack (in a different context). I did not want to bother the TC/GC officially until the OTel Rust community and Tokio-tracing maintainers have expressed willingness to move in this direction, but I'm happy to get an early bless/block on the idea. |
Speaking personally, I would much rather the GC gets something like this sooner rather than later so that we can provide feedback and understand it better. :) |
Thanks. Noted! By the way, there is no harm if GC/TC or Tracing or OTel Rust rejects this proposal! It'd be a very valuable learning for us. Also this is just option2 from the original issue: #1571 and I volunteered to write down option2 in detail, while other maintainers volunteered to write down option3 in detail. None has signed up for option1 and option4 yet - most likely I'll take a stab at them. |
I can take a stab at Option 4, if no one has yet started on this. |
Basing an OTel SDK implementation on another tool (Tokio for Rust, Activity for .NET) doesn't feel right to me. The .NET SDK has had to make concessions to support being based on .NET's Activity framework, including deviating from the OTel spec. When combining tools like this, we nearly always see conflicting concepts and wording because they share a similar problem space. This results in implementation nuance where the same term means something different in different places. I believe the OTel SDKs should be independent and spec-compliant tools that can leverage language features and popular frameworks through instrumentation. This may be a naive question, but what is stopping the Rust SDK from providing a rich Tokio instrumentation experience without adopting its tracing API? |
@MikeGoldsmith I had the same question. In the other discussion thread I mentioned that perhaps the bridge packages in the opentelemetry-go SDK might be a good pattern to follow if feasible.
💯 |
@MikeGoldsmith, thank you for your insights! I've referenced the situation with OTel .NET in the parent issue as well, particularly highlighting the challenges with spec compliance. If OTel Rust were to follow a similar path, we might encounter comparable challenges. However, the prototypes suggest that the impact might be less severe, as Tokio Tracing already aligns closely with the OTel Tracing API spec—right down to using the same terminology, like "Span". (avoiding .NET's biggest confusion by calling "Activity" to mean "Span"!) @diurnalist, thanks a lot for taking time to share your thoughts! Indeed, the bridge between Tokio Tracing and OpenTelemetry already exists, which is a testament to the integration efforts between these two ecosystems. Based on my discussions with the OTel Rust community during SIG calls and on Slack, I got the impression that nearly everyone is already leveraging Tokio Tracing alongside this bridge. This significant community adoption is what initially triggered the discussion about potentially rethinking the role of the OTel Tracing API in favor of a more integrated approach with Tokio Tracing. We've discussed at least four different options, and this issue focuses on one where the OTel sacrifices its tracing API to officially recognize and recommend Tokio Tracing. (We are yet to write details about other options.) One of the key principles we have been striving to achieve was to ensure a single instrumentation API for end-users. |
@cijothomas very cool about the bridge! Perhaps it would make sense to eventually pull the bridge under the OTel GitHub organization, but that is up to the community and the maintainers.
If I may briefly make the argument :) - there are a few classes of end-users. One class is who we might typically think of, the developers who are building something and want to utilize traces, and for those users I don't think they care too much about which client they use to emit traces as long as they are shipped off as OTLP. These are SDK users--they might not really care as much about the APIs they're using, but have the responsibility of providing some concrete implementation that outputs OTLP. The other class of end-users consists of library authors who want to implement instrumentation hooks. Library authors who wish to instrument their library with logs, metrics, and/or traces will have to implement hooks in tokio (for traces) and some mixture of other libraries and/or the OTel core API (for metrics and logs.) From a library author's perspective, I would guess they would prefer to do that w/ one single API, hence an OTel API crate will likely always exist in practice, hence it may make sense to consider that one the standard one for tracing in Rust. Library authors are API users as opposed to SDK users--they don't care about the API's concrete implementation, they just want to integrate against it. Tokio can continue to maintain its own trace interface and developers may opt to use that SDK in conjunction w/ the bridge. I can imagine that over time it may appear that it's simpler to not use Tokio+bridge and developers will prefer to use a single SDK for everything, and that will be the OTel SDK. And, library maintainers may prefer to switch to integrating against the OTel API crate rather than requiring their end-users set up this bridge. 
I have been seeing this happen in the Go ecosystem, oddly enough w/ Google's own client libraries, which for a long time still outputted data via OpenCensus, but have now finally moved to OTel, removing the need for end-users to (a) know about and (b) configure the opencensus bridge if they are using a Google client in their Go project. At least, that's what I'm reading in the tea leaves :) |
Well, this proposal is doing exactly that. The prototype is a "mini" version of that bridge. The additional thing proposed is to deprecate the OTel Tracing API, and officially bless Tokio-tracing. (even without blessing, that is where Rust community is already.) |
I am not sure I follow this part...
I fully agree that users would prefer to see a single API. But the "OTel API crate will always exist" part is not clear. If this proposal moves forward, there won't be an OTel Tracing API. (The crate itself will exist, as it needs to do other things like out-of-proc propagation, baggage, metrics, logs, etc., but it won't contain span APIs.)
To be clear: tokio-tracing is a facade only (similar to the OTel APIs). A subscriber needs to be enabled for things to light up.
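As a rough illustration of what "facade only" means, here is a std-only sketch in which instrumentation calls do nothing until the application installs a subscriber. The names (`Subscriber`, `set_global_subscriber`, `record_span`, `Collector`) are hypothetical and much simpler than the real `tracing` types; the sketch only shows the split between the API side and the implementation side.

```rust
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a subscriber: it is told about each span by name.
pub trait Subscriber: Send + Sync {
    fn on_span(&self, name: &str);
}

/// Global slot that stays empty until an application installs a subscriber.
static GLOBAL: OnceLock<Arc<dyn Subscriber>> = OnceLock::new();

/// Called once by the application (the implementation side).
/// Returns `false` if a subscriber was already installed.
pub fn set_global_subscriber(subscriber: Arc<dyn Subscriber>) -> bool {
    GLOBAL.set(subscriber).is_ok()
}

/// Called by instrumented libraries (the API side).
/// Returns `true` if a subscriber received the span, `false` if the call was a no-op.
pub fn record_span(name: &str) -> bool {
    match GLOBAL.get() {
        Some(subscriber) => {
            subscriber.on_span(name);
            true
        }
        None => false,
    }
}

/// Example subscriber that collects span names in memory.
#[derive(Default)]
pub struct Collector {
    pub names: Mutex<Vec<String>>,
}

impl Subscriber for Collector {
    fn on_span(&self, name: &str) {
        self.names.lock().unwrap().push(name.to_string());
    }
}
```

Library code only ever calls `record_span`; whether anything happens depends on whether the final application called `set_global_subscriber`. That is the property that lets instrumented libraries avoid depending on any particular SDK.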
The current state is that there is a single SDK (but 2 APIs), and this proposal does not change the SDK part, but reduces the APIs from 2 to just 1. I'd be happy if there was a single instrumentation API but multiple SDK implementations. (Alternate SDK implementations are something OpenTelemetry explicitly allows.)
The goal of this proposal is the same. Users need not pick between 2 competing APIs, as there is only one. There is no bridge either! (Sorry, I may not have fully understood your comments.) We'll be discussing this proposal (and alternatives) more in our weekly SIG calls. I would be happy to continue the discussion there if you can join us. The meetings are scheduled for Tuesdays at 9 AM Pacific Time, but we're always open to different times to accommodate more folks. |
There are ~144k crates in Cargo (Rust's official package management tool), ~7k crates dependent on This leaves us two choices:
There are required changes needed on the tracing part and the long-term commitment to support new OTel features from |
While tokio tracing is widely adopted in the Rust community, there is another (relatively small) tracing library, minitrace, which also provides OpenTelemetry integration. The decision here is going to affect it, so tagging @zhongzc and @andylokandy, the active developers, for visibility and in case they have any comments to add. |
Thank you @lalitb for pinging minitrace. As minitrace has less API coupling to OTel compared to tokio-tracing (no built-in logging, real-time span reporting, etc.), the migration proposed in this issue has little impact on minitrace so long as there is still an API to upload spans to the OTel collector. Anyway, I'm glad that it'll be a good chance to refactor the crate |
Few thoughts considering the current Rust ecosystem -
|
Is there a world where we can get Tokio tracing to be donated to OTel, they stay on as the main maintainers, and it's the API we adopt for tracing? Then alongside it we can fit in the OTel features like Baggage, cross-process propagation, etc. |
Yes! In fact, I was writing up details on what that would look like. I will open an issue later today with the details. |
So, I would like to talk to the Rust maintainers, as I do not speak for them. But as a GC member and co-founder of OpenTelemetry, I have opinions. :) First and foremost, OpenTelemetry as a project is deeply committed to long-term API support and backwards compatibility. We never break compatibility, and are especially sensitive to any changes that would cause the API to create dependency conflicts. Tracing in particular is a cross-cutting concern; dependency conflicts and broken instrumentation would create a ripple effect across many applications and libraries managed by different teams and OSS communities. Regardless of how we proceed, I don't see any reason why we need to drop support for the existing API. Even in the case where we wanted to retire the SDK in favor of just using the Tokio implementation, I would want to see support for the OpenTelemetry API continue. By the way, I believe that OTel API support is already in there? Sorry, I'm only just getting up to speed on the Rust ecosystem. I do have questions about metrics and logs, as OTel isn't just tracing. Are there suggestions for how these APIs and implementations would be handled? I see some of this discussion in #1571, so I might continue this conversation there. |
That is not the plan! The OTel SDK would continue adhering to the OTel SDK specs, but it'll now officially recognize tokio-tracing as its API. This is partially true today as well, as the OTel SDK has done a lot of special casing just to support tokio-tracing! One might view this proposal as us saying, "Enough dating, let's get married!" There are a large number of comments on this thread. I think it'd be better to continue this in the SIG/Community calls. |
No change to metrics and logs. They are already following OTel specs. |
Hi folks: sorry for the delay in responding. I'll try to go comment-by-comment.
Required Changes in tracing
Cijo:
I don't know how I feel about this change, honestly: this would require a breaking change to
We're considering the future of
I've wanted a notion of "typed targets" in tracing for a while, but I think it still requires some better const evaluation. No opposition to this in principle, the only concern I'd have is "can this be made
Yeah, this is unfortunately a pretty hard requirement. I can elaborate more, but its key to
I think we've done a good job of not breaking users, and any work that risks breakage will be publicized and undertaken with extreme care, for the aforementioned highlander rule if nothing else.
Governance/Approaches
I don't believe so/unsure. To be frank, I don't see a future in which
Cijo, Lalit, and Zhongyang explained it better than I could in their respective comments (and with greater tact and care—sorry, I haven't had the time to edit this better...), but the dynamic that exists in Rust today is what you've roughly described, except for "a single SDK for everything, and that will be the OTel SDK": the SDK that people end up using is |
OK... but unless that SDK aligns with the OTel SDK, I don't see how this would work? OpenTelemetry is more than just tracing - it's logs, metrics, events, profiles, and whatever else comes up in the future. |
There might be a disconnect between us: the Is the goal of the OpenTelemetry governing committee to have an OpenTelemetry-compliant SDK be the dominant library providing logging/tracing/metrics functionality in each language? If so, I think that's an ambitious and commendable goal, but I worry that, at least for Rust, the migration to an OpenTelemetry-compliant SDK would cause non-trivial disruption in the larger Rust ecosystem and close the door on language-idiomatic optimizations, both in performance and aesthetics. What was the resolution of the GC meeting on May 2, by the way? The meeting minutes didn't capture that. |
In otel terminology, SDK is the code that is responsible for sending data through otlp and other protocols, not for collecting observability data, right? Then yes, as was said earlier, the SDK would be compliant to the spec and some kind of bridge like
Or am I understanding this whole thing wrong? |
@mladedav You are right. There is some confusion about the proposal, probably due to the word "SDK" having the special meaning in OTel world. I'll try to clarify.
Tokio Tracing
@austinlparker The OTel Rust SDK (named |
@cijothomas thanks for the clarification. |
I think this brings up something which I have kind of been wondering about for a while now. Along with the pressure to stabilize, it feels like we're trying to shove a square peg into a round hole sometimes. And to some extent, allowing compatibility gives people the ability to save processing where they need to and have the richer feature set where they don't. We are striving for the best performance for the library, but there are some inherent things in the standard which won't always lead to the best possible performance. The compile-time tags are one example of this in Going back to this timeline, I feel that this pressure to have all the languages stable is artificial: stability is great, but pushing a specific timeline is a bit of a farce. This is an open source project which not everyone gets paid to work on. So if we want to have deadlines, where's the support? And I don't think the answer is to push more people who are unfamiliar with Rust to ramp up on it just to reach this goal. P.S. Sadly, since we started this discussion I have not had as much time to work on the other option, which is supporting compatibility between the facades. |
Speaking both personally and as a GC member, I'd be curious where you see pressure coming to stabilize from? We're pushing to stabilize the specification and semantic conventions (for the benefit of both end-users and implementors), but I don't feel like we're trying to get language SIGs to rush towards stability. Personally, I'd be more invested in languages adopting a breadth of spec features, and trying to make idiomatic implementations of them, vs. rushing to hit completeness. More generally, I think there's somewhat of an open question about the goal of the SDK in each language. Certainly, the goal is for OpenTelemetry to be 'built in' to languages, runtimes, libraries, frameworks, etc. What does 'built in' mean? Does it mean that we expect OTLP data to be available? Does it mean that if I, as a developer, write against an OpenTelemetry API that my telemetry 'just works' with other telemetry (e.g., if I'm writing business logic as part of an HTTP route handler and I add an attribute, that attribute gets added to a span that's been created by semconv-compliant telemetry generators)? Part of the rationale behind how OpenTelemetry is run, as a project, is that we expect languages to create idiomatic ways to accomplish these goals -- and those language SIGs should feel like they have the freedom to do so. The practical reality, though, is that we specify the API and SDK because to accomplish some of the aforementioned goals we do need everyone to be on the same page in terms of 'how do I get the current span in context', or 'how do i pass context headers', etc. |
Introduction
This issue builds upon Option 2 from the discussion OTel Tracing vs Tokio-Tracing, proposing a strategic shift in our approach to traces in OpenTelemetry Rust.
Summary
We propose deprecating the OpenTelemetry (OTel) Tracing API in favor of adopting Tokio Tracing as the official Tracing API, requiring re-instrumentation for those apps already using OTel Tracing API. The functionalities already provided by the OpenTelemetry Tracing SDK, such as Sampling, SpanProcessors, and Exporters, would remain largely unchanged.
Details
- `tracing` macros act as no-ops when no subscriber is configured, consistent with the current OTel API.
- `tracing` macros for span generation, requiring re-instrumentation.
- `tracing`'s methods for in-process context propagation (the `#[instrument]` attribute or `span.in_scope(closure)`), necessitating re-instrumentation. For out-of-process context propagation, OpenTelemetry's abstractions will continue to be used, leveraging the strengths of both: `tracing` for in-process and OpenTelemetry for out-of-process propagation.
- The `opentelemetry-sdk` crate and configuration of the `TracerProvider` will remain mostly unchanged.
- With `tracing` as the API, there will no longer be a need for `tracing-opentelemetry`, which currently bridges tracing to OTel. Additionally, `opentelemetry-appender-tracing` could be integrated into the OTel SDK itself, allowing users to decide if events from `tracing` should be converted to `SpanEvent`s or `LogRecord`s, resolving issues like this.
Advantages
Challenges
- `tracing`: Necessary modifications in `tracing` to support OpenTelemetry scenarios are crucial. An initial set of required changes is listed below.
- `tracing` was never written with OTel specs in mind, so there would always be some differences compared to the OTel-based Tracing APIs of other languages. For example, span macros in `tracing` have the notion of Level, which is not yet present in OTel.
Required Changes in `tracing`
- The `tracing-opentelemetry` bridge currently handles this through specially named attributes, serving as a short-term workaround.
- `follows_from` in `tracing` might be the equivalent, but this requires more prototyping.
- `tracing` is equivalent to instrumentation scope, and could serve as a reasonable alternative, though it lacks support for version, schema URL, and additional descriptive key-value pairs.
- `tracing` macros require attribute keys to be declared at compile time. This is probably manageable, as shown by implementations such as reqwest-tracing and axum-tracing-opentelemetry.
- `tracing` is not a 1.0 crate yet. This is not necessarily an issue, but OTel Rust and Tokio Tracing will need some coordination when planning future releases.
Prototype
A minimal prototype is available here: https://github.com/cijothomas/opentelemetry-tracing/tree/main/src
Please share your thoughts, concerns, and issues we may have overlooked. We recognize that re-instrumentation is a considerable effort and not something anyone looks forward to. However, this small investment now is expected to yield significant long-term benefits, setting OpenTelemetry Rust up for greater success in the future.
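For readers less familiar with the in-process context propagation mentioned under Details, the closure-scoped style can be modelled with a thread-local stack. This is a simplified, std-only sketch with hypothetical types. It borrows the method name `in_scope` from `tracing`, but it is not the library's implementation and it ignores async tasks and cross-thread hand-off.

```rust
use std::cell::RefCell;

thread_local! {
    /// Stack of active span names for the current thread (hypothetical model).
    static ACTIVE: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Minimal stand-in for a span; a real span carries IDs, fields, and timing.
pub struct Span {
    name: &'static str,
}

impl Span {
    pub fn new(name: &'static str) -> Self {
        Span { name }
    }

    /// Runs `f` with this span as the current one and restores the previous
    /// current span afterwards.
    pub fn in_scope<T>(&self, f: impl FnOnce() -> T) -> T {
        // Guard that pops on drop, so the stack is restored even if `f` panics.
        struct Pop;
        impl Drop for Pop {
            fn drop(&mut self) {
                ACTIVE.with(|stack| {
                    stack.borrow_mut().pop();
                });
            }
        }

        ACTIVE.with(|stack| stack.borrow_mut().push(self.name));
        let _guard = Pop;
        f()
    }
}

/// Name of the innermost active span on this thread, if any.
pub fn current_span() -> Option<&'static str> {
    ACTIVE.with(|stack| stack.borrow().last().copied())
}
```

Code running inside the closure can ask for the current span without it being passed explicitly, which is why instrumentation written this way has to be restructured around scopes rather than around explicitly passed context objects.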