Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor telemetry exporter: no way to report exception from consumer span. #32387

Open
vputsenko opened this issue Dec 31, 2024 · 6 comments
Open
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@vputsenko
Copy link

vputsenko commented Dec 31, 2024

Due to 21ee8ad#diff-91578a36a9cd93bc063569087dd9e7421dd054203bbf445a117c780ad4192e4aR438 all exception reported from consumer span will be ignored.

In this commit you accept exceptions either from remote span or internal span. However consider scenario:

  1. Kafka message arrives
  2. you extract context from headers -> context with no recording span is created. Span's isRemote property is true;
  3. you call startActiveSpan to create span for message handling using extracted context and kind = SpanKind.Consumer.
  4. newly created span is no longer "remote" because it is child of remote span - so exceptions are ignored.

Yes, we can set kind to SpanKind.Internal, but it breaks visual span representation in UI and semantics overall...

Note, that in this context we have enough information to report "resolved" exception.

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Dec 31, 2024
@xirzec xirzec added the Monitor - Exporter Monitor OpenTelemetry Exporter label Dec 31, 2024
@xirzec xirzec added the Client This issue points to a problem in the data-plane of the library. label Dec 31, 2024
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Dec 31, 2024
@JacksonWeber
Copy link
Member

@vputsenko I believe based on the scenario you mentioned above, in point 4 you say the "newly created span is no longer "remote" because it is a child of a remote span". The logic from line 438 here specifies that if the parent of the span is remote, that the exception will be created, regardless of if the span the exception is recorded on is INTERNAL. Please let me know if that logic covers your case as expected. Thanks!

@vputsenko
Copy link
Author

vputsenko commented Jan 1, 2025

@JacksonWeber in this code parentSpanContext is not a context of parent span of current span, it is just simply context of current span (look here )

Note that in Open Telemetry setting span to context creates new context and current context just used as parent of newly context (look here https://github.com/open-telemetry/opentelemetry-js-api/blob/main/src/trace/context-utils.ts#L51 ) . So in my scenario remote span stays in parent context of current context (which has newly created span as active), but your code checks current span's context.

BTW, may be i really miss something. here is pseudo code i use

  1. const remoteCtx = propagation.extract(ROOT_CONTEXT, message.headers, headersAccessor);
  2. tracer.startActiveSpan(topic, spanOptions, remoteCtx , (newSpan) => { newSpan.recordException(new Error('Ooops'); } )

So at step #2 context of "newSpan" is no longer remote. however remoteCtx has really remote span set as active.

In another words you can create remote context using propagation.extract but you cannot use assign it to span, you can create span with this context as parent. On server side such spans are usually Server or Consumer.

You don't catch this in your tests because you create spans using constructor, not by OpenTelemetry API. Look here https://opentelemetry.io/docs/specs/otel/trace/api/#span-creation
"There MUST NOT be any API for creating a Span other than with a Tracer."

So this test is incorrect because using OTel APi you never create remote spans to use for scope of some job.

@JacksonWeber
Copy link
Member

@vputsenko Just so I can understand your usage a little better, why are you pulling the remote context from the headers and creating this new child span? Assuming you're using the Kafka instrumentation, I would expect that instrumentation to create a remote span that you could then record exceptions on (which would be logged to application insights).

I understand your point about us not pulling the parentSpan's context. However, we'd expect the isRemote property on the parentSpanContext to be true so long as the SpanContext was propagated from a remote parent, which it should be assuming the parent context is that of the span being generated by the incoming message from Kafka.

If you have a repro application, I'd be happy to debug and take a look at how this logic is working. Thanks!

@vputsenko
Copy link
Author

Hi @JacksonWeber

Thank you for an answer and questions. To create remote span use can only use propagation.extract. That is what intrumentation calls. It accepts headers to extract value of "traceparent" header. However this function creates "NoopSpan" which just stores SpanContext portion and all methods are stubbed.

You will have the repro using just these two lines

const remoteCtx = propagation.extract(ROOT_CONTEXT, message.headers, headersAccessor);
tracer.startActiveSpan(topic, spanOptions, remoteCtx , (newSpan) => { newSpan.recordException(new Error('Ooops'); } )

If you have problems with this - i can provide test sample later.

Why cannot we just port GO version of exporter for collector ? https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/azuremonitorexporter/trace_to_envelope.go#L143 - it has right implementation.

If OTEL exporter behaves differently when used with collector - this will be and issue with migration to collector.

@JacksonWeber
Copy link
Member

@vputsenko Thank you for helping explain your issue. After investigation, you're correct, the current implementation is a logical bug on the Node.js side for the attempted implementation of suppressing unactionable exception telemetry. Part of this issue is that while the Java azure-monitor SDK has access to the parentSpanContext via their OpenTelemetry SDK, OpenTelemetry JS appears to have a spec inconsistency here. I'll work on getting that inconsistency resolved so we can correctly implement the logic.

As for why we don't just use the same implementation as the GO exporter, it's to resolve customer's issues with instrumentations pushing span exceptions that cannot be suppressed by the customer. This creates expensive and un-addressable exception telemetry for customers. These issues have been identified and we should hopefully have a change to OTEP that will remove the need for this kind of special logic.

@vputsenko
Copy link
Author

Cool, thank you. Waiting for the fix... Let me add few thoughts. Of course i don't fully know customer case, but it looks like you are trying to adapt standard to customer's issue. If they get extra telemetry - they should no report it. exporter should report everything it gets from SDK. filtering, sampling, batching, etc... should be covered by span processors, not exporters... Later you may enter situation when same case is bug for one customer and feature for another. in this case i would say OTEL specs should win )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

3 participants