
Messaging: should receive spans be CLIENT? #1366

Open
lmolkova opened this issue Aug 23, 2024 · 3 comments


Comments

@lmolkova
Contributor

lmolkova commented Aug 23, 2024

Receive spans describe pulling messages from a topic/queue.

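For example, pulling messages from AWS SQS:

```java
// AWS SDK for Java v1: pull a batch of messages from the queue
List<Message> messages = sqs.receiveMessage(queueUrl).getMessages();
```

And from Kafka:

```java
// Kafka consumer: poll for records, blocking for at most 100 ms
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
```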

This operation fits the (admittedly vague) CLIENT span definition: it's a logical client call to a remote service. It's initiated by the application itself, ends once the corresponding method returns the received messages, and does not account for any message handling or processing time.
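To make these boundaries concrete, here is a minimal, library-free Java sketch of a receive span that wraps only the pull call. `ReceiveSketch`, `tracedReceive`, `RecordedSpan`, and the local `SpanKind` enum are hypothetical stand-ins, not the OpenTelemetry API; the supplier stands in for `sqs.receiveMessage(...)` or `consumer.poll(...)`, and the CLIENT kind reflects the proposal in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class ReceiveSketch {
    // Hypothetical, library-free stand-ins; NOT the OpenTelemetry API.
    enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER, INTERNAL }

    record RecordedSpan(String name, SpanKind kind, long startNanos, long endNanos) {}

    // Finished spans, in completion order.
    static final List<RecordedSpan> finished = new ArrayList<>();

    // The span covers only the pull: it starts before the broker call and ends as
    // soon as the call returns (or throws), before any message is processed.
    static <T> List<T> tracedReceive(String destination, Supplier<List<T>> pull) {
        long start = System.nanoTime();
        try {
            return pull.get();
        } finally {
            finished.add(new RecordedSpan("receive " + destination, SpanKind.CLIENT, start, System.nanoTime()));
        }
    }

    public static void main(String[] args) {
        // Stand-in for sqs.receiveMessage(queueUrl).getMessages() or consumer.poll(...).
        List<String> messages = tracedReceive("orders", () -> List.of("m1", "m2"));
        long processingStart = System.nanoTime();
        // Processing happens after the receive span has already ended.
        messages.forEach(m -> System.out.println("processing " + m));
        RecordedSpan span = finished.get(finished.size() - 1);
        System.out.println(span.name() + " " + span.kind()
                + ", ended before processing: " + (span.endNanos() <= processingStart));
    }
}
```

Whatever the application does with the returned messages is outside the span, so its duration and status say nothing about processing.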

But we currently specify that receive spans should be CONSUMER:

| Operation | Span kind |
|-----------|-----------|
| `receive` | `CONSUMER` |

Why is it CONSUMER?

The receive span is the only messaging span that instrumentation libraries can guarantee to create on the consumer side when messages are pulled.

If a higher-level framework is used to process messages (such as Spring or Apache Camel), it may create processing SERVER spans; otherwise, they may be created by user applications.

The CONSUMER kind on receive spans:

  • describes message flow direction - from broker to service (rather than call direction from service to broker)
  • provides an indication to tracing tools that links on this span represent incoming messages

See https://github.com/open-telemetry/oteps/blob/main/text/trace/0220-messaging-semantic-conventions-span-structure.md#span-kind for context.

@lmolkova
Contributor Author

lmolkova commented Aug 23, 2024

I think we have two options:

Option 1: Keep CONSUMER span kind

We'd need to add more wiggle room to the already vague span kind definitions to make this legitimate.

By using CONSUMER, we create ambiguity: the receive span does not describe an external request, its latency does not represent processing duration, and its errors don't represent processing errors. Yet any tool that makes generic assumptions based on span kind alone will think it describes message consumption.
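A sketch of that ambiguity, assuming a hypothetical messaging-unaware tool that treats every CONSUMER span as message consumption. `KindOnlyAnalyzer` and its types are illustrative only; no particular tracing backend is implied to work this way.

```java
import java.util.List;

class KindOnlyAnalyzer {
    enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER, INTERNAL }

    // Hypothetical span model: name, kind, and duration in milliseconds.
    record SimpleSpan(String name, SpanKind kind, long durationMillis) {}

    // Generic heuristic: every CONSUMER span counts as "message consumption",
    // so its duration is averaged into the consumption time.
    static double avgConsumptionMillis(List<SimpleSpan> spans) {
        return spans.stream()
                .filter(s -> s.kind() == SpanKind.CONSUMER)
                .mapToLong(SimpleSpan::durationMillis)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // With Option 1 the receive span is CONSUMER, so the 100 ms spent
        // waiting on the broker is counted as if it were processing time.
        List<SimpleSpan> spans = List.of(
                new SimpleSpan("receive orders", SpanKind.CONSUMER, 100),
                new SimpleSpan("process orders", SpanKind.CONSUMER, 20));
        System.out.println(avgConsumptionMillis(spans)); // 60.0, not the 20 ms spent processing
    }
}
```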

Option 2: Use CLIENT kind

Possible drawbacks:

  • some consumer applications will not have any CONSUMER or SERVER spans, i.e., service maps will not detect any incoming calls to the service. This can also happen in other cases (e.g., when there is no server instrumentation), so tracing systems should be prepared for it.
  • there will be no CONSUMER spans matching PRODUCER spans; this does not seem like a trace visualization/analysis problem either.
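To illustrate the first drawback, here is a sketch of a service-map heuristic that marks a service as having incoming calls only if it emitted a SERVER or CONSUMER span. The heuristic and all names (`ServiceMapSketch`, `servicesWithIncomingCalls`) are assumptions for illustration; real tracing backends may derive service maps differently.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ServiceMapSketch {
    enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER, INTERNAL }

    // Hypothetical model: the emitting service, span name, and span kind.
    record SimpleSpan(String service, String name, SpanKind kind) {}

    // Assumed heuristic: a service "has incoming calls" if it emitted at least
    // one SERVER or CONSUMER span. Returned in sorted order.
    static Set<String> servicesWithIncomingCalls(List<SimpleSpan> spans) {
        Set<String> result = new TreeSet<>();
        for (SimpleSpan s : spans) {
            if (s.kind() == SpanKind.SERVER || s.kind() == SpanKind.CONSUMER) {
                result.add(s.service());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The worker only reports receive spans (CLIENT under Option 2)
        // and has no processing instrumentation.
        List<SimpleSpan> spans = List.of(
                new SimpleSpan("api", "GET /orders", SpanKind.SERVER),
                new SimpleSpan("api", "send orders", SpanKind.PRODUCER),
                new SimpleSpan("worker", "receive orders", SpanKind.CLIENT));
        System.out.println(servicesWithIncomingCalls(spans)); // [api]
    }
}
```

Adding a CONSUMER process span on the worker puts it back into the set, which matches the argument that consumer applications need processing instrumentation anyway.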

We can try to address any possible drawbacks with additional semantics:

  • we already capture the `messaging.operation.type = receive` attribute, so messaging-aware visualizations/queries should be able to special-case it
  • if we need a generic solution, we can look into alternatives such as adding a direction to span links.
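A messaging-aware tool could special-case receive spans using the attribute alone. The sketch below assumes a hypothetical span model (`MessagingAwareSketch.SimpleSpan`) with string attributes and a link count; only the `messaging.operation.type` key and its `receive` value come from the conventions.

```java
import java.util.List;
import java.util.Map;

class MessagingAwareSketch {
    enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER, INTERNAL }

    // Hypothetical span model: name, kind, string attributes, number of links.
    record SimpleSpan(String name, SpanKind kind, Map<String, String> attributes, int linkCount) {}

    // Links count as incoming messages on CONSUMER spans, and on CLIENT spans
    // whose messaging.operation.type attribute is "receive".
    static boolean linksAreIncomingMessages(SimpleSpan span) {
        if (span.kind() == SpanKind.CONSUMER) {
            return true;
        }
        return span.kind() == SpanKind.CLIENT
                && "receive".equals(span.attributes().get("messaging.operation.type"));
    }

    // Sums links over the spans whose links represent incoming messages.
    static int incomingMessageCount(List<SimpleSpan> spans) {
        return spans.stream()
                .filter(MessagingAwareSketch::linksAreIncomingMessages)
                .mapToInt(SimpleSpan::linkCount)
                .sum();
    }

    public static void main(String[] args) {
        List<SimpleSpan> spans = List.of(
                new SimpleSpan("receive orders", SpanKind.CLIENT,
                        Map.of("messaging.operation.type", "receive"), 3),
                // An ordinary CLIENT call: its link is not an incoming message.
                new SimpleSpan("GET /inventory", SpanKind.CLIENT, Map.of(), 1));
        System.out.println(incomingMessageCount(spans)); // 3
    }
}
```

A real tool would also need to avoid counting a message twice when both a receive span and a process span link to it.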

My proposal is to go with Option 2.

Applications that only report receive spans have poor observability: they need to instrument message consumption anyway.
We're trying to cover this up by reporting a CONSUMER span, but that does not solve the bigger problem.

@joaopgrassi
Member

We discussed this in the meeting on 30-08-2024 and reached consensus to use CLIENT for receive spans and to keep CONSUMER for process spans (when those are created).
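The outcome can be illustrated with a small sketch of a pull-then-process flow. It assumes one process span per handled message (a batch could equally get a single process span) and uses hypothetical names (`PullThenProcessSketch`, `spansForBatch`); only the mapping of receive to CLIENT and process to CONSUMER comes from this thread.

```java
import java.util.ArrayList;
import java.util.List;

class PullThenProcessSketch {
    enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER, INTERNAL }

    // Hypothetical span model: name and kind only.
    record SimpleSpan(String name, SpanKind kind) {}

    // Span kinds per the consensus here; other operation types are out of scope.
    static SpanKind kindFor(String operationType) {
        return switch (operationType) {
            case "receive" -> SpanKind.CLIENT;
            case "process" -> SpanKind.CONSUMER;
            default -> throw new IllegalArgumentException("not covered here: " + operationType);
        };
    }

    // One receive span for the pull, then one process span per handled message.
    static List<SimpleSpan> spansForBatch(String destination, List<String> messages) {
        List<SimpleSpan> spans = new ArrayList<>();
        spans.add(new SimpleSpan("receive " + destination, kindFor("receive")));
        for (String ignored : messages) {
            spans.add(new SimpleSpan("process " + destination, kindFor("process")));
        }
        return spans;
    }

    public static void main(String[] args) {
        spansForBatch("orders", List.of("m1", "m2"))
                .forEach(s -> System.out.println(s.name() + " " + s.kind()));
        // receive orders CLIENT
        // process orders CONSUMER
        // process orders CONSUMER
    }
}
```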

@pyohannes
Contributor

Given the changes in open-telemetry/opentelemetry-specification#4178, this makes sense.

With those changes, we don't see a CONSUMER span as the endpoint of an asynchronous communication channel (from the application code's point of view), but as "processing of an operation initiated by a producer".

This brings some limitations, but reduces ambiguity.

Projects
Status: V1 - Stable Semantics