Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async spans are being reported as sync #1996

Closed
dgieselaar opened this issue Mar 12, 2021 · 5 comments · Fixed by #2326
Closed

Async spans are being reported as sync #1996

dgieselaar opened this issue Mar 12, 2021 · 5 comments · Fixed by #2326
Assignees
Labels
agent-nodejs Make available for APM Agents project planning. bug

Comments

@dgieselaar
Copy link
Member

Some clearly async spans are being reported as sync, and show up in the UI as blocking. I'm not sure when this was first introduced. Here's an example:

image

I don't think the problem is exclusive to elasticsearch spans, I've seen it happen with custom spans as well. It seems like it was introduced in a fairly recent version of the nodejs agent:

image

@dgieselaar dgieselaar added the bug label Mar 12, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Mar 12, 2021
@astorm
Copy link
Contributor

astorm commented Mar 12, 2021

@dgieselaar Thanks for reporting this behavior.

First -- looking through the changelog, 3.9 includes a fix where we were incorrectly setting the sync property on a span. Could this be the version where you've seen this change in behavior?

Second -- what about the spans involved makes you believe they should be marked as sync=false? Related, can you provide some isolated code/a repro that demonstrates the behavior?

With the above information we'll be able to have a better information conversation about all this :)

Thanks again for reporting!

@dgieselaar
Copy link
Member Author

@astorm:

First -- looking through the changelog, 3.9 includes a fix where we were incorrectly setting the sync property on a span. Could this be the version where you've seen this change in behavior?

Could be, yeah. I haven't tested all versions.

Second -- what about the spans involved makes you believe they should be marked as sync=false?

It's an elasticsearch span, and it was previously reported as async. Maybe I'm missing something, but it seems like it (obviously?) should not be reported as blocking?

Related, can you provide some isolated code/a repro that demonstrates the behavior?

The issue seems fairly obvious to me and I'm not sure why a repro here is necessary. I probably don't have time for that, but let me know if I can help out otherwise, happy to chat on Slack about it next week if needed.

@astorm
Copy link
Contributor

astorm commented Mar 12, 2021

That's fair @dgieselaar -- here's my take on things.

  1. We fixed a bug in 3.9 where spans that began and ended in the same Node.js async context were incorrectly reported as async when they should have been reported as sync. (an async span is a span that starts execution in one async context, but ends in another)

  2. You've reported seeing this change in behavior

  3. From the evidence provided so far whether this is a bug or not is ambiguous. My assumption would be that the specific work that these spans are measuring is synchronous, and was incorrectly reported previously. (One possible example is that this is the span that measures the creation of the the elastic search API request -- but not its actual execution)

Without knowing more about the code that produced these spans we can't say whether this is a bug or just a nuance of the library being instrumented. With a reproduction we'll

  1. know which elastic search library produced these spans (elasticsearch or @elastic/elasticsearch)
  2. know how the code uses that library to produce the spans (usage that might effect the sync/async-ness of a particular span)
  3. be able to see the span in context with the other spans and reason about which parts of the code are asynchronous or not.

So a reproduction will give us the context we need to diagnose this.

No worries if you can't provide a reproduction -- I know how roadmaps are. :) However, without a reproduction we'll need to come up with one on our own. In addition to the time this will take us we may end up diagnosing a similar case that's unrelated to the case that you're reporting -- we don't have your valuable context. That's why, ideally, we prefer to get reproductions with out bug reports.

Finally, because of the above mentioned ambiguity and our own roadmaps we're not likely to immediately prioritize this. With a clear reproduction a quick fix may be immediately apparent and we won't need to defer this to our next bit of maintenance work.

Does that make sense? Or am I missing something that's obvious to you but not to me?

@dgieselaar
Copy link
Member Author

dgieselaar commented Mar 12, 2021

@astorm I won't tell you what to prioritise :) but a hard disagree from me that there is any ambiguity about whether this is a bug. This is out-of-the-box behaviour. The consequence is that the Elasticsearch span (and other custom spans) show up as "blocking" in the UI. I don't think that users expect these spans to be identified as blocking. Personally I'd look at this from a user's perspective, instead of arguing upwards from the current implementation. I'll leave further action up to your discretion, though.

Edit, re:

This is out-of-the-box behaviour

I should be more specific. This is happening with Kibana's instrumentation, and other custom instrumentation other teams have added outside of Kibana.

@trentm
Copy link
Member

trentm commented Aug 18, 2021

Pretty sure (though only and educated guess, I haven't confirmed) this happens on Elasticsearch spans because there is the synchronously-created HTTP span underneath. I.e. this is the same issue as described and worked around for S3 spans here:

https://github.com/elastic/apm-agent-nodejs/blob/v3.20.0/lib/instrumentation/modules/aws-sdk/s3.js#L112-L118

I believe this will end up getting fixed as part of #2181 and #2000

trentm added a commit that referenced this issue Sep 15, 2021
Current `span.sync` is tracked via:
- the "before" async hook setting it false on the "active" span, and
- Instrumentation#bindFunction's wrapper setting it false, on the
  fair assumption that all usages of bindFunction are for async
  callbacks.
The former has issues when there are multiple active spans within a
single async task -- as is the case with Elasticsearch instrumentation
(issue #1996) and the aws-sdk instrumentations (which have manual
workarounds).

This changes to set sync=false if the executionAsyncId() at end-time
is different than at start-time. This works for whatever `asyncHooks`
config var value.

Fixes: #1996
@trentm trentm self-assigned this Sep 15, 2021
trentm added a commit that referenced this issue Sep 15, 2021
Also add a test for span.sync=false for ES and HTTP spans from
@elastic/elasticsearch instrumentation for #1996.

Move the span.sync test from async-hooks.test.js over to span.test.js
because its impl no longer deps on lib/instrumentation/async-hooks.js.

Fixes: #2292
trentm added a commit that referenced this issue Sep 20, 2021
…ync (#2326)

Currently `span.sync` is tracked via:
- the "before" async hook setting it false on the "active" span, and
- Instrumentation#bindFunction's wrapper setting it false, on the
  fair assumption that all usages of bindFunction are for async
  callbacks.
The former has issues when there are multiple active spans within a
single async task -- as is the case with Elasticsearch instrumentation
(issue #1996) and the aws-sdk instrumentations (which have manual
workarounds).

This changes to set sync=false if the executionAsyncId() at end-time
is different than at start-time. This decouples from async context
management, so works for whatever value of the `asyncHooks`
config var.

This also drops the `transaction.sync` field which was never used.

Fixes: #1996
Fixes: #2292
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning. bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants