Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spans not shown in the UI when their parent is missing #2160

Open
MartinKolarik opened this issue Jul 15, 2021 · 8 comments
Open

Spans not shown in the UI when their parent is missing #2160

MartinKolarik opened this issue Jul 15, 2021 · 8 comments
Labels
agent-nodejs Make available for APM Agents project planning.

Comments

@MartinKolarik
Copy link

First, note that I'm not sure if this is a bug in the UI or an issue with the approach suggest in the node.js docs, which is why I'm opening this issue here.

Describe the bug

We use a span filter to only store spans above a certain duration. The logic behind that is rather simple: most of the spans are very short and of little value. Storing only spans above a certain duration saves a lot of space in the ES. This approach has been suggested in the node.js agent repo (#1701) and is now included in the performance tuning tips.

We found, however, that often even long spans are missing from the APM UI (i.e., there would be a transaction but without spans), even though the spans are actually stored and possible to find via Discover.

I believe this happens when a short span is dropped by the filter, while it was set as a parent of some much longer span. The UI doesn't show the long span, because it can't find its parent (the following message suggests that):

image

Expected behavior

We need a way to filter out short spans. I'm not sure, however, if the UI can be changed to show spans with a missing parent, or if the filtering method needs to be smarter and for example replace the parent id when the real parent is dropped (this currently seems hard to do, without support directly in the agent lib).

Debug logs

Two captured spans. The first is dropped by our filter. The second is stored but doesn't show up in the UI.

Click to expand
{
  id: '492b3e17825250d4',
  transaction_id: 'ae0b52e3a6c60f64',
  parent_id: 'ae0b52e3a6c60f64',
  trace_id: '413dfce475cad1107182651ea9abab46',
  name: 'EXEC',
  type: 'cache',
  subtype: 'redis',
  action: null,
  timestamp: 1626368299979725,
  duration: 1.616,
  context: {
    db: undefined,
    http: undefined,
    tags: undefined,
    destination: { service: [Object], address: '127.0.0.1', port: 6379 },
    message: undefined
  },
  stacktrace: undefined,
  sync: false,
  outcome: 'success',
  sample_rate: 1
}

{
  id: '6523f3b42ff3c563',
  transaction_id: 'ae0b52e3a6c60f64',
  parent_id: '492b3e17825250d4',
  trace_id: '413dfce475cad1107182651ea9abab46',
  name: 'GET registry.npmjs.org/jquery',
  type: 'external',
  subtype: 'http',
  action: 'GET',
  timestamp: 1626368299989391,
  duration: 108.439,
  context: {
    db: undefined,
    http: {
      method: 'GET',
      status_code: 200,
      url: 'https://registry.npmjs.org/jquery'
    },
    tags: undefined,
    destination: { service: [Object], address: 'registry.npmjs.org', port: 443 },
    message: undefined
  },
  stacktrace: undefined,
  sync: false,
  outcome: 'success',
  sample_rate: 1
}
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Jul 18, 2021
@eyalkoren eyalkoren transferred this issue from elastic/apm Jul 18, 2021
@eyalkoren
Copy link
Contributor

@MartinKolarik I think your analysis makes a lot of sense, but I moved the issue to this repo, so that the Node JS agent experts can provide input and fix docs if required.

Take notice that we just recently started an initiative around this general problem of reducing the amount of spans while losing as little valuable information as possible, you can see what we are planning in elastic/apm#453.
One of the methods we are going to implement is the ability to filter out very fast exit spans, which won't suffer from this problem of orphan spans. Unfortunately, I don't think there is an easy way to safely drop non-exit spans in the Node JS agent, given the asynchronous nature of the platform.

Hopefully, one of the proposed methods in the linked issue will cover your use case. If not, feel free to propose an alternative/additional mechanism.

@MartinKolarik
Copy link
Author

Thanks for the update @eyalkoren. I think a solution that would work here is replacing the parent id by a transaction id, or span id higher up the chain. Basically, go up the chain and find the closes thing that wasn't deleted.

I'd definitely like to hear some thoughts on this from the agent developers though, as it seems it would be possible but not easy to do via the currently exposed apis.

@MartinKolarik
Copy link
Author

Related issue #2301 - but that only solves part of the problem.

@trentm
Copy link
Member

trentm commented Aug 26, 2021

@MartinKolarik

I think a solution that would work here is replacing the parent id by a transaction id ... it would be possible but not easy to do via the currently exposed apis.

It may be possible with the currently exposed APIs. I agree that it wouldn't be easy. I say "may" because there is some non-determinism in the order in which ended spans are given to span filters. When a span is ended there are some async tasks that gather a stacktrace (depending on captureSpanStackTraces and the recent spanFramesMinDuration config options) and encode the span before passing that encoded span object to the span filters. Generally they will be passed to the span filters in the order in which they ended, but that isn't a guarantee.

Here is an (untested) attempt at a span filter to fix-up parent_ids:

const LRU = require("lru-cache")
const parentIdFromDroppedSpanId = new LRU(500)
const THRESHOLD_MS = 10

function dropShortSpans(payload) {
    if (payload.duration < THRESHOLD_MS) {
        parentIdFromDroppedSpanId.set(payload.id, payload.parent_id)
        return null
    }

    // If this span's parent has been dropped, then attempt to find its first
    // ancestor that has not been dropped to fix-up its parent_id.
    let parentId = parentIdFromDroppedSpanId.get(payload.parent_id)
    if (parentId) {
        let DEPTH_SENTINEL = 3 // Limit how far up ancestry we will traverse.
        while (true) {
            const nextParentId = parentIdFromDroppedSpanId.get(parentId)
            if (nextParentId) {
                if (--DEPTH_SENTINEL <= 0) {
                    // Too many dropped parents, fallback to transaction id.
                    parentId = payload.transaction_id
                    break
                }
                parentId = nextParentId
            } else {
                break
            }
        }
        // We've found an undropped parent id, or our LRU cache is too small for
        // current load, or a parent hasn't ended yet but *will* be dropped.
        payload.parent_id = parentId
    }

    return payload
}

apm.addSpanFilter(dropShortSpans)

If that looks interesting to you and you try it out, I'd love to hear if it works out. If so, I'll update the performance tips doc.

Related issue #2301 - but that only solves part of the problem.

Your example span above is a redis span which will be an exit span when we implement #2301, so IIUC that will handle that particular case. Do you have non-exit short spans that you tend to drop as well?


One other thing: I think it is possible that your example HTTP span should not be a child of your example Redis span. There is a bug in the current APM agent expressed in #1964:

I'm running into an issue where if the active span for the current async context ends, and a new span immediately starts, the ended span is the parent of the new span.

I'm currently working on that issue

@MartinKolarik
Copy link
Author

MartinKolarik commented Aug 26, 2021

@trentm thanks, that seems exactly like what I had in mind - I hoped that if this was done in the agent, it could be "cleaner" and avoid the LRU but after reading your description of how it's processed, maybe it wouldn't be possible without a LRU even directly in the agent.

Your example span above is a redis span which will be an exit span when we implement #2301, so IIUC that will handle that particular case. Do you have non-exit short spans that you tend to drop as well?

Not sure if I understand now - I thought exit span means there are no child spans, which there are in this case so #2301 wouldn't help on its own. Did you mean that first it would have to be fixed by #1964 and then after #2301 it would be dropped?

@trentm
Copy link
Member

trentm commented Aug 26, 2021

Did you mean that first it would have to be fixed by #1964 and then after #2301 it would be dropped?

Yes, that (though my fix for the parent/child issue will be a different change than #1964 -- my current draft work is at #2181).

@MartinKolarik
Copy link
Author

I see, thanks. That may indeed be enough. Our main issue is that we have many short redis calls and they are often assigned as parents of the long HTTP calls.

One thing I'm still not sure about though. How will the mentioned changes affect chained short calls, e.g.:

redis.multi().set(rKey, value).get(rKey).exec()

The above results in four super-short chained spans - even if the last one is considered exit span and dropped, I'm not sure if the dropping is then meant to be applied recursively - i.e., what happens to the first three?

@trentm
Copy link
Member

trentm commented Aug 26, 2021

How will the mentioned changes affect chained short calls

In that case (again when we've implemented it), "span compression" should help. Those similar and consecutive redis spans should get grouped into a single span that represents all of them. The "drop fast exit spans" issue discussed above is a part of a larger recently spec'd effort to have an answer for huge traces. Probably the best current starting point for reading is: https://github.com/elastic/apm/tree/master/specs/agents/handling-huge-traces

#2100 is the implementation issue for compressed spans for the Node.js APM agent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

No branches or pull requests

3 participants