-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spans not shown in the UI when their parent is missing #2160
Comments
@MartinKolarik I think your analysis makes a lot of sense, but I moved the issue to this repo, so that the Node JS agent experts can provide input and fix docs if required. Take notice that we just recently started an initiative around this general problem of reducing the amount of spans while losing as little valuable information as possible, you can see what we are planning in elastic/apm#453. Hopefully, one of the proposed methods in the linked issue will cover your use case. If not, feel free to propose an alternative/additional mechanism. |
Thanks for the update @eyalkoren. I think a solution that would work here is replacing the parent id by a transaction id, or span id higher up the chain. Basically, go up the chain and find the closes thing that wasn't deleted. I'd definitely like to hear some thoughts on this from the agent developers though, as it seems it would be possible but not easy to do via the currently exposed apis. |
Related issue #2301 - but that only solves part of the problem. |
It may be possible with the currently exposed APIs. I agree that it wouldn't be easy. I say "may" because there is some non-determinism in the order in which ended spans are given to span filters. When a span is ended there are some async tasks that gather a stacktrace (depending on Here is an (untested) attempt at a span filter to fix-up const LRU = require("lru-cache")
const parentIdFromDroppedSpanId = new LRU(500)
const THRESHOLD_MS = 10
function dropShortSpans(payload) {
if (payload.duration < THRESHOLD_MS) {
parentIdFromDroppedSpanId.set(payload.id, payload.parent_id)
return null
}
// If this span's parent has been dropped, then attempt to find its first
// ancestor that has not been dropped to fix-up its parent_id.
let parentId = parentIdFromDroppedSpanId.get(payload.parent_id)
if (parentId) {
let DEPTH_SENTINEL = 3 // Limit how far up ancestry we will traverse.
while (true) {
const nextParentId = parentIdFromDroppedSpanId.get(parentId)
if (nextParentId) {
if (--DEPTH_SENTINEL <= 0) {
// Too many dropped parents, fallback to transaction id.
parentId = payload.transaction_id
break
}
parentId = nextParentId
} else {
break
}
}
// We've found an undropped parent id, or our LRU cache is too small for
// current load, or a parent hasn't ended yet but *will* be dropped.
payload.parent_id = parentId
}
return payload
}
apm.addSpanFilter(dropShortSpans) If that looks interesting to you and you try it out, I'd love to hear if it works out. If so, I'll update the performance tips doc.
Your example span above is a redis span which will be an exit span when we implement #2301, so IIUC that will handle that particular case. Do you have non-exit short spans that you tend to drop as well? One other thing: I think it is possible that your example HTTP span should not be a child of your example Redis span. There is a bug in the current APM agent expressed in #1964:
I'm currently working on that issue |
@trentm thanks, that seems exactly like what I had in mind - I hoped that if this was done in the agent, it could be "cleaner" and avoid the LRU but after reading your description of how it's processed, maybe it wouldn't be possible without a LRU even directly in the agent.
Not sure if I understand now - I thought exit span means there are no child spans, which there are in this case so #2301 wouldn't help on its own. Did you mean that first it would have to be fixed by #1964 and then after #2301 it would be dropped? |
I see, thanks. That may indeed be enough. Our main issue is that we have many short redis calls and they are often assigned as parents of the long HTTP calls. One thing I'm still not sure about though. How will the mentioned changes affect chained short calls, e.g.: redis.multi().set(rKey, value).get(rKey).exec() The above results in four super-short chained spans - even if the last one is considered exit span and dropped, I'm not sure if the dropping is then meant to be applied recursively - i.e., what happens to the first three? |
In that case (again when we've implemented it), "span compression" should help. Those similar and consecutive redis spans should get grouped into a single span that represents all of them. The "drop fast exit spans" issue discussed above is a part of a larger recently spec'd effort to have an answer for huge traces. Probably the best current starting point for reading is: https://github.com/elastic/apm/tree/master/specs/agents/handling-huge-traces #2100 is the implementation issue for compressed spans for the Node.js APM agent. |
First, note that I'm not sure if this is a bug in the UI or an issue with the approach suggest in the node.js docs, which is why I'm opening this issue here.
Describe the bug
We use a span filter to only store spans above a certain duration. The logic behind that is rather simple: most of the spans are very short and of little value. Storing only spans above a certain duration saves a lot of space in the ES. This approach has been suggested in the node.js agent repo (#1701) and is now included in the performance tuning tips.
We found, however, that often even long spans are missing from the APM UI (i.e., there would be a transaction but without spans), even though the spans are actually stored and possible to find via Discover.
I believe this happens when a short span is dropped by the filter, while it was set as a parent of some much longer span. The UI doesn't show the long span, because it can't find its parent (the following message suggests that):
Expected behavior
We need a way to filter out short spans. I'm not sure, however, if the UI can be changed to show spans with a missing parent, or if the filtering method needs to be smarter and for example replace the parent id when the real parent is dropped (this currently seems hard to do, without support directly in the agent lib).
Debug logs
Two captured spans. The first is dropped by our filter. The second is stored but doesn't show up in the UI.
Click to expand
The text was updated successfully, but these errors were encountered: