
Performance: filter spans based on duration #852

Closed
cdalexndr opened this issue Sep 18, 2019 · 5 comments

@cdalexndr

cdalexndr commented Sep 18, 2019

Using the APM, my span index is full of db spans (span.type:db) containing lots of fast queries that are of little interest.
A way to filter these spans based on duration would help ignore fast queries and trace only the important long-running ones.
Sampling is not a good fit, because it can also drop important long queries.

I suggest a dynamic configuration option based on span type and subtype:
span_min_duration_[type]_[subtype]=[ms]

Example:

  • span_min_duration_db=500ms will record only spans of type db (any subtype) that are longer than 500ms
  • span_min_duration_app_maintenance=100ms will record only spans of type app and subtype maintenance that are longer than 100ms
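A minimal sketch of how an agent might interpret such keys, assuming a hypothetical option-name scheme and a simplified span model (this is not the actual Elastic APM agent API, just an illustration of the proposal):

```python
# Sketch: resolve per-type/subtype minimum-duration thresholds for spans.
# Option names and the span model are hypothetical, matching the proposal above.

def parse_thresholds(config):
    """Turn keys like span_min_duration_db or span_min_duration_app_maintenance
    into a lookup of (type, subtype) -> threshold in milliseconds."""
    prefix = "span_min_duration_"
    thresholds = {}
    for key, value in config.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("_", 1)  # type, then optional subtype
        span_type = parts[0]
        subtype = parts[1] if len(parts) > 1 else None
        thresholds[(span_type, subtype)] = float(value.removesuffix("ms"))
    return thresholds

def should_record(span_type, subtype, duration_ms, thresholds):
    # Most specific rule wins: (type, subtype) before (type, any subtype).
    for key in ((span_type, subtype), (span_type, None)):
        if key in thresholds:
            return duration_ms >= thresholds[key]
    return True  # no rule configured: record everything

config = {"span_min_duration_db": "500ms",
          "span_min_duration_app_maintenance": "100ms"}
rules = parse_thresholds(config)
```

With this lookup order, a subtype-specific rule overrides a type-wide one, and spans with no matching rule are always recorded.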

Related: #440

@cdalexndr
Author

In my APM span index, out of 150,000 db spans, only 200 are above 10ms and 50 are above 50ms.
Filtering at 50ms would eliminate about 99.97% of db spans.
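The arithmetic behind that figure (keeping 50 of 150,000 spans):

```python
total, kept = 150_000, 50
eliminated = (total - kept) / total * 100
print(f"{eliminated:.4f}% of db spans eliminated")  # 99.9667%
```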

@eyalkoren
Contributor

This is doable, but there are some pitfalls to be aware of (see discussion in #440):

  • We can't discard spans that transfer execution context, whether internal (async) or external (distributed tracing)
  • We can't discard spans leading to non-discarded spans, otherwise we will end up with orphan spans
  • We should still collect metrics for discarded spans (e.g. breakdown metrics)
  • This will lead to inconsistent captures: traces of exactly the same execution will look different. You may have transactions doing lots of DB queries, all short, with no indication of that at all. This seems a valid concern for the numbers described above.
  • Once we add the service map feature, this will introduce the risk of missing a DB (or other service) entirely if all queries to it in the queried time range are below the threshold. This depends on the service map implementation, but according to the current plan it is a valid risk, and again seems a valid concern for the numbers described above.

In addition, I'm not sure this dynamic naming of config options is the right way to go.
Maybe adding a single threshold, regardless of span type, is good enough as a start.
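A toy sketch of a discard check that respects the first two constraints above (context carriers and parents of kept spans must survive); the span model and field names are made up for illustration, not taken from any agent:

```python
# Hypothetical span record: these field names are illustrative only.
class Span:
    def __init__(self, duration_ms, propagates_context=False, child_spans=()):
        self.duration_ms = duration_ms
        self.propagates_context = propagates_context  # async handoff / distributed tracing
        self.child_spans = list(child_spans)

def discardable(span, threshold_ms):
    if span.duration_ms >= threshold_ms:
        return False  # slow enough to keep
    if span.propagates_context:
        return False  # context carriers must survive
    # A span leading to any kept descendant must be kept, or the child is orphaned.
    return all(discardable(child, threshold_ms) for child in span.child_spans)
```

Note the recursion: a fast parent of a slow child is kept, which is exactly the "no orphan spans" rule.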

@cdalexndr
Author

My example scales: in a 2GB span index with 11.5M docs, 95% of spans are db type, and more than 99% of db spans have a duration below 4ms. Such a waste of disk space...

Until this issue is resolved, I'm thinking of disabling db spans and using the Postgres feature that logs queries above a duration threshold, then parsing the log with Logstash and manually cross-referencing log timestamps with transactions to find the source.
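For reference, the Postgres setting for this is `log_min_duration_statement`, and the lines it emits look roughly like `LOG:  duration: 152.443 ms  statement: SELECT ...`. A rough parsing sketch of that workaround (the prefix before `LOG:` depends on `log_line_prefix`, so treat the regex as an assumption about your log format):

```python
import re

# Matches the "duration: ... ms  statement: ..." part that Postgres emits when
# log_min_duration_statement is set; we anchor only on that portion because the
# text before "LOG:" varies with log_line_prefix.
SLOW_QUERY = re.compile(r"duration: (?P<ms>[\d.]+) ms\s+statement: (?P<sql>.*)")

def parse_slow_query(line):
    """Return (duration_ms, sql) for a slow-query log line, else None."""
    m = SLOW_QUERY.search(line)
    if not m:
        return None
    return float(m.group("ms")), m.group("sql").strip()
```

In a Logstash pipeline the equivalent would be a grok/dissect filter, but the matching logic is the same.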

@SylvainJuge
Member

Hi @cdalexndr, we have an upcoming feature called "compressed spans" that should cover your use case here.

Here are the related issues:

So for now I would suggest that you subscribe to those issues, and I'll close this one.
Feel free to comment/re-open if relevant, though.
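For context, the idea behind span compression is to merge runs of similar short exit spans into a single composite span, so 150,000 fast queries don't become 150,000 documents. A toy sketch of the concept, compressing consecutive spans with the same name (the real feature's matching rules and data model differ; this only illustrates the idea):

```python
def compress(spans):
    """Merge consecutive spans with the same name into one composite entry.
    Each span is a (name, duration_ms) tuple; the output adds a count."""
    out = []
    for name, duration in spans:
        if out and out[-1][0] == name:
            prev_name, prev_dur, count = out[-1]
            out[-1] = (name, prev_dur + duration, count + 1)
        else:
            out.append((name, duration, 1))
    return out
```

Unlike a duration filter, this keeps an indication that the fast queries happened (count and total duration) while still collapsing the storage cost.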

@SylvainJuge
Member

Also related: there is another upcoming feature to handle fast exit spans (which include DB spans):
