-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cardinality for Elasticsearch span names is too high #439
Comments
@russcam does the suggestion to chop the path segments after the first segment containing an underscore sound sane to you? Are there some endpoints in ES that don't have underscores? Could other parts of the URL that don't contain an action contain underscores? |
The rest API specs contain all the Elasticsearch API path patterns. For example, for the search API "paths":[
{
"path":"/_search",
"methods":[
"GET",
"POST"
]
},
{
"path":"/{index}/_search",
"methods":[
"GET",
"POST"
],
"parts":{
"index":{
"type":"list",
"description":"A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices"
}
}
}
]
}, could these be used to create an agent spec for how to name spans, for example, using the API name (first property of the spec JSON object)? |
Applications might search time-based (foo-2021-05-04) or ILM indices (foo-00001), which won't have a stable name that's useful in aggregations. I'm not sure if |
We had to solve mapping Elasticsearch urls to api names in the clients team before: We feed all the urls from the spec to a url router and feeding it the raw url gets us the |
Currently, the span name is
Elasticsearch: ${http.method} ${http.url.path}
. The issue is that the path can contain document IDs, such asGET /customers/_doc/42
.This is problematic if we want to roll up metrics based on
span.name
.Agents may only have the access to the HTTP request info that the low-level RestClients expose.
An idea that needs further validation is to remove any path segments that come after a path segment that begins with an underscore. Examples:
GET /customers/_doc/42
->GET /customers/_doc
GET /_cluster/health/my-index-000001
->GET /_cluster
GET /_alias/my-alias
->GET /_alias
While this strategy might sometimes result in what feels like too low cardinality (
GET /_cluster/health/my-index-000001
orGET /_cluster/health
is not problematic), it's an easy strategy that should work quite generically.As this would chop off some parts of the URL, it becomes important for agents to collect the full URL in
context.http.url
. This overlaps with elastic/apm-agent-nodejs#2019While at it, we should also consider dropping the
Elasticsearch:
prefix as suggested here: #420 (comment)The text was updated successfully, but these errors were encountered: