Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cardinality for Elasticsearch span names is too high #439

Open
felixbarny opened this issue May 3, 2021 · 4 comments
Open

Cardinality for Elasticsearch span names is too high #439

felixbarny opened this issue May 3, 2021 · 4 comments

Comments

@felixbarny
Copy link
Member

felixbarny commented May 3, 2021

Currently, the span name is Elasticsearch: ${http.method} ${http.url.path}. The issue is that the path can contain document IDs, such as GET /customers/_doc/42.

This is problematic if we want to roll up metrics based on span.name.

Agents may only have the access to the HTTP request info that the low-level RestClients expose.

An idea that needs further validation is to remove any path segments that come after a path segment that begins with an underscore. Examples:

  • GET /customers/_doc/42 -> GET /customers/_doc
  • GET /_cluster/health/my-index-000001 -> GET /_cluster
  • GET /_alias/my-alias -> GET /_alias

While this strategy might sometimes result in what feels like too low cardinality (GET /_cluster/health/my-index-000001 or GET /_cluster/health is not problematic), it's an easy strategy that should work quite generically.

As this would chop off some parts of the URL, it becomes important for agents to collect the full URL in context.http.url. This overlaps with elastic/apm-agent-nodejs#2019

While at it, we should also consider dropping the Elasticsearch: prefix as suggested here: #420 (comment)

@felixbarny
Copy link
Member Author

@russcam does the suggestion to chop the path segments after the first segment containing an underscore sound sane to you? Are there some endpoints in ES that don't have underscores? Could other parts of the URL that don't contain an action contain underscores?

@russcam
Copy link
Contributor

russcam commented May 3, 2021

The rest API specs contain all the Elasticsearch API path patterns. For example, for the search API

      "paths":[
        {
          "path":"/_search",
          "methods":[
            "GET",
            "POST"
          ]
        },
        {
          "path":"/{index}/_search",
          "methods":[
            "GET",
            "POST"
          ],
          "parts":{
            "index":{
              "type":"list",
              "description":"A comma-separated list of index names to search; use `_all` or empty string to perform the operation on all indices"
            }
          }
        }
      ]
    },

could these be used to create an agent spec for how to name spans, for example, using the API name (first property of the spec JSON object)?

@axw
Copy link
Member

axw commented May 4, 2021

Applications might search time-based (foo-2021-05-04) or ILM indices (foo-00001), which won't have a stable name that's useful in aggregations. I'm not sure if GET /customers/_doc/42 -> GET /customers/_doc is going to be good enough. Russ's suggestion seems reasonable to me.

@Mpdreamz
Copy link
Member

Mpdreamz commented May 5, 2021

We had to solve mapping Elasticsearch urls to api names in the clients team before:

https://github.com/elastic/clients-flight-recorder/blob/016bcf39b1411515ddaf2515ed5ec8319a0364df/scripts/parsed-alternative-report-recorder/index.js#L162

We feed all the urls from the spec to a url router and feeding it the raw url gets us the api.name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants