Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Update GraphQL span name convention #1389

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kaylareopelle
Copy link

Fixes #1361

Changes

Due to concerns about high cardinality with the current span name format, use only the graphql.operation.type attribute for the span name. The higher cardinality attribute, graphql.operation.name can still be used in queries, as it is an attribute on the span. The original fallback span name, GraphQL Operation, is preserved.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • schema-next.yaml updated with changes to existing conventions. (I don't think this needs to be updated because it's a span name and not an attribute)

Due to concerns about high cardinality with the current span name format
use only the <graphql.operation.type> attribute for the span name.
The higher cardinality attribute, <graphql.operation.name> can still be
used in queries, as it is an attribute on the span.
The original fallback, "GraphQL Operation" is preserved.
@kaylareopelle kaylareopelle requested review from a team September 4, 2024 18:19
.chloggen/update-graphql-span-name.yaml Show resolved Hide resolved
docs/graphql/graphql-spans.md Outdated Show resolved Hide resolved
`graphql.operation.type` and `graphql.operation.name` are available. If `graphql.operation.name` is not available, the
span SHOULD be named `<graphql.operation.type>`. When `<graphql.operation.type>` is not available, `GraphQL Operation`
MAY be used as span name.
The **span name** MUST be of the format `<graphql.operation.type>` provided that
Copy link
Contributor

@lmolkova lmolkova Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that the main reason for this change is avoiding high cardinality.

What's the problem with operation name cardinality? Is it theoretically high or high in practice?

E.g. for HTTP server spans, you could have 100s of http.route values, but we use it in the span name given that usually cardinality is much lower and that route is such a useful grouping key.

Or for databases we say the following:

The **span name** SHOULD be `{db.operation.name} {target}` if there is a
(low-cardinality) `{db.operation.name}` available (see below for the exact definition of the [`{target}`](#target-placeholder) placeholder).
If there is no (low-cardinality) `db.operation.name` available, database span names
SHOULD be [`{target}`](#target-placeholder).

would it be reasonable to follow the same pattern here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it right, the problem is not so much of cardinality but because it could be unbounded, given it's user specified? https://graphql.org/learn/queries/#operation-name and I'm not sure how an instrumentation can know if a operation name is "low cardinality".

Copy link
Contributor

@lmolkova lmolkova Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems query name is something user optionally provides

https://stackoverflow.com/questions/34011964/what-is-the-point-of-naming-queries-and-mutations-in-graphql

Specifically the second answer https://stackoverflow.com/a/52542928/3239947

We use named queries so that they can be monitored consistently, and so that we can do persistent storage of a query. The duplication is there for query variables to fill the gaps.

The advantage is that the same query each time, not a different string because the query variables are the bit that differs. This means you can build tools on top of those queries because you can treat them as immutable.

If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality. Seems like a perfect thing for the span name - for databases we actually would want if user provided us something like this so we could provide better diagnostics.

It doesn't even make sense to make opt-in, because users already provided name as such.

We should keep it.

Copy link

@robertlaurin robertlaurin Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's accurate, then it's intended for exactly this purpose and intended to have lowish/user-controlled cardinality.

Unfortunately the intended use and reality are not aligned. If an API client decides that it wants to generate an operation name based off of an incrementing id, there isn't much the server can do as it is technically a valid label.

Recording that operation name as an attribute on the span can be quite useful. Setting your span name to something with unbounded cardinality is not.

There seems to be an incorrect assumption here the systems being instrumented here control the client callers.

This is not a hypothetical concern.

Copy link
Contributor

@lmolkova lmolkova Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an API client decides that it wants to generate an operation name based off of an incrementing id,

Is it a client library or a an application that's using this client library?

If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems. Primarily grouping by span name will be useless. It's unlikely that anything will break.

If they configured span->metrics, these metrics will reach cardinality limit and will be useless.


If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.

Please help me understand where (if) my assumptions are wrong.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a client library or a an application that's using this client library?

The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.

If application user decides to use dynamic operation names, they would experience consequences of it in their telemetry systems.

The server and client can be different entities. Imagine a public graphql api, anyone can issue queries, and you have no guarantee that operation name will conform to any pattern or bounded cardinality.

Primarily grouping by span name will be useless.

I completely agree, the current specification is directing instrumentation authors to make the graphql execute span useless for any grouping functions.

If they configured span->metrics, these metrics will reach cardinality limit and will be useless

Agree here as well, it means that this span is no longer a viable option metric generation.

In general it needlessly complicates any aggregate analysis of this span.

If my assumptions are correct, then removing operation name would be bad for everyone who uses operation name properly in order to protect those who use it incorrectly - this is a weird design choice to make.

  • Operation name can be of unbounded cardinality (unless you control all of your client requests)
  • Operation name is recorded as an attribute
  • Operation name is also denormalized into the span name in the form "#{operation_type} #{operation_name}"

By injecting the operation name into the span name it means for any graphql api that doesn't control all of the client consumers it is vulnerable to the issues that come with unbounded cardinality on a span name.

By not injecting the operation name into the span name, you are no longer vulnerable to those issue, and you can still optionally do any grouping or analysis based off of the span name and graphql.operation.name attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner? This then solves the problem for both categories of grapql servers - those who control the clients and those who don't.

Copy link
Contributor

@lmolkova lmolkova Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instrumentation in question here is for server side spans generated from a graphql query/mutation/subscription, it receives client requests.

that's the part that I missed, I assumed these semantic conventions are for client apps. Thanks for pointing this out!

I agree that operation name is problematic then, it would also be problematic on metrics (if metrics are introduced).

The suggestion @joaopgrassi made

Maybe we can reach a compromise here, that by default, operation name is not added, but maybe can be configured/added in a optional manner?

would be useful for metrics as well.

I.e. instrumentation MAY provide configuration option to add operation name to span name (and populate it on the future hypothetical metrics). Otherwise, if option is not provided or not enabled, the instrumentation would only use operation-type in the span name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Discussions
Development

Successfully merging this pull request may close these issues.

Concerns about high cardinality in GraphQL span names
5 participants