Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extends GraphQL Spans #562

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 61 additions & 7 deletions model/trace/instrumentation/graphql.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
groups:
- id: graphql
prefix: graphql
- id: graphql.server.request
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This are just the graphql.server spans. Should we also specific the graphql.client spans?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is desired, yes we should. But most likely not as part of this PR. Also, maybe it's good to create an issue so maybe discussions can happen there.

prefix: graphql.request
type: span
brief: >
This document defines semantic conventions to apply when instrumenting the GraphQL implementation. They map
Expand All @@ -9,7 +9,7 @@ groups:
- id: operation.name
brief: "The name of the operation being executed."
type: string
examples: 'findBookById'
examples: "findBookById"
- id: operation.type
brief: "The type of the operation being executed."
type:
Expand All @@ -24,9 +24,63 @@ groups:
- id: subscription
value: "subscription"
brief: "GraphQL subscription"
examples: ['query', 'mutation', 'subscription']
- id: document
brief: "The GraphQL document being executed."
examples: ["query", "mutation", "subscription"]
- id: document.content
brief: "The content of the GraphQL document being executed."
type: string
note: The value may be sanitized to exclude sensitive information.
examples: 'query findBookById { bookById(id: ?) { name } }'
examples: "query findBookById { bookById(id: ?) { name } }"
- id: document.sha256
brief: "The SHA256 hash of the document.content"
type: string
examples: "400483f38c08e8a3d3b972409c9dfb8e4a326e1b1940864932acd9f873d8664c"
- id: document.id
brief: >
The id of the document beeing executed. This is a hash of the document provided by the user to identitfy
persisted queries.
type: string
examples: "aa3e37c1bf54708e93f12c137afba004"
- id: error.count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a generic "request count/duration" metric? With the error type attribute then one can find out how many errors occurred. https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/http/http-metrics.md#metric-httpserverrequestduration

brief: "The number of errors that occurred during the operation."
type: int
examples: 3
# TODO should we have something like outcome (success, failure)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure if we should have a property that signals if the execution was a success or a failure.
Then we would have to specify aswell what a 'success' and what a 'failure' is.
As a operation can fail completely ('data' is null) or partially (there are errors)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an exception we are talking about here, there's already https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/exceptions/exceptions-spans.md

In general if the request fails on the client side, the span status then is marked as error. See here for some example (for HTTP) https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md#status

# TODO how do we specify errors?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A GraphQL Operation can have multiple errors in the response:

{
  "data": null,
  "errors": [
    {
      "message": "Cannot return null for non-nullable field User",
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "path": [
        "createUser",
        "user"
      ],
      "extensions": {
        "code": "USERNAME_INVALID",
        "message": "username is invalid"
      }
    }
  ]
}

What is the best way to represent this in OpenTelemetry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# TODO Should we ref network.transport, network.type, server.address etc?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have to be done with caution though. as graphql is protocol agonstic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those are still very protocol agnostic, no?

# TODO There are more spans like validation, parsing, variable coercion and response formatting. Should we add them as separate span types?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides a lot of value to know how long a specific operation is spending validating, parsing or formatting.
Should we add specific spans for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can model how the trace should like as we want - One thing to always keep in mind is verbosity. Spans costs money in the end. Creating too many may be a problem in some cases, so if it's not essential, maybe they can be turned off or opt-in.

If so, please open a separated issue explaining the value/what it should do so we can tackle it in a separate PR.


- id: graphql.server.resolver
prefix: graphql
type: span
brief: >
This document defines semantic conventions to apply when instrumenting the GraphQL implementation.
They map GraphQL resolvers to attributes on a Span.
attributes:
- id: selection.name
brief: "The name of the selection that is being resolved. Either the field name or an alias."
type: string
examples: "findBookById"
- id: selection.type # selection.field.type ?
brief: "The type of the field that is beeing resolved"
type: string
examples: "Query"
PascalSenn marked this conversation as resolved.
Show resolved Hide resolved
- id: selection.path
brief: "The path of the selection that is beeing resolved."
type: string
examples: "/foo/bar/0/baz"
- id: selection.field.name
brief: "The name of the field that is beeing resolved."
type: string
examples: "findBookById"
- id: selection.field.declaringType
brief: "The type that declares the field that is beeing resolved."
type: string
examples: "Book"
PascalSenn marked this conversation as resolved.
Show resolved Hide resolved
- id: selection.field.coordinate
brief: "The coordinate of the field that is beeing resolved."
type: string
examples: "Query.findBookById"
- id: selection.field.isDeprecated
brief: "Whether the field that is beeing resolved is deprecated."
type: bool
examples: true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do with validation errors? If someone sends a invalid GraphQL requests that e.g. could not be parsed. Which span should be emitted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same span as you would do, but in the case of an error, you set the exception attributes where the error can be recorded (as well as the span status). See HTTP for example https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that is not yet specified are the subscriptions.

A subscription is a long running operation that returns an event stream. Similar to websockets/signalR etc.

Is there prior art in open telemetry how this could be handled?

A subscription starts with a "Subscribe" call that returns a event stream. Each event in this event stream is mapped to a graphql result. This means that, the subscribe call and the execution of each event, give different insights. These insights are more important than a arbitrary long root spans that spans the whole graphql execution.