Replace OpenTracing extension with one for OpenTelemetry? #649
Comments
Yup! That would be helpful, thanks!
here is my basic implementation based on OpenTracing:

```python
from copy import deepcopy
from functools import partial
from inspect import isawaitable
from typing import Any, Callable, Dict, Optional

from graphql import GraphQLResolveInfo
from opentelemetry import trace
from opentelemetry.trace.status import Status, StatusCode

from ariadne.types import Extension, Resolver
from ariadne.contrib.tracing.utils import format_path, should_trace

ArgFilter = Callable[[Dict[str, Any], GraphQLResolveInfo], Dict[str, Any]]


class OpentelemetryExtension(Extension):
    _arg_filter: Optional[ArgFilter]
    _tracer: trace.Tracer

    def __init__(
        self,
        *,
        arg_filter: Optional[ArgFilter] = None,
        tracer_provider: Optional[trace.TracerProvider] = None,
    ):
        self._arg_filter = arg_filter
        self._tracer = trace.get_tracer(__name__, "0.1.0", tracer_provider)

    async def resolve(
        self, next_: Resolver, parent: Any, info: GraphQLResolveInfo, **kwargs
    ):
        if not should_trace(info):
            result = next_(parent, info, **kwargs)
            if isawaitable(result):
                result = await result
            return result

        with self._tracer.start_as_current_span(
            info.field_name, kind=trace.SpanKind.SERVER
        ) as span:
            span.set_attribute("graphql.parentType", info.parent_type.name)
            graphql_path = ".".join(map(str, format_path(info.path)))
            span.set_attribute("graphql.path", graphql_path)

            if kwargs:
                filtered_kwargs = self.filter_resolver_args(kwargs, info)
                for kwarg, value in filtered_kwargs.items():
                    span.set_attribute(f"graphql.param.{kwarg}", value)

            result = next_(parent, info, **kwargs)
            if isawaitable(result):
                result = await result
            span.set_status(Status(StatusCode.OK))
            return result

    def filter_resolver_args(
        self, args: Dict[str, Any], info: GraphQLResolveInfo
    ) -> Dict[str, Any]:
        if not self._arg_filter:
            return args
        return self._arg_filter(deepcopy(args), info)


def opentracing_extension(*, arg_filter: Optional[ArgFilter] = None):
    return partial(OpentelemetryExtension, arg_filter=arg_filter)
```
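The `arg_filter` hook above receives the resolver's kwargs and returns the dict that actually gets attached to the span as `graphql.param.*` attributes. A minimal sketch of such a filter (the key names and redaction policy here are made up for illustration), e.g. for keeping secrets out of traces:

```python
from typing import Any, Dict

# Hypothetical set of argument names we never want exported to a tracing backend
SENSITIVE = {"password", "token", "secret"}


def redact_args(args: Dict[str, Any], info: Any) -> Dict[str, Any]:
    """Replace sensitive resolver arguments before they become span attributes."""
    return {k: ("[redacted]" if k in SENSITIVE else v) for k, v in args.items()}


print(redact_args({"email": "a@b.c", "password": "hunter2"}, None))
# {'email': 'a@b.c', 'password': '[redacted]'}
```

This would be passed as `opentracing_extension(arg_filter=redact_args)`; the extension deep-copies the args first, so mutating the dict is also safe.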
I would suggest extracting the context headers from the incoming request and adding them to the trace as well. Very useful for propagating context across services. My team uses it with a federated GQL setup. You have to change:

```python
with self._tracer.start_as_current_span(info.field_name, kind=trace.SpanKind.SERVER) as span:
```

to something like:

```python
from opentelemetry.propagate import extract

with self._tracer.start_as_current_span(
    info.field_name, kind=trace.SpanKind.SERVER, context=extract(request.headers)
) as span:
```
It's:

```python
ctx = extract(
    carrier=info.context["request"].headers
)

with self._tracer.start_as_current_span(
    info.field_name, kind=trace.SpanKind.SERVER, context=ctx
) as span:
```
Gotta love open source. Everyone contributing, and suddenly we have something super useful.
This implementation by @kamilglod doesn't seem quite right. The spans are not correctly grouped together, since no root span is created when the request starts.
Here's my corrected implementation (the `_graphql_span_name` helper is my own; its naming scheme is just one reasonable choice):

```python
from __future__ import annotations

from functools import partial
from inspect import isawaitable
from typing import Any, Callable

from ariadne.contrib.tracing.utils import format_path, should_trace
from ariadne.types import Extension, Resolver
from graphql import GraphQLResolveInfo
from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import get_aggregated_resources
from opentelemetry.sdk.trace.export import (
    ConsoleSpanExporter,
    SimpleSpanProcessor,
)
from opentelemetry.trace.status import Status, StatusCode

ArgFilter = Callable[[dict[str, Any], GraphQLResolveInfo], dict[str, Any]]


def opentracing_extension(*, arg_filter: ArgFilter | None = None):
    # REPLACE THESE BASED ON YOUR ENVIRONMENT.
    # SimpleSpanProcessor exports spans immediately, one after the other;
    # swap in BatchSpanProcessor to process them in batches instead.
    span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
    resources = []
    id_generator = None
    return partial(
        OpenTelemetryExtension,
        arg_filter=arg_filter,
        tracer_provider=TracerProvider(
            active_span_processor=span_processor,
            resource=get_aggregated_resources(resources),
            id_generator=id_generator,
        ),
    )


def _graphql_span_name(info: GraphQLResolveInfo | None = None) -> str:
    # Name the root span generically and resolver spans by their GraphQL path.
    if info is None:
        return "GraphQL Operation"
    return ".".join(map(str, format_path(info.path)))


class OpenTelemetryExtension(Extension):
    """OpenTelemetry extension for Ariadne"""

    _root_span: trace.Span

    def __init__(
        self,
        *,
        arg_filter: ArgFilter | None = None,
        tracer_provider: trace.TracerProvider | None = None,
    ):
        self._arg_filter = arg_filter
        self._tracer = trace.get_tracer(__name__, "0.1.0", tracer_provider)

    def request_started(self, context: Any) -> None:
        """Begin the root span for this request and add any request headers as context"""
        self._root_span_manager = self._tracer.start_as_current_span(
            _graphql_span_name(),
            kind=trace.SpanKind.SERVER,
            context=extract(carrier=context["request"].headers),
        )
        self._root_span = self._root_span_manager.__enter__()

    def request_finished(self, context: Any) -> None:
        """Close the root span"""
        self._root_span_manager.__exit__(None, None, None)

    async def resolve(
        self, next_: Resolver, parent: Any, info: GraphQLResolveInfo, **kwargs: Any
    ) -> Any:
        """Begin a new span at each query resolution step"""
        if not should_trace(info):
            result = next_(parent, info, **kwargs)
            if isawaitable(result):
                result = await result
            return result

        try:
            with self._tracer.start_as_current_span(_graphql_span_name(info)) as span:
                for kwarg, value in self.filter_args(kwargs, info).items():
                    span.set_attribute(f"graphql.param.{kwarg}", value)

                result = next_(parent, info, **kwargs)
                if isawaitable(result):
                    result = await result
                span.set_status(Status(StatusCode.OK))
                return result
        except Exception:
            self._root_span.set_status(Status(StatusCode.ERROR))
            raise

    def filter_args(
        self, args: dict[str, Any], info: GraphQLResolveInfo
    ) -> dict[str, Any]:
        return self._arg_filter(args, info) if self._arg_filter else args
```
@rmorshea That's a great snippet! Could you open a pull request adding this extension?
There are still a few small problems I'm discovering.
I've used this approach in production but unfortunately had to back it out due to a negative performance impact, even for simple things like the introspection query. See Beware of middlewares and extensions #708.
I'm unsure whether the aforementioned performance issue is specific to Ariadne, though. Other frameworks built on top of graphql-python don't seem to have issues with it.
@dacevedo12 I don't know how they avoid the issue if it's the same query executor underneath 🤔 Maybe the resolver-skipping logic in Ariadne is bugged? I'll be more than happy to investigate this properly when I get 0.18 out of the door sometime next week. I know we will need to expand our benchmark suite first, so we have big and repeatable datasets and a schema closer to what real apps are doing.
It's weird indeed. After debugging a bit I've managed to track it down; that codepath made the introspection query take more than 2 seconds against my schema, which usually takes around 200-600 ms. Will try to put together a reproducible demo this weekend.
@dacevedo12 any luck with a demo? I'm currently releasing Ariadne 0.18, where the default middleware manager can be swapped for a custom one. But I don't know yet if there's much room for performance improvement there.
Sorry for the delay, it's been a busy couple of weeks. Yes, I've managed to put together a minimal reproducible demo here: https://github.com/dacevedo12/ariadne-perf-demo

Things I've noticed so far:
|
Thanks @dacevedo12. We should separate the performance degradation due to middleware from that due to async execution. I did not look deeply into this, but it looks like Ariadne's default extensions are async middleware, so they may turn every resolver into an async one. Note that middleware is one of the features that does not exist in GraphQL.js and was taken over from GraphQL-core 2 in order to stay compatible, not because it is a good solution outside a few special cases. I can well imagine that it causes a performance loss because it is run on every resolver, and this gets worse when every resolver is made async due to the issues mentioned above.
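A tiny sketch of the effect described here (resolver and middleware names are made up): once an async middleware wraps a resolver, calling the wrapped field always yields a coroutine the executor must await, even when the underlying resolver is a plain sync function.

```python
from functools import partial
from inspect import isawaitable, iscoroutine


def sync_resolver(parent, info):
    # an ordinary synchronous resolver
    return 42


async def tracing_middleware(next_, parent, info, **kwargs):
    # typical tracing middleware shape: call the next resolver,
    # await it if needed, return the result
    result = next_(parent, info, **kwargs)
    if isawaitable(result):
        result = await result
    return result


# What the middleware manager effectively does to each field:
wrapped = partial(tracing_middleware, sync_resolver)

call = wrapped(None, None)
print(iscoroutine(call))  # True: the sync resolver now looks async to the executor
call.close()  # avoid "coroutine was never awaited" warning
```

So even fields that return instantly pay event-loop scheduling costs once a single async middleware is installed.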
Yeah, it's not middlewares per se but the way they are implemented that turns all resolvers into async ones, which in the current state of affairs with graphql-core means a significant slowdown. The dots have been connected 🙌 In the meantime, it seems infeasible to implement this kind of tracing properly with async Python... even things like https://github.com/DataDog/dd-trace-py/blob/1.x/ddtrace/contrib/graphql/patch.py rely on middleware, but that one doesn't bother awaiting the result, which yields inaccurate span durations.
I've figured out that you can have a
Note that this approach will create another span. Still, this is Ariadne-specific instrumentation. I'll be working on a graphql-core instrumentor for OpenTelemetry (one that doesn't rely on middlewares) so that all GraphQL Python libraries can benefit from it. Will link the PR once I have it.
Just a quick update: I've got OpenTelemetry working with Ariadne 0.19, and I'm hoping to wrap up the extensions work soonish, so Ariadne 0.20 will ship both OpenTelemetry and OpenTracing extensions with performance improvements based on feedback from @Cito <3
Looking great, kudos @rafalp. My attempt at submitting a graphql-core instrumentor wasn't successful: they say they're short-staffed and prefer each library adding OTel support on its own.
https://opentracing.io/ now carries a banner announcing that it has been supplanted by OpenTelemetry.
Are you interested in a PR that adds OpenTelemetry-based tracing?