The router provides a query planning metric: apollo_router_query_planning_time as a histogram.
This metric is recorded inside the router and measures all of the time that elapses between submitting a query to the router-bridge and the router-bridge returning.
Only a small part of that duration may be actual query planning; most of it may be time spent queued, waiting for planning to start. This is because the router-bridge maintains a queue of queries to be planned, and the query planner itself works synchronously, pulling new jobs from that queue. Let's imagine a scenario where we have two queries to be planned:
complex: takes 30s
simple: takes 1 ms
If they arrive at times (seconds):
0.000: complex
0.001: simple
Then they will finish at:
30.000: complex
30.001: simple
and the resulting durations (roughly 30 s for both queries) will be reported as apollo_router_query_planning_time, which can cause confusion when wondering why a simple query takes so long to plan.
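The arithmetic behind this scenario can be sketched as a tiny simulation. This is not router or router-bridge code: `Job`, `Observed` and `simulate_fifo` are hypothetical names, and the model simply assumes one synchronous planner draining a FIFO queue, with all times in milliseconds.

```rust
/// A planning job: when it arrived and how long the planner itself needs.
/// All times are in milliseconds since an arbitrary start (illustrative only).
#[derive(Debug, Clone, Copy)]
struct Job {
    arrival_ms: u64,
    planning_ms: u64,
}

/// What a submit-to-return timer would observe for one job.
#[derive(Debug, PartialEq, Eq)]
struct Observed {
    finish_ms: u64,
    queued_ms: u64,   // time spent waiting for the planner
    planning_ms: u64, // time spent actually planning
    reported_ms: u64, // queued + planning: what a single histogram would see
}

/// Simulate a single synchronous planner draining a FIFO queue.
/// `jobs` is assumed to be sorted by arrival time.
fn simulate_fifo(jobs: &[Job]) -> Vec<Observed> {
    let mut planner_free_at = 0u64;
    jobs.iter()
        .map(|job| {
            // A job starts when it has arrived AND the planner is idle.
            let start = planner_free_at.max(job.arrival_ms);
            let finish = start + job.planning_ms;
            planner_free_at = finish;
            Observed {
                finish_ms: finish,
                queued_ms: start - job.arrival_ms,
                planning_ms: job.planning_ms,
                reported_ms: finish - job.arrival_ms,
            }
        })
        .collect()
}
```

Feeding it the complex job (arrival 0 ms, 30 000 ms of planning) followed by the simple job (arrival 1 ms, 1 ms of planning) gives finish times of 30 000 ms and 30 001 ms. The simple job's reported duration is 30 000 ms, of which 29 999 ms is queueing.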
If we want to address this, there are several things we could do within the router/router-bridge:
We'd need to make the number of jobs which can queue in the QP less than the currently hard-coded 10_000
We'd need to modify the Query planner service so that it doesn't always return Poll::Ready(Ok()) and instead propagates that back-pressure (somehow)
We could modify the documentation for the existing metric and clarify what the histogram is actually measuring
We could add new metrics to break out queuing time from actual planning time
...
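To make two of these options concrete, here is a minimal sketch of a bounded queue that rejects work when full and records queueing time separately from planning time. It is an assumption-laden illustration, not the router-bridge implementation: it uses a plain `std::sync::mpsc::sync_channel` and a worker thread rather than the router's actual service stack, `thread::sleep` stands in for real planning, and `PlanJob`, `Timing` and `run_planner` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical job: remembers when it was put on the queue.
struct PlanJob {
    name: &'static str,
    work: Duration, // stand-in for the real planning cost
    enqueued_at: Instant,
}

/// Hypothetical pair of measurements, i.e. what two separate
/// histograms (queue wait vs. planning) would record.
#[derive(Debug)]
struct Timing {
    name: &'static str,
    queued: Duration,
    planning: Duration,
}

/// Run one synchronous planner thread behind a queue of `capacity` slots
/// (`capacity` should be at least 1). Returns the per-job timings plus the
/// number of jobs rejected because the queue was full.
fn run_planner(capacity: usize, jobs: Vec<(&'static str, Duration)>) -> (Vec<Timing>, usize) {
    let (job_tx, job_rx) = sync_channel::<PlanJob>(capacity);

    let worker = thread::spawn(move || {
        let mut timings = Vec::new();
        // The loop ends once every sender has been dropped.
        for job in job_rx {
            let started_at = Instant::now();
            // Queue wait ends when the planner picks the job up.
            let queued = started_at.duration_since(job.enqueued_at);
            thread::sleep(job.work); // "planning"
            let planning = started_at.elapsed();
            timings.push(Timing { name: job.name, queued, planning });
        }
        timings
    });

    let mut rejected = 0;
    for (name, work) in jobs {
        let job = PlanJob { name, work, enqueued_at: Instant::now() };
        match job_tx.try_send(job) {
            Ok(()) => {}
            // Queue full: surface back-pressure instead of queueing forever.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    drop(job_tx); // let the worker's loop finish

    let timings = worker.join().expect("planner thread panicked");
    (timings, rejected)
}
```

Calling `run_planner(8, vec![("complex", Duration::from_millis(100)), ("simple", Duration::from_millis(1))])` should show the simple job with a short `planning` value and a much longer `queued` value, which is the distinction the single existing histogram hides. How rejection would be reported to callers in the real service is left open here, as it is in the list above.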