[router-bridge] misleading information about query planning time #455

garypen · 2024-03-06T09:11:11Z

The router provides a query planning metric, apollo_router_query_planning_time, as a histogram.

This information is tracked from within the router and in fact covers all of the time that elapses between submitting the query to the router bridge and the router bridge returning.

Only a small part of that duration may be spent on actual query planning; much of it may be queueing whilst the query waits to be planned. This is because the router-bridge maintains a queue of queries to be planned, and the query planner works synchronously, pulling one job at a time from that queue. Let's imagine a scenario where we have two queries to be planned:
complex: takes 30 s
simple: takes 1 ms
If they arrive at times (seconds):
0.000: complex
0.001: simple
Then they will finish at:
30.000: complex
30.001: simple
Both queries will therefore be reported in apollo_router_query_planning_time as taking 30.000 s (finish time minus arrival time), which can cause confusion when wondering why a simple query takes so long to plan.
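The arithmetic of this scenario can be checked with a small simulation of a single worker draining a first-in, first-out queue. This is only a sketch: `Job`, `Outcome` and `simulate_fifo` are hypothetical names, not router or router-bridge APIs, and times are whole milliseconds so the subtraction stays exact.

```rust
// Hypothetical model of one planning request; not a router-bridge type.
#[derive(Debug, Clone, Copy)]
pub struct Job {
    pub name: &'static str,
    pub arrival_ms: u64,
    pub plan_ms: u64,
}

#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub name: &'static str,
    pub queue_wait_ms: u64, // time spent waiting for the single worker
    pub plan_ms: u64,       // time actually spent planning
    pub reported_ms: u64,   // what an end-to-end timer sees: finish - arrival
}

// Single worker, first-in first-out: a job starts once it has arrived AND the
// previous job has finished. `jobs` must be sorted by arrival time.
pub fn simulate_fifo(jobs: &[Job]) -> Vec<Outcome> {
    let mut worker_free_at = 0;
    jobs.iter()
        .map(|job| {
            let start = job.arrival_ms.max(worker_free_at);
            let finish = start + job.plan_ms;
            worker_free_at = finish;
            Outcome {
                name: job.name,
                queue_wait_ms: start - job.arrival_ms,
                plan_ms: job.plan_ms,
                reported_ms: finish - job.arrival_ms,
            }
        })
        .collect()
}
```

For the two queries above (complex arriving at 0 ms and needing 30_000 ms, simple arriving at 1 ms and needing 1 ms), both come out with `reported_ms == 30_000`, while the simple query's own planning time is 1 ms and its queue wait is 29_999 ms.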

If we want to address this, there are several things we could do within the router/router-bridge:

We'd need to make the number of jobs that can queue in the QP (query planner) smaller than the currently hard-coded 10_000
We'd need to modify the Query planner service so that it doesn't just always return Poll::Ready(Ok(())) from poll_ready, and propagate that back-pressure (somehow)
We could modify the documentation for the existing metric and clarify what the histogram is actually measuring
We could add new metrics to break out queuing time from actual planning time
...
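The first, second and fourth items above could be combined roughly as follows. This is a minimal sketch under stated assumptions: std threads and a bounded `std::sync::mpsc::sync_channel` stand in for the router-bridge's internal queue, and `PlanJob`, `PlanTimings`, `try_submit` and `spawn_planner` are hypothetical names rather than the real router or router-bridge API. A full queue is reported to the caller through `try_send` instead of silently inflating the latency figure, and queue wait and planning time are recorded as separate values.

```rust
use std::sync::mpsc::{Receiver, Sender, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical job type; not the router-bridge API.
pub struct PlanJob {
    pub query: String,
    pub enqueued_at: Instant,
}

// Queue wait and planning time kept apart instead of one combined figure.
#[derive(Debug)]
pub struct PlanTimings {
    pub query: String,
    pub queue_wait: Duration,
    pub planning: Duration,
}

// Non-blocking submit: a full queue comes back to the caller as an error
// (back-pressure) rather than turning into extra "planning" latency.
pub fn try_submit(tx: &SyncSender<PlanJob>, query: &str) -> Result<(), String> {
    let job = PlanJob { query: query.to_string(), enqueued_at: Instant::now() };
    match tx.try_send(job) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(job)) => Err(format!("planner queue full, rejected {}", job.query)),
        Err(TrySendError::Disconnected(job)) => Err(format!("planner stopped, dropped {}", job.query)),
    }
}

// One synchronous worker pulling jobs one at a time; `plan` stands in for the planner.
pub fn spawn_planner<F>(
    rx: Receiver<PlanJob>,
    results: Sender<PlanTimings>,
    plan: F,
) -> thread::JoinHandle<()>
where
    F: Fn(&str) + Send + 'static,
{
    thread::spawn(move || {
        // recv() returns Err once every sender has been dropped, ending the loop.
        while let Ok(job) = rx.recv() {
            let started = Instant::now();
            let queue_wait = started.duration_since(job.enqueued_at);
            plan(&job.query);
            let planning = started.elapsed();
            if results.send(PlanTimings { query: job.query, queue_wait, planning }).is_err() {
                break; // nobody is collecting timings any more
            }
        }
    })
}
```

Recording `queue_wait` and `planning` in two separate histograms would let a slow simple query be attributed to queueing rather than planning; the capacity passed to `sync_channel` plays the role of the hard-coded 10_000.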


Geal · 2024-03-06T10:24:25Z

We could make a queue on the router side and make sure there's only 1 element in the planning queue on the router-bridge side?
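One way to read this suggestion, sketched with std primitives: make the bridge-side queue a rendezvous channel (`std::sync::mpsc::sync_channel(0)`), so a submission only completes when the single planner thread takes it, and any waiting happens on the router side where it can be timed. This deliberately swaps the literal one-element buffer for a zero-element one so that the router's measurement ends exactly at hand-off. `spawn_bridge` and `submit_timed` are hypothetical names, not the router-bridge API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;
use std::time::{Duration, Instant};

// Bridge side (hypothetical): a capacity-0 rendezvous channel, so no job ever
// sits in a bridge-side buffer; `plan` stands in for the real planner.
pub fn spawn_bridge<F>(plan: F) -> (SyncSender<String>, thread::JoinHandle<()>)
where
    F: Fn(&str) + Send + 'static,
{
    let (tx, rx): (SyncSender<String>, Receiver<String>) = sync_channel(0);
    let handle = thread::spawn(move || {
        // Ends when every SyncSender clone has been dropped.
        while let Ok(query) = rx.recv() {
            plan(&query);
        }
    });
    (tx, handle)
}

// Router side (hypothetical): send() returns only once the planner thread has
// taken the job, so the time blocked here is the queue wait, seen by the router.
pub fn submit_timed(tx: &SyncSender<String>, query: &str) -> Result<Duration, String> {
    let queued_at = Instant::now();
    tx.send(query.to_string())
        .map_err(|_| "planner thread has stopped".to_string())?;
    Ok(queued_at.elapsed())
}
```

Each router task would hold a clone of the `SyncSender`; the tasks blocked in `send` then form the router-side queue. This sketch blocks OS threads, so an async router would need an async equivalent of the same hand-off.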

xuorig mentioned this issue Mar 12, 2024

poc: PooledPlanner + choice of 2 load balancing #456 (Draft)

xuorig mentioned this issue Mar 26, 2024

Metric combining apollo_router_processing_time and apollo_router_query_planning_time for requests that require query planning apollographql/router#4851 (Closed)
