Querier distributed mode much slower than normal mode #8451
Unanswered
vincent-olivert-riera asked this question in Questions & Answers
Replies: 1 comment
-
That shouldn't happen at all. Perhaps traces would show what is actually taking such a long time?
-
Hoping to improve the performance of my Thanos Querier cluster, I have tried migrating it to distributed execution mode.
In normal mode, I have 2 Thanos Querier servers (12 CPUs, 32 GB) behind a load balancer.
Each Querier is connected to 176 endpoints (Sidecars and Stores).
The new distributed mode cluster has 2 central Thanos Querier servers (28 CPUs, 56 GB) behind a load balancer.
Each central Querier is connected to 14 endpoints (7 x 2 local Queriers, running in pairs).
The new distributed mode cluster also has 7 local Thanos Querier clusters (each cluster consists of 2 Queriers, 28 CPUs, 56 GB).
Each local Querier is connected to roughly 25 (176/7) endpoints (Sidecars and Stores).
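For reference, the fan-out arithmetic behind the two layouts can be sketched as below. This is only an illustration built from the numbers above: the function and key names are made up for the example, and it assumes the 176 endpoints are spread evenly across the 7 local clusters, which may not match the real deployment.

```python
# Numbers taken from the description above.
TOTAL_ENDPOINTS = 176            # Sidecars and Stores
LOCAL_CLUSTERS = 7               # local Querier clusters
QUERIERS_PER_LOCAL_CLUSTER = 2   # local Queriers run in pairs


def fanout_summary(total_endpoints, local_clusters, queriers_per_cluster):
    """Return per-tier fan-out for the normal and distributed layouts.

    Hypothetical helper: assumes an even split of endpoints across
    the local clusters.
    """
    return {
        # Normal mode: every Querier talks to every endpoint directly.
        "normal_fanout": total_endpoints,
        # Distributed mode: a central Querier only sees the local Queriers.
        "central_fanout": local_clusters * queriers_per_cluster,
        # Distributed mode: each local Querier sees its share of endpoints.
        "local_fanout_avg": total_endpoints / local_clusters,
    }


summary = fanout_summary(
    TOTAL_ENDPOINTS, LOCAL_CLUSTERS, QUERIERS_PER_LOCAL_CLUSTER
)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

So the distributed layout reduces the direct fan-out per Querier but adds an extra network hop (central to local Querier) on every query.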
I have observed that queries in the new distributed mode cluster are far slower, to the point that almost all the panels in a Grafana dashboard always fail to load due to "query timeout" errors.
However, with the normal mode cluster the dashboard manages to load completely.
Is that expected?