Querier distributed mode much slower than normal mode #8451

vincent-olivert-riera · 2025-08-28T07:17:05Z


Hoping to improve the performance of my Thanos Querier cluster, I have tried migrating it to distributed execution mode.

In normal mode, I have 2 Thanos Querier servers (12 CPUs, 32 GB) behind a load balancer.
Each Querier is connected to 176 endpoints (Sidecars and Stores).

The new distributed mode cluster has 2 central Thanos Querier servers (28 CPUs, 56 GB) behind a load balancer.
Each central Querier is connected to 14 endpoints (7 x 2 local Queriers, running in pairs).

The new distributed mode cluster also has 7 local Thanos Querier clusters (each cluster consists of 2 Queriers, 28 CPUs, 56 GB).
Each local Querier is connected to roughly 25 (176/7) endpoints (Sidecars and Stores).
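
To make the fan-out numbers above easier to check, here is a minimal Python sketch of the arithmetic. It assumes the 176 endpoints are split as evenly as possible across the 7 local clusters, which the setup above only describes as "roughly"; the function name `fan_out` is made up for illustration and is not part of Thanos.

```python
def fan_out(total_endpoints, local_clusters, queriers_per_cluster):
    """Return (central_fan_out, min_local_fan_out, max_local_fan_out).

    Assumption: endpoints are spread as evenly as possible across the
    local clusters, and both Queriers in a cluster see the same endpoints.
    """
    # Each central Querier connects to every local Querier.
    central = local_clusters * queriers_per_cluster
    # Even split of leaf endpoints (Sidecars and Stores) per local cluster.
    base, remainder = divmod(total_endpoints, local_clusters)
    local_max = base + (1 if remainder else 0)
    return central, base, local_max


central, local_min, local_max = fan_out(176, 7, 2)
print(f"central Querier fan-out: {central}")
print(f"local Querier fan-out: {local_min} to {local_max}")
```

Under that assumption, each central Querier talks to 14 local Queriers, and each local Querier talks to 25 or 26 endpoints, compared with 176 per Querier in normal mode.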

I have observed that queries in the new distributed mode cluster are much slower, to the point that almost all the panels in a Grafana dashboard fail to load with "query timeout" errors.
With the normal mode cluster, however, the same dashboard loads completely.

Is that normal?

GiedriusS (Maintainer) · 2025-08-28T12:06:18Z

That shouldn't happen at all. Perhaps traces would show what is actually taking such a long time?
