Using thanos query components as federation layer for multiple mimir clusters #8220

sarthaktyagi-505 · 2025-04-23T10:43:16Z

Hello community,

Intro

We operate a very large Mimir cluster. Although it supports more than a billion active series, we are growing at a rate that may push past what a single Mimir cluster can handle. So we started exploring sharding the data across more than one Mimir cluster. We could expose a separate read endpoint per cluster, but then we lose the unified view in our dashboards and the ability to look up all the data in one place. To keep that unified view, we thought of federating the data using Thanos.

Thanos Read components for federation

We did an initial POC in which we connected 2 Mimir clusters through a federation layer built on Thanos, and results were aggregated correctly across both clusters. But as we ran heavier queries, we started seeing increased query latency and some limitations related to the remote read API. Here are the limitations we found:

1. Remote read queries are not sharded.
2. Limitation of 50MB for a single stream of data in remote read.
3. Unable to use query and query_range APIs instead of remote read wherever possible.

We can live with the first 2, since for the 2nd point it has already been suggested to make the limit a user-defined flag. The third point, however, could make things a lot faster: the querier would be smart enough to use the query and query_range APIs wherever possible. This is present in the latest version of promxy and yields better query latency, but promxy still doesn't support native histograms.
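To make the third point concrete, here is a minimal Python sketch of fanning a query_range call out to two clusters and merging the results, so that each cluster returns already-evaluated series instead of raw samples over remote read. It is only an illustration under assumptions: the hostnames are hypothetical, the `/prometheus` prefix assumes Mimir's default HTTP prefix, `fetch` is a caller-supplied function (a real one would also have to send the tenant header), and the merge is only valid when every series lives in exactly one cluster and the expression does not aggregate across clusters.

```python
from typing import Callable, Dict, List, Sequence
from urllib.parse import urlencode

# Hypothetical endpoints; "/prometheus" assumes Mimir's default HTTP prefix.
CLUSTERS = (
    "http://mimir-a.example/prometheus",
    "http://mimir-b.example/prometheus",
)


def query_range_url(base: str, expr: str, start: int, end: int, step: int) -> str:
    """Build a Prometheus-compatible /api/v1/query_range URL."""
    params = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"


def federate(
    expr: str,
    start: int,
    end: int,
    step: int,
    fetch: Callable[[str], dict],
    clusters: Sequence[str] = CLUSTERS,
) -> List[dict]:
    """Run the same range query on every cluster and merge the matrix results.

    Series are keyed by their full label set. If the same series shows up in
    more than one cluster, the first value seen for a timestamp wins.
    """
    merged: Dict[tuple, dict] = {}
    for base in clusters:
        body = fetch(query_range_url(base, expr, start, end, step))
        if body.get("status") != "success":
            raise RuntimeError(f"query failed on {base}")
        for series in body["data"]["result"]:
            key = tuple(sorted(series["metric"].items()))
            entry = merged.setdefault(key, {"metric": series["metric"], "values": {}})
            for ts, value in series["values"]:
                entry["values"].setdefault(ts, value)
    # Convert the per-series timestamp maps back to sorted [ts, value] pairs.
    return [
        {"metric": e["metric"], "values": [[ts, e["values"][ts]] for ts in sorted(e["values"])]}
        for e in merged.values()
    ]
```

With this shape, only the evaluated result of each expression crosses the network. The trade-off is that anything which has to combine series from both clusters (for example a global `sum`) still needs a second evaluation step on top of the merged output.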

Questions

1. Is there a reason why we don't use the query and query_range APIs instead of remote_read?
2. Do you have any suggestions for how we can optimize the current setup, or any ideas for how we could contribute to the project to help support use cases like these?

Thanks in advance, hopefully it also helps the others in the community.
-Sarthak

@sarthaktyagi-505 · 2025-04-23T10:47:01Z

dosubot[bot]
bot Apr 23, 2025

Hello @sarthaktyagi-505! I'm Dosu, your friendly neighborhood bot here to help you tackle bugs, answer questions, and dive into contributions while we wait for a human maintainer. Let's make this journey smoother together! 🚀

I found some relevant information that might help address your questions:

Reason for not using query and query_range APIs instead of remote_read: Unfortunately, I couldn't find specific reasons within the repository for preferring remote_read over query and query_range APIs. This might be an area where further investigation or discussion with the community could provide more insights.
Optimizing the current setup: There is a proposal in the Thanos repository that suggests adding a "selector.relabel-config" command-line flag to the Query component. This would allow the Querier to apply a relabel config against each Store's external label set to decide whether to keep or drop a TSDB from the query. This mechanism aims to optimize query execution by targeting only relevant TSDBs, potentially reducing query latency ^[1].
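As a rough illustration of the keep/drop decision described above, the following Python sketch applies Prometheus-style `keep` and `drop` relabel rules to a store's external labels. It is not Thanos code: the `keep_store` helper, the rule dictionaries, and the `cluster` label are hypothetical, and only the two actions needed for selection are modeled.

```python
import re
from typing import Dict, List


def keep_store(external_labels: Dict[str, str], rules: List[dict]) -> bool:
    """Return True if a store with these external labels survives all rules.

    Follows Prometheus relabel semantics for the two actions modeled here:
    source label values are joined with the separator (default ";"), the
    regex must match the whole joined string, "keep" discards the target on
    a non-match, and "drop" discards it on a match.
    """
    for rule in rules:
        separator = rule.get("separator", ";")
        value = separator.join(
            external_labels.get(name, "") for name in rule["source_labels"]
        )
        matched = re.fullmatch(rule["regex"], value) is not None
        action = rule["action"]
        if action == "keep":
            if not matched:
                return False
        elif action == "drop":
            if matched:
                return False
        else:
            raise ValueError(f"unsupported action in this sketch: {action}")
    return True


# Hypothetical rule: only query stores whose "cluster" label is mimir-a or mimir-b.
RULES = [{"action": "keep", "source_labels": ["cluster"], "regex": "mimir-(a|b)"}]
```

For example, `keep_store({"cluster": "mimir-a"}, RULES)` keeps the store, while a store labeled `cluster="mimir-c"`, or one with no `cluster` label at all, is dropped from the query.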

Additionally, there is ongoing work in Thanos to support native histograms, which might be relevant to your use case since you mentioned promxy's lack of support for native histograms. Thanos is actively testing and providing experimental support for native histograms, which could be beneficial for your setup ^[2]^[3]^[4]^[5].

These insights might help you optimize your current setup or guide your contributions to the project.

