What is the bug?
When lazy loading is enabled, the store-gateway should become ready very quickly after it starts, because blocks should be loaded on demand.
Instead, store-gateways can take up to 2.5h to start, which slows down deployments.
We have experienced this sporadically with older versions of Mimir, on one or two instances, during normal restarts.
Now it has also happened with the new version of Mimir, 2.15. This time it affected not just one instance but all of them. The difference was that we had deleted all store-gateway PVs in one zone and expected that zone to become ready very quickly, but every instance in the zone took hours to load all of its blocks from S3.
We are running in K8s, and the following is the configuration of the store-gateway:
How to reproduce it?
Not sure. It doesn't happen in our dev clusters, but maybe that's because they don't have much data to load.
What did you think would happen?
The store-gateway should be ready immediately.
What was your environment?
It happened in Kubernetes 1.28 and below.
Mimir 2.14 and 2.15 (if I remember correctly, it also happened in 2.13).
Both on x86 and ARM.
Both when the persistent volume is kept as-is during the restart and when it is deleted.
Any additional context to share?
No response
When lazy loading is enabled, store-gateways don't load the TSDB index-header into memory until it's needed, but they still must download the index-header (a subset of the TSDB index) to local disk. This should only happen with an empty disk, such as when new store-gateways are started, not when existing ones are restarted.
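To make the point about local disk concrete, here is a minimal Python sketch. It is not Mimir's code (Mimir is written in Go); the `sync_index_headers` helper, the `<dir>/<block-id>/index-header` layout, and the `download` callback (standing in for the fetch from object storage such as S3) are all hypothetical, and blocks are processed sequentially for simplicity. It shows why a restart that keeps the PV should skip the downloads, while a fresh or wiped disk pays one download per block.

```python
import tempfile
from pathlib import Path


def sync_index_headers(block_ids, local_dir, download):
    """Ensure every block has an index-header on local disk (hypothetical model).

    Returns the IDs of the blocks whose index-header had to be downloaded.
    """
    local_dir = Path(local_dir)
    downloaded = []
    for block_id in block_ids:
        dest = local_dir / block_id / "index-header"  # assumed layout
        if dest.exists():
            continue  # warm disk: a restart reuses the existing file
        dest.parent.mkdir(parents=True, exist_ok=True)
        download(block_id, dest)  # empty disk: fetch from object storage
        downloaded.append(block_id)
    return downloaded


def fake_download(block_id, dest):
    # Stand-in for the object-storage fetch; just writes a small file.
    dest.write_bytes(b"index-header for " + block_id.encode())


with tempfile.TemporaryDirectory() as d:
    blocks = ["01A", "01B", "01C"]
    first = sync_index_headers(blocks, d, fake_download)   # new or wiped PV
    second = sync_index_headers(blocks, d, fake_download)  # restart, PV kept
    print(first, second)  # ['01A', '01B', '01C'] []
```

With many large blocks, the first call is where the startup time goes, which matches the behavior reported after deleting the PVs in one zone.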
Hi Nick, thank you for your reply. I see, so this line in the logs:
ts=2025-02-14T11:25:04.953879005Z caller=bucket.go:452 level=info user=default msg="loaded new block" elapsed=19.921888892s id=REDACTED
is about the index-header being loaded onto disk, not the block itself?
There are different things going on here. "Loading a block" involves several pieces of work, and the index-header is one part of that. Lazy loading controls whether the index-header is downloaded and immediately loaded into memory, or only downloaded and then loaded into memory when a query involves that particular block.
So regardless of the lazy loading setting, starting a store-gateway is going to involve some work. That's what that log message is about: all the work required for a store-gateway to load a block.
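The difference between downloading the index-header and loading it into memory can be illustrated with a small Python model. The `LazyIndexHeader` class below is hypothetical, not Mimir's implementation: it assumes the index-header file is already on local disk and treats "loading" as simply reading its bytes into memory, either at construction (eager) or on the first query that touches the block (lazy).

```python
import os
import tempfile


class LazyIndexHeader:
    """Hypothetical model of an index-header that is already on local disk."""

    def __init__(self, path, lazy=True):
        self.path = path
        self._data = None
        if not lazy:
            self._load()  # eager: pay the in-memory load cost at startup

    def _load(self):
        with open(self.path, "rb") as f:
            self._data = f.read()

    @property
    def loaded(self):
        return self._data is not None

    def lookup(self):
        """Called when a query involves this block."""
        if self._data is None:
            self._load()  # lazy: the first query pays the load cost
        return self._data


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"symbols and postings offsets")  # stand-in file content

lazy = LazyIndexHeader(tmp.name, lazy=True)
eager = LazyIndexHeader(tmp.name, lazy=False)
loaded_at_startup = (lazy.loaded, eager.loaded)
data = lazy.lookup()  # first query touching the block triggers the load
os.remove(tmp.name)
print(loaded_at_startup, lazy.loaded)  # (False, True) True
```

In both modes the file has to exist on disk before the store-gateway can serve the block, so lazy loading only defers the in-memory part of the work, not the download.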
Thanks for the clarification. That explains the last event, when the PVs were deleted.
That leaves only the cases where it occasionally also happens during normal restarts of isolated pods. I'll see what I can collect the next time we encounter it.