
Bug: store-gateway loads all its blocks before being ready with lazy loading option enabled #10649

Open
agardiman opened this issue Feb 14, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@agardiman

What is the bug?

When lazy loading is enabled, the store-gateway should become ready very quickly after it starts, because blocks should be loaded on demand.
Instead, store-gateways can take up to 2.5 hours to become ready, slowing down deployments.
With older versions of Mimir we had been seeing this sporadically, on one or two instances, during normal restarts.
Now it has also happened with the new version of Mimir, 2.15, and this time not just on one instance but on all of them. The difference is that we had deleted all the store-gateway PVs in one zone; we expected that zone to become ready very quickly, but every instance in the zone took hours to load all of its blocks from S3.

We are running in Kubernetes; the following is the store-gateway configuration:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    rollout-max-unavailable: "50"
  labels:
    rollout-group: store-gateway
    zone: a
  name: store-gateway-zone-a
  namespace: cortex
spec:
  podManagementPolicy: Parallel
  replicas: 55
  selector:
    matchLabels:
      name: store-gateway-zone-a
      rollout-group: store-gateway
  serviceName: store-gateway-zone-a
  template:
    metadata:
      labels:
        gossip_ring_member: "true"
        name: store-gateway-zone-a
        rollout-group: store-gateway
        zone: a
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cortex.pharos.inday.io/zone
                operator: In
                values:
                - mimir-a
              - key: kubernetes.io/arch
                operator: In
                values:
                - arm64
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: store-gateway-zone-a
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -auth.multitenancy-enabled=true
        - -blocks-storage.bucket-store.chunks-cache.backend=memcached
        - -blocks-storage.bucket-store.chunks-cache.memcached.addresses=dnssrvnoa+memcached.cortex.svc.cluster.local.:11211
        - -blocks-storage.bucket-store.chunks-cache.memcached.max-async-concurrency=50
        - -blocks-storage.bucket-store.chunks-cache.memcached.max-get-multi-batch-size=500
        - -blocks-storage.bucket-store.chunks-cache.memcached.max-get-multi-concurrency=100
        - -blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections=50
        - -blocks-storage.bucket-store.chunks-cache.memcached.max-item-size=1048576
        - -blocks-storage.bucket-store.chunks-cache.memcached.min-idle-connections-headroom-percentage=50
        - -blocks-storage.bucket-store.chunks-cache.memcached.timeout=4s
        - -blocks-storage.bucket-store.index-cache.backend=memcached
        - -blocks-storage.bucket-store.index-cache.memcached.addresses=dnssrvnoa+memcached-index-queries.cortex.svc.cluster.local.:11211
        - -blocks-storage.bucket-store.index-cache.memcached.max-async-concurrency=50
        - -blocks-storage.bucket-store.index-cache.memcached.max-get-multi-batch-size=500
        - -blocks-storage.bucket-store.index-cache.memcached.max-get-multi-concurrency=100
        - -blocks-storage.bucket-store.index-cache.memcached.max-idle-connections=50
        - -blocks-storage.bucket-store.index-cache.memcached.max-item-size=5242880
        - -blocks-storage.bucket-store.index-cache.memcached.min-idle-connections-headroom-percentage=50
        - -blocks-storage.bucket-store.index-cache.memcached.timeout=4s
        - -blocks-storage.bucket-store.index-header.lazy-loading-concurrency=0
        - -blocks-storage.bucket-store.metadata-cache.backend=memcached
        - -blocks-storage.bucket-store.metadata-cache.memcached.addresses=dnssrvnoa+memcached-metadata.cortex.svc.cluster.local.:11211
        - -blocks-storage.bucket-store.metadata-cache.memcached.max-async-concurrency=50
        - -blocks-storage.bucket-store.metadata-cache.memcached.max-get-multi-concurrency=100
        - -blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections=50
        - -blocks-storage.bucket-store.metadata-cache.memcached.max-item-size=1048576
        - -blocks-storage.bucket-store.metadata-cache.memcached.min-idle-connections-headroom-percentage=50
        - -blocks-storage.bucket-store.metadata-cache.memcached.timeout=4s
        - -blocks-storage.bucket-store.sync-dir=/data/tsdb
        - -blocks-storage.bucket-store.sync-interval=15m
        - -blocks-storage.s3.bucket-name=<REDACTED>
        - -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes=209715200
        - -blocks-storage.tsdb.block-postings-for-matchers-cache-ttl=20s
        - -blocks-storage.tsdb.series-hash-cache-max-size-bytes=1073741824
        - -common.storage.backend=s3
        - -common.storage.s3.endpoint=s3.us-west-2.amazonaws.com
        - -memberlist.bind-port=7946
        - -memberlist.join=dns+gossip-ring.cortex.svc.cluster.local.:7946
        - -runtime-config.file=/etc/mimir/overrides.yaml
        - -server.grpc.keepalive.min-time-between-pings=10s
        - -server.grpc.keepalive.ping-without-stream-allowed=true
        - -server.http-listen-port=80
        - -server.http-read-timeout=5m
        - -server.http-write-timeout=5m
        - -store-gateway.sharding-ring.heartbeat-period=1m
        - -store-gateway.sharding-ring.heartbeat-timeout=4m
        - -store-gateway.sharding-ring.instance-availability-zone=zone-a
        - -store-gateway.sharding-ring.prefix=multi-zone/
        - -store-gateway.sharding-ring.replication-factor=3
        - -store-gateway.sharding-ring.store=memberlist
        - -store-gateway.sharding-ring.tokens-file-path=/data/tokens
        - -store-gateway.sharding-ring.unregister-on-shutdown=false
        - -store-gateway.sharding-ring.wait-stability-min-duration=1m
        - -store-gateway.sharding-ring.zone-awareness-enabled=true
        - -target=store-gateway
        - -tenant-federation.enabled=true
        - -usage-stats.enabled=false
        - -usage-stats.installation-mode=jsonnet
        env:
        - name: GOMAXPROCS
          value: "7"
        - name: GOMEMLIMIT
          valueFrom:
            resourceFieldRef:
              resource: requests.memory
        - name: JAEGER_REPORTER_MAX_QUEUE_SIZE
          value: "1000"
        image: <REDACTED>
        imagePullPolicy: IfNotPresent
        name: store-gateway
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        - containerPort: 7946
          name: gossip-ring
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 5
        resources:
          limits:
            memory: 50Gi
          requests:
            cpu: "3"
            memory: 30Gi
        volumeMounts:
        - mountPath: /data
          name: store-gateway-data
        - mountPath: /etc/mimir
          name: overrides
      securityContext:
        runAsUser: 0
      serviceAccountName: cortex
      terminationGracePeriodSeconds: 120
      tolerations:
      - effect: NoSchedule
        key: arch
        operator: Equal
        value: arm64
      volumes:
      - configMap:
          name: overrides
        name: overrides
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: store-gateway-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1.1Ti
      storageClassName: gp3
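
For what it's worth, lazy loading is not set explicitly in the arguments above, so we are relying on the default (enabled, as far as I know, in these Mimir versions). If we wanted to pin it explicitly, I believe the flag sits in the same index-header namespace as the lazy-loading-concurrency option we already pass, i.e. something like:

        - -blocks-storage.bucket-store.index-header.lazy-loading-enabled=true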

How to reproduce it?

Not sure; it doesn't happen in our dev clusters, but that may be because they don't have much data to load.

What did you think would happen?

The store-gateway should become ready almost immediately.

What was your environment?

It happened on Kubernetes 1.28 and earlier.
Mimir 2.14 and 2.15 (and, if I remember correctly, also 2.13).
Both on x86 and ARM.
Both when the persistent volume is kept as is across the restart and when it is deleted.

Any additional context to share?

No response

@agardiman added the bug (Something isn't working) label on Feb 14, 2025
@56quarters
Contributor

With lazy loading enabled, store-gateways don't load the TSDB index-header into memory until it's needed, but they still must download the index-header (a subset of the TSDB index) to local disk. That should only happen with an empty disk, such as when new store-gateways are started, not when existing ones are restarted.
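
One way to sanity-check this, assuming the usual on-disk layout where each block directory under the bucket-store sync dir (/data/tsdb in your config) holds an index-header file, is to compare what's on the volume before and after a restart, e.g.:

kubectl -n cortex exec store-gateway-zone-a-0 -- sh -c 'find /data/tsdb -name index-header | wc -l'

If those files are already present on a kept PV, a restart should not need to download them again.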

@agardiman
Author

Hi Nick, thank you for your reply. I see; so this line in the logs
ts=2025-02-14T11:25:04.953879005Z caller=bucket.go:452 level=info user=default msg="loaded new block" elapsed=19.921888892s id=REDACTED
is about the index-header being downloaded to disk, not about the block itself being loaded?

@56quarters
Contributor

There are a few different things going on here. "Loading a block" involves several pieces of work, and the index-header is one part of it. Lazy loading controls whether the index-header is downloaded and immediately loaded into memory, or just downloaded and only loaded into memory once a query involves that particular block.

So regardless of the lazy loading setting, starting a store-gateway is going to involve some work. That's what that log message is about: all the work required for a store-gateway to load a block.
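
If you want to watch that work while a pod is starting, a rough sketch, assuming the metric names I remember from the store-gateway dashboards (cortex_bucket_store_blocks_loaded and cortex_bucket_store_block_loads_total) are still current, is to scrape the metrics endpoint on the HTTP port you already expose (80):

kubectl -n cortex port-forward store-gateway-zone-a-0 8080:80
curl -s localhost:8080/metrics | grep -E 'cortex_bucket_store_blocks_loaded|cortex_bucket_store_block_loads_total'

Seeing the blocks-loaded gauge climb slowly toward the expected block count while /ready is still failing would confirm the readiness wait is dominated by this per-block work.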

@agardiman
Author

Thanks for the clarification. That explains the last event, when the PVs were deleted.
That leaves only the case where it happens occasionally during normal restarts on isolated pods. I'll see what I can gather when we encounter it again.
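
For the record, when it happens again on a normally restarted pod I plan to pull the per-block timings from the same "loaded new block" log line quoted above, roughly like:

kubectl -n cortex logs store-gateway-zone-a-0 | grep 'loaded new block' | tail -n 20

That should show whether a pod with an intact PV is still spending tens of seconds per block (the elapsed= field) or whether the time is going somewhere else.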
