[bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down #32162

JalisDiehl · 2025-02-25T14:47:44Z

Name and Version

3.17.3

What architecture are you using?

amd64

What steps will reproduce the bug?

We are running RabbitMQ on a Kubernetes cluster. We started with spot instances, 3 pods, and PVCs. When one node goes down, the cluster seems to stop.

Are you using any custom parameters or values?

  valuesInline:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions: 
              - key: "karpenter.sh/capacity-type"
                operator: "In"
                values: ["on-demand"]  
    nodeSelector: 
      karpenter.sh/capacity-type: on-demand
    clustering:
      forceBoot: true
    podManagementPolicy: "Parallel"
    resources:
      requests:
        cpu: 100m
        memory: 812Mi
      limits:
        cpu: 2
        memory: 3072Mi    
    replicaCount: 3
    persistence:
      size: 40Gi
    auth:
      username: xpto
      password: bar
      erlangCookie: foo
      tls: 
        enabled: true
        failIfNoPeerCert: false
        existingSecret: "certtls"
        existingSecretFullChain: true
        sslOptionsVerify: "verify_none"
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        external-dns.alpha.kubernetes.io/hostname: dns-value
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ""
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "5671"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9419"
        prometheus.io/path: "/metrics/per-object"
      epmdPortEnabled: false
      distPortEnabled: false
    ingress:
      enabled: true
      hostname: xpto
      path: /
      tls: true
      ingressClassName: nginx
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt

What is the expected behavior?

RabbitMQ keeps working when one node goes down.

What do you see instead?

RabbitMQ goes down.

Additional information

javsalgar · 2025-02-26T08:29:31Z

Hi,

Could you share the logs of the instances?

JalisDiehl · 2025-02-26T10:23:39Z

There are no logs; the pods just keep trying to start again.
When it happens again, I will try to collect them.

JalisDiehl · 2025-02-26T19:14:56Z

But this happens when we have a node switch, because we use Karpenter with EC2 spot and EC2 on-demand instances. My question is: how can we activate HA on RabbitMQ? It seems we have 3 pods, but each pod has its own queues, so it isn't HA.
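
On the HA question: in RabbitMQ a classic queue lives on a single node, so running 3 pods does not by itself replicate queue contents. One common approach is to declare queues as quorum queues, which are replicated across cluster nodes. The following is only a minimal sketch, not something confirmed in this thread: it assumes a pika-style channel object, and the helper name `declare_quorum_queue` and the queue name `orders` are hypothetical.

```python
def declare_quorum_queue(channel, name, replicas=3):
    """Declare a durable quorum queue on a pika-style channel.

    `channel` is assumed to expose queue_declare(queue=..., durable=..., arguments=...),
    as pika's BlockingChannel does. `replicas` is an assumption matching replicaCount: 3.
    """
    arguments = {
        # Ask the broker for a replicated quorum queue instead of a classic queue.
        "x-queue-type": "quorum",
        # Initial number of queue replicas across cluster nodes.
        "x-quorum-initial-group-size": replicas,
    }
    # Quorum queues must be durable.
    return channel.queue_declare(queue=name, durable=True, arguments=arguments)


# Hypothetical usage with pika (host and credentials are placeholders):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
#   declare_quorum_queue(connection.channel(), "orders")
```

A queue's type cannot be changed after it is declared, so an existing classic queue would have to be deleted and re-declared (or a new queue name used) for this to take effect.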

dgomezleon · 2025-03-03T16:38:34Z

Hi @JalisDiehl ,

Could you please take a look at this previous case where that was discussed?

JalisDiehl · 2025-03-03T16:40:14Z

What discussion?

dgomezleon · 2025-03-03T16:56:18Z

My bad: #6276

JalisDiehl added the tech-issues The user has a technical issue about an application label Feb 25, 2025

github-actions bot added the triage Triage is needed label Feb 25, 2025

github-actions bot assigned javsalgar Feb 25, 2025

javsalgar changed the title ~~Rabbitmq Cluster on Kubernetes Failing with one node go down~~ [bitnami/rabbbitmq] Cluster on Kubernetes Failing with one node go down Feb 26, 2025

javsalgar changed the title ~~[bitnami/rabbbitmq] Cluster on Kubernetes Failing with one node go down~~ [bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down Feb 26, 2025

javsalgar added the rabbitmq label Feb 26, 2025

javsalgar added the in-progress label Feb 27, 2025

github-actions bot removed the triage Triage is needed label Feb 27, 2025

github-actions bot assigned dgomezleon and unassigned javsalgar Feb 27, 2025
