[bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down #32162

Open
JalisDiehl opened this issue Feb 25, 2025 · 6 comments
Assignees
Labels
in-progress rabbitmq tech-issues The user has a technical issue about an application

Comments

@JalisDiehl

Name and Version

3.17.3

What architecture are you using?

amd64

What steps will reproduce the bug?

We are running RabbitMQ on a Kubernetes cluster. We started with spot instances and 3 pods and PVCs. When one node goes down, the cluster seems to stop.

Are you using any custom parameters or values?

  valuesInline:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions: 
              - key: "karpenter.sh/capacity-type"
                operator: "In"
                values: ["on-demand"]  
    nodeSelector: 
      karpenter.sh/capacity-type: on-demand
    clustering:
      forceBoot: true
    podManagementPolicy: "Parallel"
    resources:
      requests:
        cpu: 100m
        memory: 812Mi
      limits:
        cpu: 2
        memory: 3072Mi    
    replicaCount: 3
    persistence:
      size: 40Gi
    auth:
      username: xpto
      password: bar
      erlangCookie: foo
      tls: 
        enabled: true
        failIfNoPeerCert: false
        existingSecret: "certtls"
        existingSecretFullChain: true
        sslOptionsVerify: "verify_none"
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        external-dns.alpha.kubernetes.io/hostname: dns-value
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ""
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "5671"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9419"
        prometheus.io/path: "/metrics/per-object"
      epmdPortEnabled: false
      distPortEnabled: false
    ingress:
      enabled: true
      hostname: xpto
      path: /
      tls: true
      ingressClassName: nginx
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt

What is the expected behavior?

The cluster should keep working when one node goes down.

What do you see instead?

RabbitMQ goes down.

Additional information

Image

@JalisDiehl JalisDiehl added the tech-issues The user has a technical issue about an application label Feb 25, 2025
@github-actions github-actions bot added the triage Triage is needed label Feb 25, 2025
@javsalgar javsalgar changed the title Rabbitmq Cluster on Kubernetes Failing with one node go down [bitnami/rabbbitmq] Cluster on Kubernetes Failing with one node go down Feb 26, 2025
@javsalgar javsalgar changed the title [bitnami/rabbbitmq] Cluster on Kubernetes Failing with one node go down [bitnami/rabbitmq] Cluster on Kubernetes Failing with one node go down Feb 26, 2025
@javsalgar
Contributor

Hi,

Could you share the logs of the instances?

@JalisDiehl
Author

There are no logs; the pods just keep trying to start again.
When it happens again, I will try to collect them.

@JalisDiehl
Author

But this happens when we have a node switch, because we use Karpenter with both EC2 spot and EC2 on-demand instances. My question is: how can we activate HA on RabbitMQ? We have 3 pods, but each pod seems to have its own queues, so the setup is not actually HA.
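
For context on the HA question: by default a classic RabbitMQ queue lives on a single node, which matches the "each pod has its own queue" observation. One common way to get a replicated queue is to declare it as a quorum queue through the `x-queue-type` argument. The sketch below is only an illustration and is not confirmed as the resolution of this issue. It assumes a pika-style channel object that exposes `queue_declare`, and the helper names and the queue name `orders` are hypothetical.

```python
from typing import Any, Dict


def quorum_queue_arguments() -> Dict[str, Any]:
    """Optional arguments that ask the broker for a replicated quorum queue."""
    return {"x-queue-type": "quorum"}


def declare_quorum_queue(channel: Any, name: str) -> None:
    """Declare `name` as a quorum queue on an already-open AMQP channel.

    Assumption: `channel` exposes a pika-style `queue_declare` method.
    Quorum queues must be durable, hence `durable=True`.
    """
    channel.queue_declare(
        queue=name,
        durable=True,
        arguments=quorum_queue_arguments(),
    )


def tolerated_node_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority stays online."""
    return (replicas - 1) // 2
```

With a real client this would be called as `declare_quorum_queue(channel, "orders")` on an open channel. With 3 replicas, `tolerated_node_failures(3)` is 1, so a quorum queue can stay available through the loss of one node, provided the queue was declared as a quorum queue in the first place.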

@github-actions github-actions bot removed the triage Triage is needed label Feb 27, 2025
@github-actions github-actions bot assigned dgomezleon and unassigned javsalgar Feb 27, 2025
@dgomezleon
Member

Hi @JalisDiehl,

Could you please take a look at this previous case where that was discussed?

@JalisDiehl
Author

What discussion?

@dgomezleon
Member

My bad: #6276
