This repository has been archived by the owner on Feb 20, 2024. It is now read-only.

Kafka fails to start after node migration in GKE #599

Open
swimand opened this issue Apr 21, 2022 · 2 comments

Comments

swimand commented Apr 21, 2022

GCP automatically migrates nodes when a cluster needs more resources or during certain updates. This causes the pods to receive new ports, which I believe causes a mismatch and leads to a failed connection between the ZooKeeper and Kafka pods. As you can see in the following log excerpt, the Kafka pod finds the ZooKeeper service but fails to connect:

[main-SendThread(cp-zookeeper-headless:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.102.2.98:53984, server: cp-zookeeper-headless/10.102.2.98:2181
[main] ERROR io.confluent.admin.utils.ClusterStatus - Timed out waiting for connection to Zookeeper server [cp-zookeeper-headless:2181].
[main-SendThread(cp-zookeeper-headless:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 40001ms for sessionid 0x0
[main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x0 closed

Is there any way to avoid this issue when running the cp-helm-charts in GKE? Am I perhaps missing some configuration?

toraxe commented Nov 29, 2022

I have the same problem, but with an on-prem solution with automatic migration.
Did you find any solution or workaround, @swimand?

swimand commented Nov 29, 2022

Sadly no. The only method that works, as far as I can see, is to manually delete the pods so that they are forced to get new addresses for the individual services. Because of this, and because of the unstructured startup sequence, we are looking into moving to Bitnami's charts instead, as they seem to be configured for a more stable run. We are also creating dedicated node pools for Kafka so that it does not need to migrate so often (theoretically only on node updates).
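
A minimal sketch of how the manual pod deletion described above might be scripted, assuming `kubectl` is installed and already pointed at the affected cluster. The namespace `kafka` and the label selector `app=cp-kafka` are hypothetical placeholders, not values confirmed in this thread; check the labels on your own pods first. By default the helper only returns the command it would run.

```python
import subprocess


def build_delete_command(namespace, selector):
    """Build the kubectl command that deletes all pods matching a label selector."""
    return [
        "kubectl", "delete", "pod",
        "--namespace", namespace,
        "--selector", selector,
    ]


def restart_pods(namespace, selector, dry_run=True):
    """Delete matching pods so their controller recreates them.

    With dry_run=True (the default) nothing is executed and the command
    is only returned, so it can be inspected first.
    """
    cmd = build_delete_command(namespace, selector)
    if not dry_run:
        # Raises CalledProcessError if kubectl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical namespace and label; adjust to your release.
    print(" ".join(restart_pods("kafka", "app=cp-kafka")))
```

Calling `restart_pods("kafka", "app=cp-kafka", dry_run=False)` would actually delete the pods. This only automates the workaround and does not address the underlying cause.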
