This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Availability when killing all pods in a cluster #228

Closed
jonatanwulcan opened this issue Sep 11, 2019 · 3 comments

Comments

@jonatanwulcan

Hey,
I'm playing around with kubemci to figure out if it's a good match for the product I'm currently working on. I tried the zone-printer demo and then manually deleted the pod that was running in the cluster closest to me.

The result was that the service went down until the pod had restarted. Is this expected behaviour? I was hoping that the traffic would fail over to another cluster.

@nikhiljindal
Contributor

Yes, traffic should fail over to another cluster. Maybe the pod was restarted before GCLB detected that it was down?

Can you try changing the health check configuration so that it detects failures faster?
You cannot use kubemci to modify it, but you can use gcloud or the Google Cloud Console directly to update the health check created by kubemci.
#135 has some relevant discussion about this.
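
As a rough illustration of the gcloud route, here is a minimal Python sketch that builds a `gcloud compute health-checks update` command. It assumes the health check created by kubemci is a regular (non-legacy) HTTP health check. The name `my-kubemci-health-check` is a hypothetical placeholder, and the interval, timeout, and threshold values are illustrative, not recommendations. Look up the real name first (for example with `gcloud compute health-checks list`).

```python
import shlex


def build_update_command(name, protocol="http", interval_s=5, timeout_s=5,
                         unhealthy_threshold=1, healthy_threshold=1):
    """Return the gcloud argv that tightens a health check's timing settings.

    `name` is the health check name (hypothetical here); `protocol` must match
    the type of the existing health check (assumed to be "http").
    """
    return [
        "gcloud", "compute", "health-checks", "update", protocol, name,
        f"--check-interval={interval_s}s",
        f"--timeout={timeout_s}s",
        f"--unhealthy-threshold={unhealthy_threshold}",
        f"--healthy-threshold={healthy_threshold}",
    ]


# Hypothetical name; replace it with the health check kubemci created.
cmd = build_update_command("my-kubemci-health-check")
print(shlex.join(cmd))
```

The sketch only prints the command so it can be reviewed before running; passing `cmd` to `subprocess.run` would execute it, assuming gcloud is installed and authenticated.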

Many customers run multiple replicas in their cluster to mitigate this issue. Setting up Cluster autoscaling and Pod autoscaling will help as well.

@jonatanwulcan
Author

Thanks for your reply, Nikhil. I'll look into updating the health check configuration and report back on whether this solves the problem.

How fast can I expect failover to happen when a cluster goes down?

Also, I was wondering about cluster autoscaling and kubemci. Since you're recommending it, I suppose it's supported. How fast will GCLB discover new nodes added to the cluster by the autoscaler?

@jonatanwulcan
Author

I tried changing the health check configuration. I set it to a 5s interval and a 5s timeout, with both thresholds set to 1: mark unhealthy after 1 consecutive failure and healthy after 1 consecutive success.
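
To give a feel for why these settings speed up failover, here is a back-of-the-envelope Python sketch. It assumes a simplified single-prober model: probes start once per check interval, a failing probe may take up to the full timeout to be counted, and the backend is marked unhealthy after the configured number of consecutive failures. Real GCLB probing is more involved, so treat the result as a rough upper bound under these assumptions, not a guarantee. The "slow" settings are hypothetical and only there for comparison.

```python
def worst_case_detection_seconds(check_interval_s, timeout_s, unhealthy_threshold):
    """Rough upper bound on the time to mark a backend unhealthy.

    Assumes the pod dies right after a successful probe, so the last of the
    required consecutive failing probes starts at most
    check_interval_s * unhealthy_threshold later and may take up to
    timeout_s to fail.
    """
    return check_interval_s * unhealthy_threshold + timeout_s


# Settings from this comment: 5s interval, 5s timeout, fail on 1 consecutive.
fast = worst_case_detection_seconds(5, 5, 1)

# Hypothetical slower settings, for comparison only.
slow = worst_case_detection_seconds(60, 60, 3)

print(fast)  # 10
print(slow)  # 240
```

If the pod restarts faster than this bound, the load balancer may never see it as unhealthy, which matches the behaviour described at the top of the thread.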

For others reading this: you can find the health check configuration in the Google Cloud Console under Compute Engine -> Health Checks.

Works just as expected now! Thanks for the help!
