Deploying your own VPA with leader election enabled in GKE conflicts with the GKE system component #7461
Comments
Some thoughts on mitigating this:
I see two paths forward to fixing this:
I can't speak for what that lease is used for in GKE, but I assume that changing it is difficult or impossible there. The lease(s) in VPA are only used by VPA components, and running multiple recommenders and updaters for a brief period isn't the worst thing in the world. Given that, my vote is that we change the default lease name in the VPA. Any VPA configured with leader election enabled will only run multiple pods for a short period of time, which should be fine. It's obviously not an ideal path forward, but it may be worth doing. I'm curious what @voelzmo and @kwiesmueller think, as they may be the ones approving that controversial PR.
If we do go this path, I suggest we also open PRs against third-party Helm charts to ensure they support the new default name. Some of them hardcode the lease name:
I'm not sure changing the default in OSS is the right approach, because GKE claims it too. cc @raywainman
My concern is that if someone follows the happy path of deployment and enables the leases without reading the documentation, then GKE breaks in ways that are not obvious. It took us 4 days and countless messages with GCP support before we found what GCP was doing with the The pain of renaming the default OSS lease is pretty small.
/area vertical-pod-autoscaler
Just thinking out loud: what is the worst that can happen when changing a lease name? On upgrade, we would briefly have two recommenders running, each holding its own lease. For a short period of time, both of them will try to compute and apply VPA recommendations. Off the top of my head, these should generally be the same recommendations; there is just the possibility of racing or duplication. Can anyone think of any other risk?
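The upgrade scenario above can be sketched with a toy model. It assumes a rolling update in which the old pod still uses the vpa-recommender lease and the new pod uses a renamed one. `renamed-lease` is a hypothetical placeholder, not the name the project chose. The helpers `try_lead` and `active_leaders` are illustrative only and are not VPA or client-go APIs. The 15-second lease duration is an arbitrary value.

```python
from typing import Dict, List, Tuple

LEASE_DURATION = 15.0  # seconds; arbitrary illustrative value

# Hypothetical in-memory stand-in for Lease objects: lease name -> (holder, expires_at)
LeaseStore = Dict[str, Tuple[str, float]]


def try_lead(store: LeaseStore, lease_name: str, holder: str, now: float) -> bool:
    """Acquire or renew `lease_name` for `holder`; return False if another holder has it."""
    current = store.get(lease_name)
    if current is None or current[1] < now or current[0] == holder:
        store[lease_name] = (holder, now + LEASE_DURATION)
        return True
    return False


def active_leaders(store: LeaseStore, pods: List[Tuple[str, str]], now: float) -> List[str]:
    """Return the pods acting as leader at `now`; each pod is (pod_name, lease_name)."""
    return [pod for pod, lease_name in pods if try_lead(store, lease_name, pod, now)]


# Rolling update where the new version uses a different (hypothetical) lease name.
renamed: LeaseStore = {}
t0 = active_leaders(renamed, [("old-pod", "vpa-recommender")], now=0.0)
t5 = active_leaders(renamed, [("old-pod", "vpa-recommender"),
                              ("new-pod", "renamed-lease")], now=5.0)   # overlap window
t10 = active_leaders(renamed, [("new-pod", "renamed-lease")], now=10.0)  # old pod gone

# Counterfactual: both versions use the same lease name.
same: LeaseStore = {}
s0 = active_leaders(same, [("old-pod", "vpa-recommender")], now=0.0)
s5 = active_leaders(same, [("old-pod", "vpa-recommender"),
                           ("new-pod", "vpa-recommender")], now=5.0)
```

With the renamed lease, the old and new pods both act as leaders only while both are running (`t5`). With a shared name, the new pod waits until the old one stops renewing and the lease expires or is released (`s5`).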
Yup! That is my thought too.
Correct! My understanding is that the worst that could happen is that two VPA recommenders, which should be giving roughly the same recommendation, both run for a very short period of time, after which only one will be active. And, the VPA recommender's
@ialidzhikov what do you think? My vote is to change it now, before the next release (we could even consider a cherry-pick here), given that:
cc @voelzmo as well
Could something go wrong with the checkpoints? For example, what if the new version's leader writes a checkpoint that the old version's leader doesn't understand? EDIT: It looks like we already support this case.
I agree with all these points. If we're going to change the lease name, I think now is the perfect time to do so.
Hi folks,
The leader election resource name and namespace are configurable. If you are installing a custom VPA to a GKE cluster, please use the
I am also open to renaming the default lease in the VPA repo to avoid this issue.
Alright, since the feature is still new, changing the name now is an easy way to avoid issues before too many people use it. I guess that's fine.
We'll need to update this: https://github.com/cowboysysop/charts/blob/master/charts/vertical-pod-autoscaler/templates/recommender/clusterrole.yaml#L141-L142, which was added in cowboysysop/charts#705. If we combine the change with a release, it should be less painful.
Opened cowboysysop/charts#781 to update the Helm chart.
Which component are you using?:
vertical-pod-autoscaler
What version of the component are you using?:
Component version: vertical-pod-autoscaler-1.2.0+ (leader election functionality was added in #6985 and is turned off by default)
What k8s version are you using (kubectl version)?: Any version 1.27+
What environment is this in?:
GKE
What did you expect to happen?:
The self-deployed VPA recommender and the GKE implementation of HPA to continue working.
What happened instead?:
Both the self-deployed VPA recommender and the GKE version use a lease called vpa-recommender in kube-system. If you deploy your own VPA recommender, it might "steal" the lease and prevent the GKE implementation of HPA from working.
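To make the collision concrete, here is a minimal Python sketch using a simplified, in-memory model of Lease-based leader election. The `LeaseStore` class and its `try_acquire_or_renew` method are hypothetical. They loosely mirror the holderIdentity, renewTime and leaseDurationSeconds fields of a coordination.k8s.io Lease, but this is not client-go's implementation. The key point is that a lease is identified only by namespace and name, so two unrelated controllers that both use kube-system/vpa-recommender compete for the same lock. The holder names and the alternative name `vpa-recommender-custom` are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Lease:
    holder: str         # identity of the current leader
    renew_time: float   # last acquire/renew time (seconds, arbitrary clock)
    duration: float     # how long the lease stays valid without renewal


class LeaseStore:
    """Hypothetical stand-in for Lease objects stored in the API server.

    Leases are keyed only by (namespace, name): unrelated components that
    pick the same key contend for the same lock.
    """

    def __init__(self) -> None:
        self._leases: Dict[Tuple[str, str], Lease] = {}

    def try_acquire_or_renew(self, namespace: str, name: str, holder: str,
                             now: float, duration: float = 15.0) -> bool:
        """Return True if `holder` is the leader for (namespace, name) after this call."""
        key = (namespace, name)
        lease = self._leases.get(key)
        expired = lease is not None and now > lease.renew_time + lease.duration
        if lease is None or expired or lease.holder == holder:
            self._leases[key] = Lease(holder, now, duration)
            return True
        return False  # another holder has a valid lease under the same key


shared = LeaseStore()
# t=0: GKE's system component acquires kube-system/vpa-recommender.
gke_t0 = shared.try_acquire_or_renew("kube-system", "vpa-recommender", "gke-system", now=0.0)
# t=1: a self-deployed recommender using the same name is locked out.
self_t1 = shared.try_acquire_or_renew("kube-system", "vpa-recommender", "self-deployed", now=1.0)
# t=20: GKE's component missed its renewals (e.g. it restarted), the lease
# expired, and the self-deployed recommender takes it over.
self_t20 = shared.try_acquire_or_renew("kube-system", "vpa-recommender", "self-deployed", now=20.0)
# t=21: now GKE's component is the one locked out.
gke_t21 = shared.try_acquire_or_renew("kube-system", "vpa-recommender", "gke-system", now=21.0)

# With a distinct (hypothetical) lease name, both can lead independently.
separate = LeaseStore()
gke_ok = separate.try_acquire_or_renew("kube-system", "vpa-recommender", "gke-system", now=0.0)
self_ok = separate.try_acquire_or_renew("kube-system", "vpa-recommender-custom", "self-deployed", now=0.0)
```

Whichever component holds the shared lease, the other silently stops acting as leader until the lease expires. Giving one of them a different lease name or namespace removes the contention.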
How to reproduce it (as minimally and precisely as possible):
Deploy your own VPA recommender to a GKE cluster (for example, with vpa-up.sh). Make sure leader election is enabled (leader-elect=true).
Anything else?
This is due to the unfortunate naming collision between GKE's system controller (also called vpa-recommender) and the one provided here.