Upgrading from one module version to another causes the node pool to be recreated #2127
Comments
@apeabody @aaron-lane @bharathkkb Can you please take a look at this issue and let me know how to work around it? I see this issue was opened and closed without resolution previously - #1773 |
Sorry, I have not maintained these modules for years. |
Thanks for the update @apeabody. Yeah, we reviewed the changes from 17.0.0 to 30.0.0 and they don't affect our workloads too much, so we are good!! As for the keepers, is there any alternative other than updating the state file? If it's a one-off we can do it, but if it comes up with every version upgrade it's a tedious task when managing tens of clusters. We currently have 25+ GKE clusters, so we want to check whether there is any solution other than updating the state file!! Thanks |
Updating the state file should only be required when keepers are modified AND you want to avoid replacing nodepools. While keepers don't change in every major release, there have been a number of changes since v17 was released 3 years ago. |
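For illustration, a rough sketch of the state-file workaround being discussed, assuming the standard terraform state commands; the resource address and file names below are placeholders, and the exact keeper keys depend on the module version:

    # Sketch only: back up state, inspect the stored keepers, and (carefully)
    # push back an edited copy so random_id keeps its existing suffix.
    # The resource address below is a placeholder; check your own state.
    terraform state pull > backup.tfstate

    # See which keepers are currently stored for the node pool suffix.
    terraform state show 'module.gke.random_id.name["pool-01"]'

    # After editing a copy of the state JSON so the keepers match what the new
    # module version expects (and bumping the "serial" field), push it back.
    terraform state push edited.tfstate

    # Confirm the node pool is no longer planned for replacement.
    terraform plan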
@apeabody - But I see keepers changes from 30.0.0 to 32.0.0 as well; I added them to the description above. We tried moving one dev cluster from 30.0.0 to 32.0.0 and are again seeing keepers changes, so if this repeats it's a pain point for us managing 25+ clusters. |
Hi @koushikgongireddy - Curious, is there a reason the node pool can't be recreated, especially on a dev cluster? Part of the challenge is that these new node pool arguments can force re-creation at the provider level (for example enable_confidential_storage). |
@apeabody - We are fine with recreation on dev clusters, but for PROD clusters it means downtime, because the new node pool is created and the old one is deleted right away. Updating the state file is an option, but doing it for 20 PROD clusters is difficult for us, so we want to see if there is any alternative, since we are using the external module provided by GCP. |
Also experiencing the issue. I'm upgrading from module v32.0.0 to v33.0.4.

    + gcfs_config {
        + enabled = false
      }

This is the only change at the node pool level, which makes me think this is what causes the nodes to recreate? |
@KRASSUSS w/r/t the gcfs_config diff: if it's a provider version where it's showing that diff but is not forcing recreation of the node pool, it should also be safe to apply, but you'll probably see a permadiff until the provider is upgraded. |
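As an illustration of the distinction above, a destructive diff is visible directly in the plan output; the module and pool names here are placeholders:

      # module.gke.google_container_node_pool.pools["pool-01"] must be replaced
    -/+ resource "google_container_node_pool" "pools" {
          ~ name = "pool-01-46fd" -> (known after apply) # forces replacement
            ...
        }

If the gcfs_config change shows up only as an in-place update, with no "forces replacement" marker, it is the permadiff case rather than a recreation.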
@koushikgongireddy you could maybe try removing the item from state (after backing up state, of course) and see if reimporting the cluster and nodepools works in your lower envs? Or make sure there aren't any config changes you have to add to the settings (e.g., formerly unsupported values that are now supported). I'm not super familiar with keepers, but it's possible that if you match the existing values in the configuration properly, you won't see a diff? For example, the labels going from set to null makes me think there are some values that might need to be reflected in your Terraform configs?
I'm hoping to eventually get more of the items that either don't support in-place updates, or don't work with updates at all, fixed, at least for the default nodepool case. I would imagine that |
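To illustrate the remove-and-reimport suggestion above, a sketch assuming the standard terraform state rm / terraform import workflow; the addresses and the project/location/cluster/pool values are placeholders, and this should be tried in a lower environment first:

    # Sketch only: all addresses and IDs below are placeholders.
    terraform state pull > backup.tfstate   # always keep a backup

    # Stop tracking the existing node pool without touching the real resource.
    terraform state rm 'module.gke.google_container_node_pool.pools["pool-01"]'

    # Re-import it under the same address (project/location/cluster/pool format).
    terraform import 'module.gke.google_container_node_pool.pools["pool-01"]' \
      my-project/us-central1/my-cluster/pool-01-46fd

    # Verify nothing is planned for replacement before applying.
    terraform plan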
@wyardley The issue here is definitely the keepers. As you can see in the description above, keepers are the major change in my tf plan: the keepers change causes the random_id to change, the random_id change causes the node pool name to change, and if the node pool name changes the node pool has to be recreated! The issue is not with enable_confidential_storage having ForceNew: true. If you look at the lines below, the keepers are changing and that change is ultimately forcing node pool recreation: 17.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v17.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L313 We tried updating the state file with the new values and it works fine, but we want to know whether that is the only option or if there are alternatives to avoid the keepers change!! |
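For readers following along, a heavily simplified sketch of the pattern the linked cluster.tf files use (not the module's actual code; names, keeper keys, and values are illustrative) shows why a new keeper key forces a new pool name:

    # Simplified illustration of the update-variant pattern; not the module's
    # exact code. Any change to the keepers map (including keys added in a new
    # module release) forces a new random suffix, and therefore a new pool name.
    locals {
      pool_name    = "default-pool"
      machine_type = "e2-standard-4"
      disk_size_gb = 100
    }

    resource "random_id" "name" {
      byte_length = 2

      keepers = {
        machine_type = local.machine_type
        disk_size_gb = tostring(local.disk_size_gb)
      }
    }

    resource "google_container_node_pool" "pool" {
      # New suffix => new name => the provider must replace the node pool.
      name       = "${local.pool_name}-${random_id.name.hex}"
      cluster    = "my-cluster"   # placeholder
      location   = "us-central1"  # placeholder
      node_count = 1

      lifecycle {
        create_before_destroy = true
      }
    }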
Only the update-variant modules use the keepers/random suffix behavior. For those not making use of the node pool create-before-destroy behavior, the other modules such as beta-private-cluster avoid this. |
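A hedged sketch of what switching module flavors looks like, assuming the registry submodule paths; the version pin and inputs are examples, and moving between flavors will itself require reconciling state (moves or imports), so plan it carefully:

    module "gke" {
      # Previously: the update-variant flavor with the keepers/random suffix behavior.
      # source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"

      # Plain flavor without the create-before-destroy node pool naming.
      source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
      version = "~> 32.0"

      project_id = "my-project"   # placeholder inputs
      name       = "my-cluster"
      region     = "us-central1"
      # ...remaining inputs unchanged...
    }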
@apeabody Thanks for the update, we will definitely test with beta-private-cluster and update you!! Also, can you describe the differences between the modules in more detail? I mean, what benefits do we get with beta-private-cluster-update-variant compared with beta-private-cluster and beta-autopilot-private-cluster? Thanks |
All three of these are similar in that they enable beta features and create private clusters. The big differences are that the update-variant module creates node pools with the create-before-destroy (keepers/random suffix) behavior discussed above, beta-private-cluster manages node pools without it, and beta-autopilot-private-cluster creates an Autopilot cluster, so there are no user-managed node pools at all. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days |
TL;DR
Upgrading from one module version to another causes the node pool to be recreated
Expected behavior
When we upgrade GKE module versions we see breaking changes where GKE node pools are recreated.
We are currently on the old version 17.0.0 and planning to upgrade to 30.0.0, and I see there are changes in the keepers which cause the random_id to change, which in turn causes the node pool to be recreated.
17.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v17.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L313
30.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v30.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L500
We also tried going from 30.0.0 to 32.0.0 and the same thing happens again, as new changes were added to the keepers
32.0.0 - https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/v32.0.0/modules/beta-private-cluster-update-variant/cluster.tf#L591
We need help on how to upgrade to higher versions without causing the node pools to be recreated
Observed behavior
When we run a TF plan after upgrading to 30.0.0 we see the resources below being recreated
17.0.0 to 30.0.0
30.0.0 to 32.0.0
Terraform Configuration
Terraform Version
Additional information
No response