
[Breaking] - Ignore changes on kubernetes_version from outside of Terraform #336

Merged (1 commit) on Apr 12, 2023

Conversation

lonegunmanb
Member

@lonegunmanb lonegunmanb commented Mar 27, 2023

As #335 described, when automatic_channel_upgrade is patch, Azure upgrades the AKS cluster automatically, which can lead to unwanted configuration drift.

By using an azapi update resource to update the AKS kubernetes_version, an update is triggered only when the user changes the version on the Terraform side; all other changes from outside of Terraform are ignored.

It's still a draft, and it could be considered a breaking change, so I'd like to proceed steadily and with caution.
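For readers of this thread, the pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual diff: the resource names, variable name, and API version are assumptions. A null_resource "keeper" changes only when the user edits the version in HCL, and an azapi_update_resource is replaced (re-sending a PATCH) only when that keeper changes.

```hcl
# Sketch only: names and the API version below are illustrative assumptions.

# Keeper that changes only when the user edits kubernetes_version in HCL.
resource "null_resource" "kubernetes_version_keeper" {
  triggers = {
    version = var.kubernetes_version
  }
}

# Virtual resource that patches only kubernetesVersion on the existing cluster.
resource "azapi_update_resource" "aks_cluster_post_create" {
  type        = "Microsoft.ContainerService/managedClusters@2023-01-01"
  resource_id = azurerm_kubernetes_cluster.main.id
  body = jsonencode({
    properties = {
      kubernetesVersion = var.kubernetes_version
    }
  })

  # Re-run the PATCH only when the keeper (i.e. the HCL-side version) changes.
  lifecycle {
    replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
  }
}
```

Any version change Azure itself applies (for example under automatic_channel_upgrade = "patch") never touches the keeper, so no update is triggered for it.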

@zioproto @matt-FFFFFF @grayzu @the-technat @mazilu88 @nlamirault @gzur @digiserg @eyenx @vermacodes @jvelasquez @viters @matelang WDYT?

Describe your changes

Issue number

#335

Checklist before requesting a review

  • The PR title can be used to describe what this PR did in the CHANGELOG.md file
  • I have executed pre-commit on my machine
  • I have passed pr-check on my machine

Thanks for your cooperation!

@github-advanced-security

You have successfully added a new CodeQL configuration .github/workflows/pr-check.yaml:prepr-check. As part of the setup process, we have scanned this repository and found no existing alerts. In the future, you will see all code scanning alerts on the repository Security tab.

Review comment on resource "azapi_update_resource" "aks_cluster_post_create":
Contributor

So an update of AKS would be done through this resource rather than the actual cluster resource?

Member Author

@the-technat yes, since in this approach we rely on the null_resource's replacement to trigger the update, we cannot execute the update via the azurerm resource, because the whole cluster would be re-created; it must be a virtual resource like azapi_update_resource.
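To illustrate why (a hypothetical sketch, with assumed resource names): "replacement" of the real azurerm_kubernetes_cluster resource means destroy and re-create, whereas replacing the virtual azapi_update_resource only re-issues a PATCH call, so the virtual resource is the only safe place to hang the trigger.

```hcl
# Hypothetical sketch. Hanging the trigger on the real cluster resource
# would be destructive, because replacing azurerm_kubernetes_cluster means
# destroy + create:
#
#   resource "azurerm_kubernetes_cluster" "main" {
#     lifecycle {
#       replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
#       # ^ this would re-create the whole cluster on every version change
#     }
#   }
#
# Replacing the virtual update resource instead only re-sends a PATCH:
resource "azapi_update_resource" "aks_cluster_post_create" {
  # ... type, resource_id, and body as in the module ...
  lifecycle {
    replace_triggered_by = [null_resource.kubernetes_version_keeper.id]
  }
}
```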


@the-technat So we're currently upgrading our AKS module from 6.2.0 --> 7.2.0. We're getting a module.aks.azapi_update_resource.aks_cluster_post_create to be added. Can you help me understand the purpose of this resource, and whether it is safe to apply? Will future updates of AKS take place through this resource? Also, we don't have the azapi provider in our code; is it necessary to add it? Thanks.

@viters
Contributor

viters commented Mar 29, 2023

Does setting a specific version like 1.25.6 block Azure from applying automatic_channel_upgrade=patch?

If yes, then IMHO it is a problem on the Azure side, and I agree with your solution, which fixes the Azure issue. From Terraform's perspective it is a clever workaround; to be frank, I am surprised that Terraform does not have a lifecycle option for such situations.

I expect that setting a specific version does not block the automatic upgrades, and therefore I would say: "if Azure upgrades your cluster and Terraform informs you that the state has changed, just adjust your Terraform files and state to match up". We should try to resemble the state of the infrastructure as closely as we can with Terraform, rather than covering it up.

@zioproto
Collaborator

@viters I opened Azure/AKS#3573 to give feedback on the AKS API. Please feel free to add feedback for the Azure AKS team there. Thanks!

@viters
Contributor

viters commented Mar 29, 2023

@zioproto Thanks for the link, but I think we are tackling different problems. Have you maybe tested this: if you create a cluster with version 1.25.5 and set automatic_channel_upgrade=patch, will it upgrade to 1.25.6? If it will, then I think AKS works properly (apart from the API inconsistency that you mentioned), and I think this PR's change is unwanted and #335 should be marked as won't fix.

@the-technat
Contributor

@zioproto Thanks for the link, but I think we are tackling different problems. Have you maybe tested this: if you create a cluster with version 1.25.5 and set automatic_channel_upgrade=patch, will it upgrade to 1.25.6? If it will, then I think AKS works properly (apart from the API inconsistency that you mentioned), and I think this PR's change is unwanted and #335 should be marked as won't fix.

I tested this behavior some months ago; it will detect a state drift and try to downgrade the cluster.

@matt-FFFFFF
Member

Good workaround @lonegunmanb

@viters
Contributor

viters commented Mar 29, 2023

@the-technat
Ok, so AKS works properly. Now, you run a plan for your AKS and you see that the version has been updated. You have two options: downgrade the cluster to meet your Terraform requirements or update your Terraform requirements to resemble the infrastructure. I see your point and room for error, but I just think that Terraform is here document the infrastructure and you should verify what plan does and adjust your files if needed.

It is just my opinion. If you think the chance of running into issues is high, then Terraform is here to help us rather than get us into trouble, and this PR solves that case well.

@zioproto
Collaborator

@zioproto Thanks for the link, but I think we are tackling different problems. Have you maybe tested this: if you create a cluster with version 1.25.5 and set automatic_channel_upgrade=patch, will it upgrade to 1.25.6? If it will, then I think AKS works properly (apart from the API inconsistency that you mentioned), and I think this PR's change is unwanted and #335 should be marked as won't fix.

In an ideal situation we should track only the major and minor version in the Terraform state and let AKS handle the changes in the patch version when automatic_channel_upgrade=patch.

Because of the behaviour documented in Azure/AKS#3573, it was overlooked that the API actually returns the patch version, which collides with the information saved in the Terraform state.

@viters
Contributor

viters commented Mar 29, 2023

In an ideal situation we should track only the major and minor version in the Terraform state and let AKS handle the changes in the patch version when automatic_channel_upgrade=patch.

Right, now I see and I agree. Your point is that when you set automatic_channel_upgrade=patch, you care only about major and minor. I missed that before.

The only concern I have left is that, as this workaround is a breaking change, and assuming Azure fixes the API in the future, reverting this workaround will be a breaking change as well. Although, I guess it is better to have a workaround quickly rather than waiting with no ETA for Azure to adjust the API.

@zioproto
Collaborator

Although, I guess it is better to have a workaround quickly rather than waiting with no ETA for Azure to adjust the API.

The ETA for an eventual AKS API change depends on the quality of the feedback that we as a community are able to write on Azure/AKS#3573.

@lonegunmanb
Member Author

As automatic_channel_upgrade=patch triggers changes to the patch version from the service side, this approach ignores all such patch upgrades, but once we've changed kubernetes_version on the Terraform side, an update is executed.

On the other hand, this PR should be considered an unavoidable change, so the community's feedback is critical to us.
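The other half of ignoring service-side patch upgrades can be sketched as below (a sketch only; the resource name and surrounding attributes are illustrative): the cluster resource itself stops reacting to server-side version changes, so version drift never shows up in plans.

```hcl
# Illustrative sketch; the resource name and omitted attributes are assumptions.
resource "azurerm_kubernetes_cluster" "main" {
  # ... other cluster configuration ...

  # Drift in kubernetes_version caused by automatic_channel_upgrade = "patch"
  # no longer appears in plans; HCL-side version changes are applied through
  # the separate azapi update resource instead.
  lifecycle {
    ignore_changes = [kubernetes_version]
  }
}
```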

@the-technat
Contributor

@viters just to give some feedback on your questions. I think that Terraform is here to reflect the current state. So when I specify 1.25.6 in Terraform, I expect AKS to be on 1.25.6, with no conflict between the two, unless you explicitly told it to upgrade itself in the background; but then you are responsible for updating your HCL code as well (this module sort of helps you there, as it has checks that prevent you from configuring automatic patch upgrades if a Kubernetes patch version is specified).

The problem for us (and probably other companies) is that we have a pinned version of the Terraform code for most of our clusters, which gets updated only during regular releases. So we can't just change the code whenever AKS updates to a new patch version. That's why we only specify 1.25 in our code. So this is a legitimate problem: Terraform shouldn't be confused when the AKS API returns 1.25.6 instead of 1.25, as that is outside of what Terraform should manage for us.

Does this explain some of our background?

Of course it's up to discussion who should handle that drift, and as already discussed in this PR, the ideal situation would be if the AKS API didn't return a patch version (or returned it in a separate field) when we haven't specified one in Terraform.

Thus I think this PR is a good workaround but shouldn't be considered a final implementation, right? The final implementation would be done in the AzureRM provider as soon as the AKS API changes its spec?

But in the meantime I could live with this workaround, and as @viters already said, it's better to have one rather than being blocked or eventually letting Terraform downgrade/recreate your cluster all of a sudden.

Oh, and one last side note: if you are still on 1.24 as we are, this problem doesn't affect you that much, as new patches for 1.24 are infrequent (if any) ;)

@the-technat
Contributor

@zioproto what sort of feedback should we add to the related AKS API issue?

@zioproto
Collaborator

@zioproto what sort of feedback should we add to the related AKS API issue?

+1 the issue Azure/AKS#3573 with a 👍

I see that you wrote on this thread:

this is a legitimate problem that Terraform shouldn't be confused when the AKS API returns 1.25.6 instead of 1.25

This is good feedback to report on that issue.

@cveld

cveld commented Apr 2, 2023

Not sure why it happens, but we are on 1.24, and even then, after a while AKS changed the property to 1.24.9 and the azurerm provider detected a change. The azurerm provider does not recreate the cluster, but it does take its time to propagate the change from 1.24.9 back to 1.24. This change detection is undesired.

@the-technat
Contributor

Not sure why it happens, but we are on 1.24, and even then, after a while AKS changed the property to 1.24.9 and the azurerm provider detected a change. The azurerm provider does not recreate the cluster, but it does take its time to propagate the change from 1.24.9 back to 1.24. This change detection is undesired.

Exact same behavior here: we're on 1.24 too, and nothing happened for weeks until suddenly this morning the change happened (i.e. the clusters are updating and reporting the wrong version).

@lonegunmanb
Member Author

I'd like to merge this PR as a mitigation; we have about one month to make the final decision. Any objections?

@lonegunmanb lonegunmanb changed the title [WIP] - Ignore changes on kubernetes_version from outside of Terraform [Breaking] - Ignore changes on kubernetes_version from outside of Terraform Apr 10, 2023
@lonegunmanb lonegunmanb added this to the 7.0.0 milestone Apr 10, 2023
…r changed it at Terraform side, ignore all other changes outside of Terraform
Collaborator

@jiaweitao001 jiaweitao001 left a comment


LGTM.

@jiaweitao001 jiaweitao001 merged commit c26e980 into Azure:main Apr 12, 2023
@lonegunmanb lonegunmanb deleted the f-335 branch November 21, 2023 08:30