ContainerNodePool stuck in infinite update loop #3798

Open
3 tasks done
cypres opened this issue Feb 26, 2025 · 1 comment
Labels
bug Something isn't working

cypres commented Feb 26, 2025

Checklist

Bug Description

On Feb 20th, GKE started adding a new automatic label, goog-gke-node-pool-provisioning-model, to node pools. This causes a permanent diff, an issue that has already been fixed in the Terraform provider.

Unfortunately, even with cnrm.cloud.google.com/state-into-spec: absent, container node pools created after Feb 20th are permanently out of sync. Config Connector keeps trying to reconcile them, creating new operations in an infinite loop.
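An illustrative sketch of the two label sets being compared. It is not captured from the API: it assumes a pool declared with only service_name, and that GKE reports the value on-demand (the value used in the workaround below).

```yaml
# Label set declared in spec.nodeConfig.resourceLabels (what Config Connector applies)
service_name: my-service
---
# Label set GKE reports for the same node pool (illustrative, assumed value)
service_name: my-service
goog-gke-node-pool-provisioning-model: on-demand  # added automatically by GKE
```

Because the reported set always contains one key that the declared set lacks, every reconcile sees a diff and issues another update.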

To break the loop, we can set the label on the Kubernetes resource as well:

resourceLabels:
  goog-gke-node-pool-provisioning-model: on-demand

Would it be possible to update the Terraform provider code in Config Connector with the fix from hashicorp/terraform-provider-google#21082?

Additional Diagnostic Information

None needed?

Kubernetes Cluster Version

1.29

Config Connector Version

1.128.0

Config Connector Mode

cluster mode

Log Output

{"error":"summary: googleapi: Error 400: Cluster is running incompatible operation operation-1740527253984-ea460d6c-64f0-44ac-a6aa-e88c23ca1433.
Details:
[
{
"@type": "type.googleapis.com/google.rpc.RequestInfo",
"requestId": "0x7e3eb49ee966916"
},
{
"@type": "type.googleapis.com/google.rpc.ErrorInfo",
"domain": "container.googleapis.com",
"reason": "CLUSTER_ALREADY_HAS_OPERATION"
}
]
, failedPrecondition", "logger":"containernodepool-controller", "msg":"error applying desired state", "resource":{…}, "timestamp":"2025-02-26T00:25:35.148Z"}

Steps to reproduce the issue

Create a node pool in a GKE cluster after Feb 20th, 2025, with resourceLabels set.

YAML snippets

apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  annotations:
    cnrm.cloud.google.com/deletion-policy: abandon
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/state-into-spec: absent
  name: my-nodepool
spec:
  nodeConfig:
    labels:
      service_name: my-service
    resourceLabels:
      service_name: my-service
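For reference, a sketch of the spec above with the workaround applied. Like the snippet above, it is not a complete manifest. It assumes the node pool uses the on-demand provisioning model. A pool that GKE labels with a different value would presumably need that value instead.

```yaml
spec:
  nodeConfig:
    labels:
      service_name: my-service
    resourceLabels:
      service_name: my-service
      # Workaround: declare the label GKE adds automatically so that the
      # declared and reported label sets match and the update loop stops.
      # "on-demand" is the value used in this report (assumption for other pools).
      goog-gke-node-pool-provisioning-model: on-demand
```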

cypres commented Feb 27, 2025

The fix should be similar to #3780
