ContainerNodePool stuck in infinite update loop #3798

cypres · 2025-02-26T02:46:48Z

Checklist

I did not find a related open issue.
I did not find a solution in the troubleshooting guide: (https://cloud.google.com/config-connector/docs/troubleshooting)
If this issue is time-sensitive, I have submitted a corresponding issue with GCP support.

Bug Description

On Feb 20th, GKE started automatically adding a new label, goog-gke-node-pool-provisioning-model, to node pools. This causes a constant diff, which has already been fixed in the Terraform provider.

Unfortunately, even with cnrm.cloud.google.com/state-into-spec: absent, container node pools created after Feb 20th are constantly out of sync, and Config Connector keeps trying to reconcile them, creating new operations in an infinite loop.

To break the loop, we can set the label on the Kubernetes resource as well:

resourceLabels:
  goog-gke-node-pool-provisioning-model: on-demand

Would it be possible to update the Terraform provider code in Config Connector to include the fix from hashicorp/terraform-provider-google#21082?

Additional Diagnostic Information

None needed?

Kubernetes Cluster Version

1.29

Config Connector Version

1.128.0

Config Connector Mode

cluster mode

Log Output

{"error":"summary: googleapi: Error 400: Cluster is running incompatible operation operation-1740527253984-ea460d6c-64f0-44ac-a6aa-e88c23ca1433.
Details:
[
{
"@type": "type.googleapis.com/google.rpc.RequestInfo",
"requestId": "0x7e3eb49ee966916"
},
{
"@type": "type.googleapis.com/google.rpc.ErrorInfo",
"domain": "container.googleapis.com",
"reason": "CLUSTER_ALREADY_HAS_OPERATION"
}
]
, failedPrecondition", "logger":"containernodepool-controller", "msg":"error applying desired state", "resource":{…}, "timestamp":"2025-02-26T00:25:35.148Z"}

Steps to reproduce the issue

Create a node pool in a GKE cluster after Feb 20th, 2025, with resourceLabels set.

YAML snippets

apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  annotations:
    cnrm.cloud.google.com/deletion-policy: abandon
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/state-into-spec: absent
  name: my-nodepool
spec:
  nodeConfig:
    labels:
      service_name: my-service
    resourceLabels:
      service_name: my-service
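
For reference, here is a sketch of the manifest above with the workaround from the bug description applied. The value on-demand is copied from that workaround and is an assumption for any other pool: a pool using a different provisioning model would presumably need whatever value GKE actually reports for the label. Other fields are omitted, as in the snippet above.

```yaml
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  annotations:
    cnrm.cloud.google.com/deletion-policy: abandon
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/state-into-spec: absent
  name: my-nodepool
spec:
  nodeConfig:
    labels:
      service_name: my-service
    resourceLabels:
      service_name: my-service
      # Workaround: declare the label GKE adds automatically so the
      # desired state matches the actual state and no diff is detected.
      # Assumption: "on-demand" matches what GKE set on this pool.
      goog-gke-node-pool-provisioning-model: on-demand
```

This only masks the diff for the one label; if GKE adds further automatic labels, the same loop could reappear until the underlying fix is picked up.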


cypres · 2025-02-27T20:48:31Z

The fix should be similar to #3780

cypres added the bug label on Feb 26, 2025
