google_cloud_run_v2_service causes faulty state if a manual Cloud Run revision was created between terraform plan and apply #20496

Open
Dragotic opened this issue Nov 27, 2024 · 8 comments

Comments

@Dragotic

Dragotic commented Nov 27, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

Terraform v1.6.6
on darwin_arm64

  • provider registry.terraform.io/hashicorp/google v6.5.0
  • provider registry.terraform.io/hashicorp/google-beta v6.10.0

Affected Resource(s)

google_cloud_run_v2_service

Terraform Configuration

The only important part of the configuration is the lifecycle block:

resource "google_cloud_run_v2_service" "this" {
  name     = var.service_name
  location = var.location
  ingress  = var.ingress

  dynamic "traffic" {
    for_each = var.traffic
    content {
      percent  = traffic.value.percent
      type     = traffic.value.type
      revision = traffic.value.revision
      tag      = traffic.value.tag
    }
  }

  lifecycle {
    ignore_changes = [
      client,
      client_version,
      template[0].containers[0].image,
      template[0].labels,
      template[0].containers[0].env
    ]
  }
}

Debug Output

No response

Expected Behavior

Given an existing Cloud Run service whose container image has SHA abcdefg, I want to update the scaling from 1 to 2 min instances.
Timeline

  1. I run terraform plan -out tf.plan
  2. The plan correctly shows only one change: the scaling configuration from 1 to 2
  3. I manually change the Cloud Run service and deploy a new revision, changing only the container image from SHA abcdefg to gfedcba. The new revision is healthy.
  4. I run terraform apply tf.plan (from step 1)
  5. The new revision is correct, with min instances set to 2 (from 1) and container image SHA gfedcba

Actual Behavior

Step 5 turns out differently:
The new revision is correct, with min instances set to 2 (from 1), but with container image SHA abcdefg from the revision at step 1 rather than gfedcba from step 3, which is the latest healthy revision.

Steps to reproduce

  1. terraform plan -out tf.plan
  2. Manually update the Cloud Run service with a new container image (changing any value covered by the ignore_changes block will probably give the same result)
  3. Wait for it to get healthy
  4. terraform apply tf.plan

Important Factoids

No response

References

No response

b/382557869

@Dragotic Dragotic added the bug label Nov 27, 2024
@github-actions github-actions bot added forward/review In review; remove label to forward service/run labels Nov 27, 2024
@ggtisc ggtisc self-assigned this Nov 27, 2024
@ggtisc
Collaborator

ggtisc commented Nov 27, 2024

Hi @Dragotic!

In order to replicate this issue we need the variable values, which we don't have access to. However, if they contain sensitive information, you can provide example values that correspond to the situation you are presenting. Example:

@Dragotic
Author

@ggtisc

resource "google_cloud_run_v2_service" "default" {
  name     = "cloudrun-service"
  location = "us-central1"
  deletion_protection = false
  ingress = "INGRESS_TRAFFIC_ALL"
  scaling {
    min_instance_count = 1
  }

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello@sha256:9686c5ac235ee71d02b6072d13e749b7542966a96095d9a9f1d897ea14aeba91"
    }
  }
}

Apply this, then include the following in the resource and re-apply:

  lifecycle {
    ignore_changes = [
      client,
      client_version,
      template[0].containers[0].image,
      template[0].labels
    ]
  }

Then update min_instance_count to 2 and run terraform plan -out tf.plan. Manually, through the UI/CLI, update the image to us-docker.pkg.dev/cloudrun/container/hello@sha256:811b577bc868e6e61e235ada8f14c6c53f6e360a2f34ed21637629d00831bea9, and afterwards run terraform apply tf.plan.
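
Roughly, the full reproduction from the CLI looks like this (the gcloud command is just one way to make the manual change outside Terraform; doing it through the console gives the same result):

# plan with min_instance_count = 2
terraform plan -out tf.plan

# roll out a new revision manually, changing only the image
gcloud run deploy cloudrun-service \
  --image=us-docker.pkg.dev/cloudrun/container/hello@sha256:811b577bc868e6e61e235ada8f14c6c53f6e360a2f34ed21637629d00831bea9 \
  --region=us-central1

# apply the stale plan; the reported result is a revision that reverts to the
# image captured at plan time
terraform apply tf.plan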

@ggtisc
Collaborator

ggtisc commented Nov 28, 2024

Thanks for clarifying

The lifecycle block with ignore_changes helps, but it might not capture all console modifications.

If you made changes through the console, you need to manually compare your Terraform configuration for this resource with the actual resource state in the Google Cloud console. Look for differences in properties like labels or container image versions.

Here are two options depending on what you want to achieve:

  • Option 1: Reflect console changes in Terraform: If you want Terraform to manage the resource going forward, including the console-made changes, update the relevant properties in your Terraform code to match the console configuration. This brings your Terraform code in sync with the actual resource state.

  • Option 2: Discard console changes and manage with Terraform: If you prefer to manage the resource solely with Terraform and discard the console modifications, update your Terraform code to the desired state you want to manage. This might involve removing labels or specifying the image version you want to use.

After updating your Terraform code, run terraform apply to apply the changes to your infrastructure. Terraform will detect the differences between the desired state (your updated code) and the current state, and make the necessary modifications.

Important Note: Make sure you understand the implications of each option before applying the changes. Option 1 essentially accepts the console changes into your Terraform configuration, while Option 2 discards them and enforces your Terraform-defined state.

Here are some additional tips:

  • Use Terraform's terraform state show command to inspect the current state of the resource and compare it with your code (see the example after this list).

  • Consider using a Terraform state management tool like Terraform Cloud or Terraform Enterprise to track changes and avoid conflicts.
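
For example, using the resource address from the configuration above (a minimal sketch; the address will differ in your own code):

terraform state show google_cloud_run_v2_service.default

# Compare attributes such as template[0].containers[0].image with what the
# console shows for the latest revision.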

By following these steps, you can resolve the conflict between your Terraform code and the resource updated in the Google Cloud console and continue working with the resource effectively.

@Dragotic
Author

Hey @ggtisc thank you so much for the response.

However, I don't see how either option is ideal for me. You have a Cloud Run service managed by Terraform while something like the image is controlled by Google Cloud Build, where a new image is pushed and updates the Cloud Run service.

When you use the lifecycle.ignore_changes block for the container image, Terraform should no longer track code changes to it. When the google provider sends an update request, I would assume that any parameter in the lifecycle.ignore_changes block would not be sent to the Google API. That's why I feel it's a bug: the container image is sent to the API when it shouldn't be.

@ggtisc
Collaborator

ggtisc commented Nov 29, 2024

@NickElliot could you help with this issue? Is it possible that you have another way to handle it?

@ggtisc ggtisc assigned NickElliot and unassigned ggtisc Nov 29, 2024
@NickElliot
Collaborator

NickElliot commented Dec 5, 2024

While this is a non-standard config, it is the point of the ignore_changes block to avoid applying diffs you specifically declare as ignorable.

It seems the likely cause is that magic-modules/mmv1/products/cloudrunv2/Service.yaml is not configured to use update_mask: true, so the provider sends the entire resource in the patch request body, which is applied in its entirety when the API receives the patch request. Per the documentation, the Cloud Run service supports an update mask parameter, so if there is no potential problem for existing configs, this could be prevented by configuring the resource YAML to use an update mask.
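
For illustration, a minimal sketch of what that configuration could look like (field names follow MMv1 conventions; everything else in Service.yaml is omitted and the exact placement is an assumption):

# mmv1/products/cloudrunv2/Service.yaml (sketch only, most fields omitted)
name: 'Service'
update_verb: 'PATCH'
# Hypothetical addition: emit an updateMask query parameter on the PATCH
# request listing only the fields Terraform is actually changing, instead
# of sending the full resource body.
update_mask: true

With an update mask, the API only modifies the fields named in the mask and leaves the rest of the live resource untouched, which is what the comment above suggests would prevent the image from reverting here.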

@NickElliot NickElliot removed their assignment Dec 5, 2024
@NickElliot NickElliot removed the forward/review In review; remove label to forward label Dec 5, 2024
@Dragotic
Author

@NickElliot so what would be the way forward for this? I believe patching only the particular configuration that Terraform tracks makes the most sense and should probably be the default behavior. When ignore_changes is explicitly set on specific configuration, it shouldn't be included in the API patch request.

@yanweiguo
Contributor

@Dragotic could you just run terraform apply after updating min_instance_count = 2, instead of terraform plan -out tf.plan followed by terraform apply tf.plan?

I tried terraform apply and it didn't override the image to the first version.
