Recurring error "400 route operation in progress" when applying 3-network-hub-and-spoke #1228

mromascanu123 · 2024-05-11T01:26:31Z

TL;DR

This happens almost every time when deploying dev, nprod or prod. I have to plan and apply again, and then everything is fine. But this kind of error will break any pipeline that deploys the spokes automatically.

. . .
module.base_env.module.restricted_shared_vpc[0].module.regular_service_perimeter.google_access_context_manager_service_perimeter.regular_service_perimeter: Creating...
module.base_env.module.restricted_shared_vpc[0].module.regular_service_perimeter.google_access_context_manager_service_perimeter.regular_service_perimeter: Creation complete after 3s [id=accessPolicies/6329355927/servicePerimeters/sp_n_shared_restricted_default_perimeter_e480]
module.base_env.module.restricted_shared_vpc[0].module.regular_service_perimeter.google_access_context_manager_service_perimeter_resource.service_perimeter_resource["115822756025"]: Creating...
module.base_env.module.restricted_shared_vpc[0].module.regular_service_perimeter.google_access_context_manager_service_perimeter_resource.service_perimeter_resource["115822756025"]: Creation complete after 2s [id=accessPolicies/6329355927/servicePerimeters/sp_n_shared_restricted_default_perimeter_e480/projects/115822756025]
module.base_env.module.restricted_shared_vpc[0].google_access_context_manager_service_perimeter.bridge_to_network_hub_perimeter[0]: Creating...
module.base_env.module.restricted_shared_vpc[0].google_access_context_manager_service_perimeter.bridge_to_network_hub_perimeter[0]: Creation complete after 0s [id=accessPolicies/6329355927/servicePerimeters/spb_c_to_n_shared_restricted_bridge_e480]

Error: Error adding network peering: googleapi: Error 400: There is a route operation in progress on the local or peer network. Try again later., badRequest

with module.base_env.module.base_shared_vpc[0].module.peering[0].google_compute_network_peering.peer_network_peering,
on .terraform/modules/base_env.base_shared_vpc.peering/modules/network-peering/main.tf line 50, in resource "google_compute_network_peering" "peer_network_peering":
50: resource "google_compute_network_peering" "peer_network_peering" {

I did not investigate in detail what is going on; it might be a race condition or an unaccounted-for dependency.

Expected behavior

Should deploy smoothly. Why does the second attempt succeed?

Observed behavior

See the TL;DR above.
Error: Error adding network peering: googleapi: Error 400: There is a route operation in progress on the local or peer network. Try again later., badRequest

Terraform Configuration

N/A - default, nothing special

Terraform Version

terraform version
Terraform v1.6.0
on linux_amd64

Additional information

No response

obriensystems · 2024-05-11T23:41:13Z

Reference: last full 3-networks-hub-and-spoke apply - up to 5-app-infra - TF 1.3.10 to avoid the issue running cloudbuild with 1.3 - if we use the default 1.7.5 (since downgraded in 1.5.7 in cloud shell)
Env: cloud shell and CB/CSR

There were several workarounds and hardcoding changes I made to get up to step 17, for nonproduction and production, and I was running the two default US-based regions.
Canadian Public Sector Secure PBMM Landing Zone reference using Terraform 1.6 (for now 1.3.10) based on the PSO/TOC ready TEF V4 - Full clean organization deployment with mitigation/automation/parameterization modifications GoogleCloudPlatform/pbmm-on-gcp-onboarding#360 (comment)

I will also retest 3-nhas as soon as I finish the TEF upstream sync for 20240511 main in GoogleCloudPlatform/pbmm-on-gcp-onboarding#387 to reverify 3-nhas. There are 2 symlinks in nonproduction that need to be reverted in #1107, but they function OK with a double symlink for now.

sleighton2022 · 2024-05-29T16:34:03Z

If there are too many simultaneous operations on peering, this will occur. It is not occurring in our integration tests. Are environments being deployed in parallel? One workaround is to run Terraform with -parallelism=1, but it will make the build take a long time, as resource operations no longer run in parallel.
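For a pipeline, the serial-apply workaround can be combined with a retry on this specific error. The following is only a minimal sketch, not something taken from the foundation repo: it assumes `terraform` is on the PATH, uses the real `-parallelism=1` and `-auto-approve` options of `terraform apply`, and treats the message text from this issue as the marker for a transient failure. The function names, attempt count and delay are hypothetical.

```python
import subprocess
import time
from typing import Callable, Tuple

# Message text taken from the error reported in this issue.
TRANSIENT_MARKER = "route operation in progress"


def run_terraform_apply() -> Tuple[int, str]:
    """Run `terraform apply` serially; return (exit code, combined output)."""
    proc = subprocess.run(
        ["terraform", "apply", "-auto-approve", "-parallelism=1"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def apply_with_retry(
    run: Callable[[], Tuple[int, str]] = run_terraform_apply,
    max_attempts: int = 3,      # hypothetical default
    delay_seconds: float = 60.0,  # hypothetical default
) -> int:
    """Return the number of attempts used; raise if the apply never succeeds."""
    for attempt in range(1, max_attempts + 1):
        code, output = run()
        if code == 0:
            return attempt
        # Only the peering/route error from this issue is retried.
        if TRANSIENT_MARKER not in output:
            raise RuntimeError(f"terraform apply failed (non-transient):\n{output}")
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

Calling `apply_with_retry()` from the environment directory would re-run the apply only when the output contains the route-operation message. A successful retry does not by itself prove the resulting state is complete, so the outputs are still worth checking afterwards.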

mromascanu123 · 2024-06-13T17:56:30Z

@sleighton2022: there is no parallelism here, and the deployment is done manually. It does not happen every time, or even often: I got one of these on 05/30 and another one today. However, the problem is deeper and nastier. In both cases the error was associated with tfstate corruption. On 05/30 it occurred during the apply for 3-nhas production, and today during the apply for 3-nhas development. On the surface, a retry (terraform plan, then apply) seemed to fix the issue both times, and 3-nhas appeared to deploy without error. In reality, the tfstate for the stage where the error occurred (prod on 05/30, dev today) was corrupted and was missing output variables that should have been generated from outputs.tf. As a result, when deploying 4-projects these variables are not found and the deployment fails for good.

Example: after today's failed deployment I compared the tfstate files under the key "networks". While prod and nprod contained the same output variables (with different values), quite a few were missing for dev.

More precisely, the outputs below were missing, and possibly others:
output "base_network_self_link" {
  value       = module.base_env.base_network_self_link
  description = "The URI of the VPC being created"
}

output "base_subnets_secondary_ranges" {
  value       = module.base_env.base_subnets_secondary_ranges
  description = "The secondary ranges associated with these subnets"
}

output "base_subnets_self_links" {
  value       = module.base_env.base_subnets_self_links
  description = "The self-links of subnets being created"
}
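The comparison described above can be automated so that a retried apply is not trusted blindly. This is a rough sketch under a few assumptions: each environment's state has been saved locally as JSON (for example with `terraform state pull`), and the state uses the usual top-level "outputs" map keyed by output name. The function names and file names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Set


def load_state(path: str) -> dict:
    """Load a Terraform state file that was saved locally as JSON."""
    return json.loads(Path(path).read_text())


def output_names(state: dict) -> Set[str]:
    """Names of the root-module outputs recorded in a state document."""
    return set(state.get("outputs", {}))


def missing_outputs(states: Dict[str, dict]) -> Dict[str, Set[str]]:
    """For each environment, list outputs present in some other environment
    but absent from this one."""
    all_names: Set[str] = set().union(*(output_names(s) for s in states.values()))
    return {env: all_names - output_names(s) for env, s in states.items()}
```

For example, `missing_outputs({"dev": load_state("dev.tfstate"), "nprod": load_state("nprod.tfstate"), "prod": load_state("prod.tfstate")})` returns, per environment, the set of output names it lacks relative to the others. A non-empty set for any environment could be used to stop the pipeline before 4-projects runs.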

mromascanu123 · 2024-06-14T12:38:33Z

Interestingly, the same issue has been reported with the project-factory module, apparently not directly related to TEF. The people reporting it think the error points to a race condition.

Cloud DNS and Peering - Terraform Providers / Google - HashiCorp Discuss
Peering Fails with "There is a peering operation in progress" · Issue #3026 · hashicorp/terraform-provider-google (github.com)
GCP Peering does not work · Issue #3034 · hashicorp/terraform-provider-google (github.com)

github-actions · 2024-08-13T23:17:47Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions · 2024-10-26T23:19:15Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

mromascanu123 added the bug Something isn't working label May 11, 2024

mromascanu123 mentioned this issue Jun 14, 2024

Provide mechanism for cleanup after failed deployment to enable re-deployment #1240

Closed

mromascanu123 mentioned this issue Jun 25, 2024

doc improvement: clarify intended usage and level of support #1239

Closed

github-actions bot added the Stale label Aug 13, 2024

github-actions bot closed this as not planned Aug 20, 2024

eeaton reopened this Aug 27, 2024

github-actions bot removed the Stale label Aug 27, 2024

github-actions bot added the Stale label Oct 26, 2024

github-actions bot closed this as not planned Nov 2, 2024
