
plans with very large number of cloudflare_records take too much time to complete #4887

Closed
3 tasks done
jficz opened this issue Jan 14, 2025 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. triage/debug-log-attached Indicates an issue or PR has a complete Terraform debug log.

Comments

@jficz

jficz commented Jan 14, 2025

Confirmation

  • This is a bug with an existing resource and is not a feature request or enhancement. Feature requests should be submitted with Cloudflare Support or your account team.
  • I have searched the issue tracker and my issue isn't already found.
  • I have replicated my issue using the latest version of the provider and it is still present.

Terraform and Cloudflare provider version

Terraform 1.5.7 (we're limited to the last OSS version) / OpenTofu 1.6.2

Provider version 4.39 (because of #4280) but tried with 4.50 with similar results.

Affected resource(s)

cloudflare_record

Terraform configuration files

resource "cloudflare_record" "record0001" {
  content = "192.0.2.1"
  name    = "www"
  proxied = false
  ttl     = 1
  type    = "A"
  zone_id = "<zone_id>"
}

# ...
# repeat ^^^ many times
# ...

resource "cloudflare_record" "record4356" {
  content = "www"
  name    = "web"
  proxied = false
  ttl     = 1
  type    = "CNAME"
  zone_id = "<zone_id>"
}

Link to debug output

https://gist.github.com/jficz/a0c393bef69720d882dbec8bacba32c2

Panic output

No response

Expected output

Much faster plan execution.

Actual output

A full import of ~5000 records takes about an hour to execute.

Updating just a single record takes about half that time.

Steps to reproduce

  1. create many records (3000+)
  2. run terraform plan/apply
  3. wait....

Additional factoids

A debug log for just a refresh of ~3500 records with very few changes is 19 MB. The attached debug log only covers two imported records and one change, but the information in it is otherwise the same as in the large one (minus the other ~3500 records).

Due to API rate limiting, zones with thousands of DNS records take ages to properly refresh and apply which makes the provider highly impractical for large deployments.

Such long runs cause issues with CI and block resources and bandwidth for a very long time, even for small changes.

The API provides batch operation endpoints for both listing records and changing records.

It would be great if the provider used these endpoints for refresh and update instead of iterating through the records one by one. That would likely speed up operations noticeably even for small deployments, and would be a several-orders-of-magnitude improvement for large ones.

-refresh=false is a possible workaround for some use cases, but it introduces other problems such as configuration drift.

We in fact use a for_each loop to generate the resources, but the example above causes the same issues.
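For context, the for_each pattern looks roughly like this (a minimal sketch; the variable name `dns_records` and its shape are illustrative, not the actual configuration):

```hcl
# Hypothetical map of records; in practice this would be generated or
# loaded from elsewhere (e.g. a yamldecode() of a records file).
variable "dns_records" {
  type = map(object({
    content = string
    type    = string
    proxied = bool
    ttl     = number
  }))
}

resource "cloudflare_record" "this" {
  for_each = var.dns_records

  zone_id = "<zone_id>"
  name    = each.key
  content = each.value.content
  type    = each.value.type
  proxied = each.value.proxied
  ttl     = each.value.ttl
}
```

Each map entry still produces an independent resource instance, so the provider still refreshes and updates them one API call at a time; for_each does not change the request pattern.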

References

No response

@jficz jficz added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 14, 2025

Terraform debug log detected ✅

@github-actions github-actions bot added the triage/debug-log-attached Indicates an issue or PR has a complete Terraform debug log. label Jan 14, 2025


@jacobbednarz
Member

this is expected behaviour from the provider when it runs into the rate limits. it will back off with jitter until the operation completes (even if that takes hours).

there are a few options here:

@jacobbednarz jacobbednarz closed this as not planned Won't fix, can't repro, duplicate, stale Jan 14, 2025
@jficz
Author

jficz commented Jan 14, 2025

Unfortunately, restructuring won't help in our case; about 3500 records are in a single zone.

I understand this is expected with the current code base; that's why I suggested changing the approach to use the batch-processing API endpoints instead of handling each record separately. I have already optimized the request rate.
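For reference, this is roughly how the rate-related settings can be tuned on the v4 provider (a sketch; the values are illustrative, and argument availability should be checked against the provider documentation for your version):

```hcl
provider "cloudflare" {
  api_token = var.cloudflare_api_token

  # Rate-related knobs exposed by the v4 provider (illustrative values):
  rps         = 4  # client-side requests-per-second cap
  retries     = 5  # retry count on rate-limit / transient errors
  min_backoff = 1  # minimum backoff, in seconds
  max_backoff = 30 # maximum backoff, in seconds
}
```

Tuning these spreads the requests out and avoids hitting the API rate limit as hard, but with thousands of records the plan is still bounded by one request per record.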

@jacobbednarz
Member

we won't swap to batch endpoints as terraform operates on a single resource model and it would be working against the intended design of terraform.
