[Bug]: Converting DynamoDB from On Demand to Provisioned capacity while ignoring read/write capacity sets a default of 1 #38100

Open
bwhaley opened this issue Jun 24, 2024 · 4 comments
Labels
bug Addresses a defect in current functionality. service/dynamodb Issues and PRs that pertain to the dynamodb service.

bwhaley commented Jun 24, 2024

Terraform Core Version

1.7.1

AWS Provider Version

5.44

Affected Resource(s)

aws_dynamodb_table

Expected Behavior

When changing a DynamoDB table from On Demand to Provisioned capacity, an initial capacity value is required for reads and writes. If no values are provided when switching to provisioned capacity, the apply should fail with an error.

Actual Behavior

The terraform-provider-aws docs state that these are required values:

read_capacity - (Optional) Number of read units for this table. If the billing_mode is PROVISIONED, this field is required.

However, the docs also state:

We recommend using lifecycle ignore_changes for read_capacity and/or write_capacity if there's autoscaling policy attached to the table.

So the guidance is to ignore changes to read/write capacity, which makes sense. However, on the initial change to PROVISIONED, if these values are ignored, the provider silently defaults them to 1. On a very busy table, this will result in throttling until autoscaling kicks in.

This bit me hard today. It hurt.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_dynamodb_table" "table" {
...
  billing_mode = "PROVISIONED"
  read_capacity = 1000
  write_capacity = 1000
  lifecycle {
    ignore_changes = [
      read_capacity,
      write_capacity,
    ]
  }
...
}

Steps to Reproduce

  1. Create a table with billing_mode = "PAY_PER_REQUEST" (a starting configuration is sketched after these steps)
  2. Change billing_mode = "PROVISIONED" and set read_capacity and write_capacity to some value > 1
  3. Set lifecycle { ignore_changes = [read_capacity, write_capacity] }
  4. Apply the change, and observe that the initial provisioned capacity is set to 1 for both reads and writes.
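
For reference, a minimal sketch of the step-1 starting configuration (the table name, hash key, and attribute are illustrative, not from the original report):

resource "aws_dynamodb_table" "table" {
  name         = "example"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }
}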

Debug Output

No response

Panic Output

No response

Important Factoids

I believe the culprit is here:

func expandProvisionedThroughputField(id string, data map[string]interface{}, key string, billingMode, oldBillingMode awstypes.BillingMode) int64 {
	v := data[key].(int)
	if v == 0 && billingMode == awstypes.BillingModeProvisioned && oldBillingMode == awstypes.BillingModePayPerRequest {
		log.Printf("[WARN] Overriding %[1]s on DynamoDB Table (%[2]s) to %[3]d. Switching from billing mode %[4]q to %[5]q without value for %[1]s. Assuming changes are being ignored.",
			key, id, provisionedThroughputMinValue, oldBillingMode, billingMode)
		v = provisionedThroughputMinValue
	}
	return int64(v)
}

Where provisionedThroughputMinValue is a constant equal to 1.

References

No response

Would you like to implement a fix?

None

@bwhaley bwhaley added the bug Addresses a defect in current functionality. label Jun 24, 2024
@github-actions github-actions bot added the service/dynamodb Issues and PRs that pertain to the dynamodb service. label Jun 24, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Jun 24, 2024

bwhaley commented Jun 24, 2024

I'm not sure what the ideal behavior is for this, but it definitely shouldn't be to set a default magic value that is extremely low. I think it should either result in an error, or it should query the latest consumed read/write capacity units and use those as the initial values.
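
For illustration, here is a minimal sketch of the error-returning option, based on the function quoted above. The (int64, error) signature and the error message are my assumptions, not the provider's current code:

// Hypothetical variant that fails instead of silently defaulting to
// provisionedThroughputMinValue (1) when switching from PAY_PER_REQUEST
// to PROVISIONED without an explicit capacity value.
func expandProvisionedThroughputField(id string, data map[string]interface{}, key string, billingMode, oldBillingMode awstypes.BillingMode) (int64, error) {
	v := data[key].(int)
	if v == 0 && billingMode == awstypes.BillingModeProvisioned && oldBillingMode == awstypes.BillingModePayPerRequest {
		return 0, fmt.Errorf("DynamoDB Table (%s): %s must be set to a value greater than 0 when switching from billing mode %q to %q, even if changes to it are ignored", id, key, oldBillingMode, billingMode)
	}
	return int64(v), nil
}

Callers would then surface the error as a plan/apply failure instead of proceeding with a capacity of 1.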

hugo-barros commented Feb 12, 2025

Big +1

We also hit this recently in production, on a very large table.

PR 22630, a bad call?

I may be biased but I am of the opinion this was a bad call (or things were different back then...).

Reason being, Terraform is not only assuming that (1) changes are being ignored, but also that (2) the aws_appautoscaling resources are there to save the day.

I agree (1) may be true in most cases. But I don't agree that (2) should be so naively assumed.

Scenario

I'll give you a scenario. Let's assume:

  • Table is PAY_PER_REQUEST and never had autoscaling on.
  • Developer decides to switch to PROVISIONED and create the autoscaling resources during the same operation.
  • He "forgets" to include the RCU and WCU attributes, or lifecycle.ignore_changes for write_capacity (or read_capacity) is in place for tables that had been PROVISIONED in the past (often behind a module the developer may not be aware of 😉).
  • Apply.
  • aws_dynamodb_table will be Still modifying... for a few minutes (since it is in the "Updating" state in the API).
  • The aws_appautoscaling_target and aws_appautoscaling_policy resources are created using a TF reference (resource_id = "table/${aws_dynamodb_table.example.name}"), so due to the dependency graph they are now waiting to be created (see the sketch after this list). For a few minutes my table will have WCU and RCU == 1, until it goes out of the Updating state in the AWS API and my autoscaling resources are created to -- finally -- scale my capacity units.
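
A minimal sketch of that dependency, assuming a write-side target (the capacities are illustrative; the read side is analogous):

resource "aws_appautoscaling_target" "table_write" {
  max_capacity       = 4000
  min_capacity       = 1000
  # This reference creates a graph edge: the target is only created after
  # the table update completes, i.e. after the minutes-long "Updating"
  # window during which capacity sits at 1.
  resource_id        = "table/${aws_dynamodb_table.example.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}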

Impact in the real world

My critical production table took 15m40s in the Updating state before becoming Active. 🥲

About documentation

Another point I feel is worth bringing up: the documentation for write_capacity and read_capacity reads:

(Optional) Number of write units for this table. If the billing_mode is PROVISIONED, this field is required.

But in reality there is nothing "requiring" the field, because the linked PR made it "safe" by defaulting the value to 1.

At the very least, the documentation should be updated to say:

(Optional) Number of write units for this table. If the billing_mode is PROVISIONED, this field defaults to 1.

But then again...

Conclusion

...I think this was a bad call, and yes, Terraform should be more pragmatic and just fail if those values are not passed (or are being ignored) when billing_mode = PROVISIONED.

WDYT? 😄

@hugo-barros

I also wonder if anything has changed recently in the way Terraform "unblocks" from the Modifying state of the aws_dynamodb_table.
Did it use to be faster / immediate? I assume it could have caused problems in other parts/operations, hence a fix was proposed (to wait until the Active state)?

Another path to potentially prevent this issue would be to make the "target resource doesn't exist" error during app-autoscaling target creation retryable. This would allow us to define the name of the table manually, which would remove the dependency from the graph and create the appautoscaling resource ASAP, without waiting for the DDB table to be ready (see the sketch below).
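
For example, a hypothetical configuration relying on such retry behavior (the table name and capacities are illustrative):

resource "aws_appautoscaling_target" "table_write" {
  max_capacity       = 4000
  min_capacity       = 1000
  # Hardcoded table name: no reference to aws_dynamodb_table, so no graph
  # edge -- Terraform could create this target without waiting for the
  # table to leave the "Updating" state.
  resource_id        = "table/my-table"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}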
