[Bug]: Converting DynamoDB from On Demand to Provisioned capacity while ignoring read/write capacity sets a default of 1 #38100

Open
bwhaley opened this issue Jun 24, 2024 · 4 comments
Labels
bug Addresses a defect in current functionality. service/dynamodb Issues and PRs that pertain to the dynamodb service.

bwhaley commented Jun 24, 2024

Terraform Core Version

1.7.1

AWS Provider Version

5.44

Affected Resource(s)

aws_dynamodb_table

Expected Behavior

When changing a DynamoDB table from On Demand to Provisioned capacity, an initial capacity value is required for reads and writes. If no values are provided when switching to provisioned capacity, the apply should fail with an error.

Actual Behavior

The terraform-provider-aws docs state that these are required values:

read_capacity - (Optional) Number of read units for this table. If the billing_mode is PROVISIONED, this field is required.

However, the docs also state:

We recommend using lifecycle ignore_changes for read_capacity and/or write_capacity if there's autoscaling policy attached to the table.

So the guidance is to ignore changes to read/write capacity, which makes sense. However, on the initial change to PROVISIONED, if these values are ignored, the provider silently defaults them to 1. On a very busy table, this will result in throttling until autoscaling kicks in.

This bit me hard today. It hurt.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_dynamodb_table" "table" {
...
  billing_mode = "PROVISIONED"
  read_capacity = 1000
  write_capacity = 1000
  lifecycle {
    ignore_changes = [
      read_capacity,
      write_capacity,
    ]
  }
...
}

Steps to Reproduce

  1. Create a table with billing_mode = "PAY_PER_REQUEST" (a starting configuration is sketched after these steps)
  2. Change billing_mode = "PROVISIONED" and set read_capacity and write_capacity to some value > 1
  3. Set lifecycle { ignore_changes = [read_capacity, write_capacity] }
  4. Apply the change, and observe that the initial provisioned capacity is set to 1 for both reads and writes.
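
For reference, a minimal sketch of the step-1 starting configuration (the table name, hash key, and attribute are illustrative, not from the original report):

resource "aws_dynamodb_table" "table" {
  name         = "example"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }
}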

Debug Output

No response

Panic Output

No response

Important Factoids

I believe the culprit is here:

func expandProvisionedThroughputField(id string, data map[string]interface{}, key string, billingMode, oldBillingMode awstypes.BillingMode) int64 {
	v := data[key].(int)
	if v == 0 && billingMode == awstypes.BillingModeProvisioned && oldBillingMode == awstypes.BillingModePayPerRequest {
		log.Printf("[WARN] Overriding %[1]s on DynamoDB Table (%[2]s) to %[3]d. Switching from billing mode %[4]q to %[5]q without value for %[1]s. Assuming changes are being ignored.",
			key, id, provisionedThroughputMinValue, oldBillingMode, billingMode)
		v = provisionedThroughputMinValue
	}
	return int64(v)
}

Where provisionedThroughputMinValue is a constant equal to 1.

References

No response

Would you like to implement a fix?

None

@bwhaley bwhaley added the bug Addresses a defect in current functionality. label Jun 24, 2024
@github-actions github-actions bot added the service/dynamodb Issues and PRs that pertain to the dynamodb service. label Jun 24, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Jun 24, 2024

bwhaley commented Jun 24, 2024

I'm not sure what the ideal behavior is for this, but it definitely shouldn't be to set a default magic value that is extremely low. I think it should either result in an error, or it should query the latest consumed read/write capacity units and use those as the initial values.
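
For illustration, here is a minimal sketch of the error-returning option, based on the function quoted above. The (int64, error) signature and the error message are my assumptions, not the provider's current code:

// Hypothetical variant that fails instead of silently defaulting to
// provisionedThroughputMinValue (1) when switching from PAY_PER_REQUEST
// to PROVISIONED without an explicit capacity value.
func expandProvisionedThroughputField(id string, data map[string]interface{}, key string, billingMode, oldBillingMode awstypes.BillingMode) (int64, error) {
	v := data[key].(int)
	if v == 0 && billingMode == awstypes.BillingModeProvisioned && oldBillingMode == awstypes.BillingModePayPerRequest {
		return 0, fmt.Errorf("DynamoDB Table (%s): %s must be set to a value greater than 0 when switching from billing mode %q to %q, even if changes to it are ignored", id, key, oldBillingMode, billingMode)
	}
	return int64(v), nil
}

Callers would then surface the error as a plan/apply failure instead of proceeding with a capacity of 1.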

hugo-barros commented Feb 12, 2025

Big +1

We also hit this recently in production, on a very large table.

PR 22630, a bad call?

I may be biased but I am of the opinion this was a bad call (or things were different back then...).

Reason being, Terraform is not only assuming that (1) changes are being ignored, but also that (2) the aws_appautoscaling resources are there to save the day.

I agree (1) may be true in most cases. But I don't agree that (2) should be so naively assumed.

Scenario

I'll give you a scenario. Let's assume:

  • Table is PAY_PER_REQUEST and never had autoscaling on.
  • Developer decides to switch to PROVISIONED and create the autoscaling resources during the same operation.
  • He "forgets" to include the RCU and WCU attributes, or lifecycle.ignore_changes for write_capacity (or read_capacity) is in place for tables that had been PROVISIONED in the past (often behind a module the developer may not be aware of 😉).
  • Apply.
  • aws_dynamodb_table will be Still modifying... for a few minutes (since it is in the "Updating" state in the API).
  • The aws_appautoscaling_target and aws_appautoscaling_policy resources are created using a TF reference (resource_id = "table/${aws_dynamodb_table.example.name}"), so due to the dependency graph they are now waiting to be created (see the sketch after this list). For a few minutes my table will have WCU and RCU == 1, until it goes out of the Updating state in the AWS API and my autoscaling resources are created to -- finally -- scale my capacity units.
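
A minimal sketch of that dependency, assuming a write-side target (the capacities are illustrative; the read side is analogous):

resource "aws_appautoscaling_target" "table_write" {
  max_capacity       = 4000
  min_capacity       = 1000
  # This reference creates a graph edge: the target is only created after
  # the table update completes, i.e. after the minutes-long "Updating"
  # window during which capacity sits at 1.
  resource_id        = "table/${aws_dynamodb_table.example.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}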

Impact in the real world

My critical production table took 15m40s in the Updating state before becoming Active. 🥲

About documentation

Another point I feel is worth bringing up: the documentation for write_capacity and read_capacity reads:

(Optional) Number of write units for this table. If the billing_mode is PROVISIONED, this field is required.

But in reality there is nothing "requiring" the field, because the linked PR made it "safe" by defaulting the value to 1.

At the very least, the documentation should be updated to say:

(Optional) Number of write units for this table. If the billing_mode is PROVISIONED, this field defaults to 1.

But then again...

Conclusion

...I think this was a bad call, and yes, Terraform should be more pragmatic and just fail if those values are not passed (or are being ignored) when billing_mode = PROVISIONED.

WDYT? 😄

@hugo-barros

I also wonder if anything has changed recently in the way Terraform "unblocks" from the Modifying state of the aws_dynamodb_table.
Did it use to be faster / immediate? I assume it could have caused problems in other parts/operations, hence a fix was proposed (to wait until the Active state)?

Another path to potentially prevent this issue would be to make the "target resource doesn't exist" error during app-autoscaling target creation retryable. This would allow us to define the name of the table manually, which would remove the dependency from the graph and create the appautoscaling resource ASAP, without waiting for the DDB table to be ready (see the sketch below).
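
For example, a hypothetical configuration relying on such retry behavior (the table name and capacities are illustrative):

resource "aws_appautoscaling_target" "table_write" {
  max_capacity       = 4000
  min_capacity       = 1000
  # Hardcoded table name: no reference to aws_dynamodb_table, so no graph
  # edge -- Terraform could create this target without waiting for the
  # table to leave the "Updating" state.
  resource_id        = "table/my-table"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}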
