
WaitForState() leaves Refresh() running in the background on timeout #530

@enool

Description


SDK version

v1.13.1-1

Relevant provider source code

Taken from: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_iam_role.go#L152

var createResp *iam.CreateRoleOutput
err := resource.Retry(30*time.Second, func() *resource.RetryError {
    var err error
    createResp, err = iamconn.CreateRole(request) // <-- retries internally; can block for more than 30 seconds
    // IAM users (referenced in Principal field of assume policy)
    // can take ~30 seconds to propagate in AWS
    if isAWSErr(err, "MalformedPolicyDocument", "Invalid principal in policy") {
        return resource.RetryableError(err)
    }
    return resource.NonRetryableError(err)
})
if isResourceTimeoutError(err) { // <-- the goroutine started by Retry (via WaitForState) may still be running here
    createResp, err = iamconn.CreateRole(request) // <-- so this issues a second, overlapping blocking CreateRole
}

Debug Output

[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 0/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 1
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[WARN] WaitForState timeout after 30s
[WARN] WaitForState starting 30s refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 1/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 2
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[ERROR] WaitForState exceeded refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:

Expected Behavior

resource.Retry() blocks until the callback finishes, even on timeout.

Actual Behavior

The callback is still running when the timeout fires. The AWS provider, for example, then issues a second, parallel CreateRole(). In many cases this results in a double creation attempt and, eventually, a failure in the plugin:

Error: Error creating IAM Role hello-world-ssm_role: EntityAlreadyExists: Role with name hello-world-ssm_role already exists.
status code: 409, request id: removed

on main.tf line 18, in resource "aws_iam_role" "ssm_role":
18: resource "aws_iam_role" "ssm_role"

References

We have been running a crude patch (#529) in production for a few weeks with good results.

Metadata


Labels: bug (Something isn't working)
