Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition results in EIP association failing #24

Closed
brodygov opened this issue Nov 27, 2017 · 4 comments · Fixed by 18F/identity-cookbooks#57
Closed

Race condition results in EIP association failing #24

brodygov opened this issue Nov 27, 2017 · 4 comments · Fixed by 18F/identity-cookbooks#57

Comments

@brodygov
Copy link

The EC2 API for associating EIPs appears to be eventually consistent. So if two servers attempt to grab the same EIP at the same time, one of them will succeed and the other will fail but think it succeeded.

Example logs from this happening, with the IP and instance IDs redacted.

2017-11-27 19:50:02,225 - aws-ec2-assign-eip - INFO - Connected to AWS EC2 in us-west-2
2017-11-27 19:50:02,668 - aws-ec2-assign-eip - INFO - Successfully associated Elastic IP x.x.x.105 with i-a77fbf
2017-11-27 19:50:02,064 - aws-ec2-assign-eip - INFO - Connected to AWS EC2 in us-west-2
2017-11-27 19:50:02,909 - aws-ec2-assign-eip - INFO - Successfully associated Elastic IP x.x.x.105 with i-8d7ed1

In reality, i-8d7ed1 won the race and has the EIP associated, while i-a77fbf thinks it got the EIP but has no EIP associated.

One way to fix this would be to loop doing the describe-addresses call until the association with the current instance ID appears. Would you be interested in a PR that does this?

brodygov added a commit to brodygov/aws-ec2-assign-elastic-ip that referenced this issue Nov 28, 2017
Add functionality to check that an EIP was actually associated with the
desired instance. This is important because the EC2 API is eventually
consistent and receiving an HTTP 200 response from the association API
call is not a guarantee that the association will actually happen.

This verification is enabled by default but can be disabled with
--skip-verify.

By default, it will retry up to 10 times, sleeping 0.1 seconds after
each attempt, in order to give the EC2 API plenty of time to achieve
a result.

This allows aws-ec2-assign-elastic-ip to exit nonzero when there is a
race condition that causes EIP association to fail.

Fixes: sebdah#24
@brodygov
Copy link
Author

brodygov commented Jul 5, 2018

Following up here with discussion from the PR:

This issue is actually more like a bug/wart in boto2, which doesn't set the AllowReassociation parameter and doesn't provide any way to explicitly set that parameter to false. Without it, AWS defaults to true, allowing an EIP to be reassociated even if it is currently associated with another instance.

ab/boto@239013b is a start at fixing this for boto 2.

@kylemacfarlane
Copy link

I think the billing issue I said could theoretically exist in the PR thread does already exist.

Imagine two instances fire up due to a scaling event and both have a cron to check they have an Elastic IP every minute. Once booted they will both race for the same Elastic IP as shown in @brodygov's log.

With an EC2-Classic account (pre-2014) only one instance will manage to associate and the other will error. This will incur a $0.10 charge. In the second minute the second instance will associate and incur another $0.10 charge. Total charges are $0.20.

With an EC2-VPC-only account (2014+) both instances could successfully associate in the first minute and incur $0.20 of charges. In the second minute the instance that was stolen from will associate again and incur another $0.10 charge. Total charges are $0.30.

Not a big difference but it escalates quickly. If the bad luck was repeated with 10 instances it would take 10 minutes to properly associate every instance and the difference by then is $1.00 vs $5.50.

I would definitely upgrade to boto 3 and explicitly set AllowReassociation to False for everyone.

If requested features that would require reassociation (e.g. #13 or #19) are ever implemented then they should be done so carefully.

@wilerson
Copy link

wilerson commented Jul 11, 2018

As this issue affected my use case, I went ahead and made a PR - If #13 or #19 are ever implemented, the implementers could add an argument for AllowReassociation.

@sebdah
Copy link
Owner

sebdah commented Jul 17, 2018

The above PR is merged and released with 0.8.0. Please reopen this issue if needed.

Thanks @wilerson for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants