Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed updater failing to run using CFN stack as a cron job #51

Merged
merged 3 commits into from
May 18, 2021

Conversation

srgothi92
Copy link
Contributor

@srgothi92 srgothi92 commented May 12, 2021

Issue number:
partially #42

Description of changes:
To run updater using stack 3 changes were made:

1. Added minimal required permission to stack to run updater

2. Refactored logs

* Previously, complete HTTP Body of aws-sdk API calls were logged, this change disables
the aws-sdk logging.
* Existing logs were hard to interpret and were formatted incorrectly, this
change fixes those logs which were found incorrect during stack permission update.

3. Fixed a bug where instances are stuck in Drain state

Previously, failure in waiting for instance to become Ok after update would return an error and updater would exit.
However, instance remained in drained state even if it would have become OK after some time.
This change, changes instance state back to active irrespective of success or failure of wait.

Testing done:

  1. Ran updater using CFN template: bottlerocket-ecs-updater.yaml and verified all the instances getting updated.
  2. Ran updater locally and verified all instances getting updated.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@srgothi92 srgothi92 changed the base branch from data-structure-instance to develop May 12, 2021 23:11
@srgothi92 srgothi92 force-pushed the cron-updater branch 2 times, most recently from b985344 to c18de65 Compare May 12, 2021 23:51
updater/aws.go Outdated
}
if len(resp.Failures) != 0 {
return fmt.Errorf("Container instance %s failed to activate: %#v", aws.StringValue(containerInstance), resp.Failures)
return fmt.Errorf("api failures while activating: %#v", resp.Failures)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it looks like in your previous changes to error messages you're dropping "#" from the formatting. Looks like this one got missed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out.
I will just add reason why I removed "#": Adding "#" was making error split in multiple lines in Cloud watch and it was getting hard to read. By removing "#" we only lose field name which anyway did not add much value in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the awslogs driver will split log events on newlines. We can change this to split based on a regular expression with the awslogs-multiline-pattern log option or a strftime pattern with the awslogs-datetime-format log option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would you recommend "%v" or "%#v" with awlogs-multiline-pattern ? I have changed error wrapping to use "%w", but while printing l am still using "%v".

@srgothi92
Copy link
Contributor Author

srgothi92 force-pushed the cron-updater branch from 34ffa8f to 6d25292 14 seconds ago

Addressed @WilboMo comments.

- Effect: Allow
Action:
- 'ecs:ListContainerInstances'
Resource:
- !Sub 'arn:${AWS::Partition}:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${ClusterName}'
# Allows describe container instances to get ec2 instance ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this also allows us to get attributes, which is how we determine that a given container instance is running Bottlerocket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more details.

Comment on lines 60 to 65
Resource:
- !Sub "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:container-instance/*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the resource that DescribeContainerInstances operates on is the container instance and since ECS does not enable cross-account actions, the only effect of this resource condition is constraining the permission to the stack region. It would be a better approach to leverage the ecs.cluster condition key and specify the cluster ARN so that the only instances that can be described are in the expected cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Changed as suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend removing this Resource since it provides no additional value over the Condition.

stacks/bottlerocket-ecs-updater.yaml Show resolved Hide resolved
updater/aws.go Outdated Show resolved Hide resolved
updater/aws.go Outdated Show resolved Hide resolved
updater/aws.go Outdated Show resolved Hide resolved
updater/aws.go Outdated Show resolved Hide resolved
updater/aws.go Outdated Show resolved Hide resolved
@srgothi92 srgothi92 force-pushed the cron-updater branch 3 times, most recently from 06f240d to f920185 Compare May 17, 2021 16:42
Copy link
Contributor Author

@srgothi92 srgothi92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed Sam's comments.

Comment on lines 60 to 65
Resource:
- !Sub "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:container-instance/*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend removing this Resource since it provides no additional value over the Condition.

Comment on lines 73 to 74
Resource:
- !Sub "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task/${ClusterName}/*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like this Resource doesn't provide any additional constraint over the Condition, and that the Condition is the same one as used for the previous statement. I'd recommend combining this permission into that previous statement and removing the Resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Previously, complete HTTP Body of aws sdk api calls were logged, this change disables
the aws sdk logging.
* Exiting logs were hard to interpret and were formatted incorrectly, this
change fixes those logs which were found incorrect during stack permission update.
Previously, when waiting on instance to be ok fails after starting update, error
was returned and updater was exited. However, instance remained in
drained state even if it would have become Ok after some time. This
change, changes instance state back to active irrespective of success
or failure of wait.
@srgothi92
Copy link
Contributor Author

srgothi92 force-pushed the cron-updater branch from f920185 to e3e5213 32 seconds ago

Addressed Sam's review comments and re-tested by deploying new stack.

@@ -47,11 +47,38 @@ Resources:
PolicyDocument:
Version: 2012-10-17
Statement:
# Allows listing all container instances in a cluster
- Effect: Allow
Action:
- 'ecs:ListContainerInstances'
Resource:
Copy link
Contributor Author

@srgothi92 srgothi92 May 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized we can combine this as well with below action. Will try removing and test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on doc here we cannot use cluster condition key with ListContainerInstances. Therefore existing changes looks good.

@srgothi92 srgothi92 merged commit c0c9a98 into develop May 18, 2021
@srgothi92 srgothi92 deleted the cron-updater branch May 19, 2021 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants