-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instance update and reactivation #38
Conversation
e44ab4b
to
430af79
Compare
7880e70
to
75bf459
Compare
75bf459
to
8ffad26
Compare
eb94b5e
to
1f7371c
Compare
8ffad26
to
43421cf
Compare
af45757
to
e779565
Compare
43421cf
to
a9b2a7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some stream of consciousness while reading.
updater/aws.go
Outdated
} | ||
|
||
err := changeInstanceState(u.ecs, aws.String(containerInstance), &u.cluster, "DRAINING") | ||
if err != nil { | ||
err2 := changeInstanceState(u.ecs, aws.String(containerInstance), &u.cluster, "ACTIVE") | ||
if err2 != nil { | ||
return fmt.Errorf("failed to undrain: %w", err2.Error()) | ||
return "", fmt.Errorf("failed to undrain: %w", err2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The saddest outcome.
updater/aws.go
Outdated
@@ -62,20 +62,20 @@ func containsAttribute(attrs []*ecs.Attribute, searchString string) bool { | |||
|
|||
// checkAndDrain drains eligible container instances. Container instances are eligible if all running | |||
// tasks were started by a service, or if there are no running tasks. | |||
func (u *updater) checkAndDrain(containerInstance string) error { | |||
func (u *updater) checkAndDrain(ec2ID string, containerInstance string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this function is taking an ec2ID and then returning it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EC2ID goes through some filtering to make sure its eligible for update. So the EC2ID It returns is one thats been vetted and cleared to proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I don't see ec2ID
being used anywhere in the method ? If the function does not return an error doesn't that mean that ec2ID is ready for update ?
updater/aws.go
Outdated
} | ||
log.Printf("Response: %s", *resp.Status) | ||
if *resp.Status == "Success" { | ||
log.Printf("Updated %s succeessfully", instanceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? Or does resp.Status
only mean that the SSM Command invocation succeeded? (Typo succeessfully)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my mistake. I have another idea that may do the trick.
updater/aws.go
Outdated
@@ -218,3 +218,53 @@ func (u *updater) checkCommandOutput(commandID string, instanceIDs []string) ([] | |||
} | |||
return updateCandidates, nil | |||
} | |||
|
|||
func (u *updater) checkUpdateResult(commandID string, instanceID string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is checkSsmCommandResult
a better name, or will this function eventually check more deeply that the update actually occurred?
updater/aws.go
Outdated
return nil | ||
} | ||
|
||
func (u *updater) getVersion(commandID string, instanceIDs []string) (map[string]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe getActiveVersions
is a better name? Also maybe explain these functions with doc comments.
updater/main.go
Outdated
@@ -78,7 +78,7 @@ func _main() error { | |||
return err | |||
} | |||
|
|||
instancesToUpdate, err := up.checkCommandOutput(commandID, instances) | |||
instancesToUpdate, err := up.checkUpdateStatus(commandID, instances) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is still a little weird. This might read better:
instancesToUpdate, err := up.listInstancesToUpdate(commandID, instances)
updater/main.go
Outdated
if checkedEC2ID == "" { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using this empty string to carry some meaning is a little strange. You could use the ok bool
pattern instead.
updater/main.go
Outdated
} | ||
|
||
updatedVersions, err := up.getVersion(updateStatus, checkedEC2IDAsSlice) | ||
for _ec2ID, version := range updatedVersions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ec2ID
: the underscore isn't needed since you are using the variable.
updater/aws.go
Outdated
@@ -62,20 +62,20 @@ func containsAttribute(attrs []*ecs.Attribute, searchString string) bool { | |||
|
|||
// checkAndDrain drains eligible container instances. Container instances are eligible if all running | |||
// tasks were started by a service, or if there are no running tasks. | |||
func (u *updater) checkAndDrain(containerInstance string) error { | |||
func (u *updater) checkAndDrain(ec2ID string, containerInstance string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I don't see ec2ID
being used anywhere in the method ? If the function does not return an error doesn't that mean that ec2ID is ready for update ?
updater/main.go
Outdated
if err != nil { | ||
return err | ||
} | ||
|
||
err = up.checkUpdateResult(updateCommandID, checkedEC2ID) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
err = changeInstanceState(up.ecs, aws.String(containerARN), flagCluster, "ACTIVE") | ||
if err != nil { | ||
return err | ||
} | ||
|
||
updateStatus, err := up.sendCommand(checkedEC2IDAsSlice, "apiclient update check") | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just worried about all these return on err. Do we really like to return if anything failed in between ?
I think we should just:
1 . make sure to mark Instance "Active".
2. continue updating next instance and worry about updating this instance in next program iteration.
Similar things for lot many errors, I think we need to scan all the errors and proceed for non fatal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right (regarding checkAndDrain
's handling of the EC2ID)!
On your notes regarding returning err
- Is handled by the functions/methods themselves.
- I like the idea of scanning for fatals, I'll create a new issue to add that in. For now, what about capturing the error output and just printing it to log but not returning an error? This way the information about problems is still provided but it doesn't kill the operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a new issue to add that in.
I will do this, because I would not like to rush on how each error is handled. We would have to think of all the cases that can cause an error and then make a decision accordingly. For now happy path works, so we can get this merged and not block on this.
updater/main.go
Outdated
return nil | ||
} | ||
|
||
checkedEC2IDAsSlice := []string{checkedEC2ID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we are adding checkedEC2ID
to a list ? Due this we had to run a loop inside getVersion
. We are updating only one instance at a time why do we need a list ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because some functions like sendCommand
expect a slice as an argument. This ensures those functions can use the EC2ID we're operating on throughout the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case we can just pass checkedEC2IDAsSlice
only to sendCommand
and we can directly use ec2ID
for getVersion
. So that we don't have to unnecessarily run a loop when we know there is only one item.
a9b2a7d
to
c7ada67
Compare
updater/aws.go
Outdated
if desiredState == "DRAINING" { | ||
err := waitUntilDrained(ecsClient, *containerInstance, cluster) | ||
if err != nil { | ||
return err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting issue ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll run format again, but I have it on save now and I did a go fmt
before pushing 🤔 .
updater/main.go
Outdated
return nil | ||
} | ||
|
||
checkedEC2IDAsSlice := []string{checkedEC2ID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case we can just pass checkedEC2IDAsSlice
only to sendCommand
and we can directly use ec2ID
for getVersion
. So that we don't have to unnecessarily run a loop when we know there is only one item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good.
c13d549
to
b2d845c
Compare
b5e7a25
to
563c69f
Compare
updater/aws.go
Outdated
@@ -236,3 +237,49 @@ func (u *updater) checkSSMCommandOutput(commandID string, instanceIDs []string) | |||
} | |||
return updateCandidates, nil | |||
} | |||
|
|||
// getActiveVersion queries a Bottlerocket instnace to determine the active version in use by the instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// getActiveVersion queries a Bottlerocket instnace to determine the active version in use by the instance. | |
// getActiveVersion queries a Bottlerocket instance to determine the active version in use by the instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that!
0fd5035
to
c8ae8be
Compare
c8ae8be
to
63004c6
Compare
63004c6
to
fa949bb
Compare
de74fef
to
5737773
Compare
fa949bb
to
f6e3ec6
Compare
f6e3ec6
to
b5cbd69
Compare
b5cbd69
to
6ee06c6
Compare
updater/main.go
Outdated
ec2IDAsSlice := []string{ec2ID} | ||
if len(ec2IDAsSlice) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic here is a bit weird. Since line 103 constructs a slice with exactly one element, line 104 will always evaluate to true
.
Aside from removing the if
, it's also not necessary to say "AsSlice" to record the type of the variable in the name.
ec2IDAsSlice := []string{ec2ID} | |
if len(ec2IDAsSlice) != 0 { | |
ec2IDs := []string{ec2ID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I had considered doing away with the if len(ec2IDAsSlice)
for the reasons you pointed out, but was concerned I may not have been thinking of a situation where it would evaluate false and left it for safety. I'll make the suggested change!
updater/main.go
Outdated
if err != nil { | ||
log.Printf("%#v", err) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a pattern here of continue-on-error, which I think is probably not the right pattern. If there are errors in updating, we risk taking out the entire cluster and leaving no remaining capacity.
- What can we do in terms of recovery if an error occurs?
- If no recovery is possible, should we abort entirely?
- If no recovery is possible, how do we notify the cluster administrator that some action needs to be taken, and what that action is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- What can we do in terms of recovery if an error occurs?
I'm not yet sure on what recovery actions we can take for some of these failures. The apiclient update apply -r
could be re-attempted on the next pass and it may work just fine. But things like reboot failures or an instance entering a non Ok state may be beyond the scope of what Updater should act on. We could add in logic to attempt restarting it again, describe the instance or take some other action but that would require an escalation in IAM permissions to EC2 that I don't think, off the cuff, is a good idea.
- If no recovery is possible, should we abort entirely?
In my first iteration of this bit I had it returning err
rather than logging it so that it would abort the entire operation. However, it was pointed out by another commenter that we wanted it to proceed with the other instances to be updated as one failure doesn't necessarily disqualify the rest and that we could just let Updater try again on the next pass.
However, if they were all to fail taking down an entire cluster would be horrible as you rightly point out. I'll think on this but I think it may be the safest option, until better error handling is added later on, that we abort on a failure to preserve the integrity of the cluster as a whole.
- If no recovery is possible, how do we notify the cluster administrator that some action needs to be taken, and what that action is?
I had debated adding in additional log statements here in tandem with the functions already defined error message reporting to provide more context on the problem. However, I ended up removing that before posting the PR because I was concerned the messages may be too verbose.
Would it better to remove the log messages from their functions and allow them to be defined in main.go
when the functions are called and go through the if err != nil
checks? This would allow the error message to be tailor made for the operation and give more clear indication to users where things failed and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not yet sure on what recovery actions we can take for some of these failures.
Depending on the type of failure, it may be reasonable to put the container instance back to ACTIVE and continue on, or to continue on directly. For example:
- If our call to drain the container instance fails due to something like a permission issue and the instance stays ACTIVE: it's safe to continue forward since no capacity was removed
- If our call to check update status fails: it's safe to undrain the instance and continue forward, unless the undrain fails
However, if they were all to fail taking down an entire cluster would be horrible as you rightly point out. I'll think on this but I think it may be the safest option, until better error handling is added later on, that we abort on a failure to preserve the integrity of the cluster as a whole.
I think it's worth thinking about it this way: any failure which results in a reduction in cluster capacity that cannot be successfully restored (undrained) should probably result in us aborting with a clear error message to the administrator.
Would it better to remove the log messages from their functions and allow them to be defined in main.go when the functions are called and go through the
if err != nil
checks? This would allow the error message to be tailor made for the operation and give more clear indication to users where things failed and why.
It's definitely worth revisiting the logging that we have before the updater is really ready to be used, but it doesn't have to be in the context of this PR.
In general: it's good to embed context into errors. Newer versions of Go (since 1.13 if I'm remembering correctly) have a native error-wrapping feature that allows you to add context. So you can do something like this:
if err != nil {
return fmt.Errorf("failed to undrain: %w", err)
}
and then have that error bubble all the way up to the top.
However, it can also be useful to add local logs with additional local context. Structured logging implementations (like logrus) enable contextual fields to be added and automatically included:
myLog := logrus.WithField("cluster", cluster)
// ... omitted ...
if err != nil {
myLog.WithError(err).Error("failed to undrain")
return fmt.Errorf("failed to undrain: %w", err)
}
updater/main.go
Outdated
if len(updateResult) == 0 { | ||
log.Printf("Update successfully applied") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if len(updateResult) != 0
? What should happen then?
updater/main.go
Outdated
updateResult, _ := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the error is unintentionally ignored here.
updateResult, _ := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice) | |
if err != nil { | |
updateResult, err := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice) | |
if err != nil { |
updater/main.go
Outdated
if len(updatedVersion) != 0 { | ||
log.Printf("Instance %s updated to Bottlerocket: %s", ec2ID, updatedVersion) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should happen if len(updatedVersion) == 0
?
updater/aws.go
Outdated
func (u *updater) getActiveVersion(commandID string, instanceID string) (string, error) { | ||
resp, err := u.ssm.GetCommandInvocation(&ssm.GetCommandInvocationInput{ | ||
CommandId: aws.String(commandID), | ||
InstanceId: aws.String(instanceID), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like GetCommandInvocation
is called twice for the same command ID; once here and once in checkSSMCommandResult
. In checkSSMCommandResult
, there is further logic applied to map the result into a set of candidates. That looks like a bit of an anti-pattern, since we can instead make a single remote call and have two separate ways of interpreting the results.
There are a couple ways to restructure the code to achieve that, for example:
- Instead of having two calls to
GetCommandInvocation
, have one single place where that is called and have the raw bytes returned such that they can be unmarshalled in two separate places with different interpretation - Instead of having two functions that unmarshal the result differently, have a single function that calls
GetCommandInvocation
, unmarshalls the results into a struct, and then returns that struct. Then you can interpret the struct appropriately in based on your needs.
This would be a bit of refactoring, but is appropriate to do.
02929ef
to
bd290af
Compare
updater/main.go
Outdated
err = u.activateInstance(aws.String(containerInstance)) | ||
if err != nil { | ||
log.Printf("instance %s failed to return to ACTIVE after reboot. Aborting update operations.", ec2ID) | ||
return fmt.Errorf("%#v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there's no additional error message so there's no need for fmt.Errorf
return fmt.Errorf("%#v", err) | |
return err |
updater/main.go
Outdated
updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check") | ||
if err != nil { | ||
log.Printf("%#v", err) | ||
} | ||
|
||
updateResult, err := u.getCommandResult(updateStatus, ec2ID) | ||
if err != nil { | ||
log.Printf("%#v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If sending the command "apiclient update check" fails, getCommandResult
will probably also fail. Since we've already returned the instance to "ACTIVE", we can continue
here.
updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check") | |
if err != nil { | |
log.Printf("%#v", err) | |
} | |
updateResult, err := u.getCommandResult(updateStatus, ec2ID) | |
if err != nil { | |
log.Printf("%#v", err) | |
} | |
updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check") | |
if err != nil { | |
log.Printf("%#v", err) | |
continue | |
} | |
updateResult, err := u.getCommandResult(updateStatus, ec2ID) | |
if err != nil { | |
log.Printf("%#v", err) | |
continue | |
} |
updater/aws.go
Outdated
} | ||
// getActiveVersion unmarshals GetCommandInvocation output to determine the active version of a Bottlerocket instance. | ||
// Takes an SSM Command ID and EC2ID as parameters and returns the active version in use. | ||
func getActiveVersion(commandOutput []byte, instanceID string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like the instanceID
parameter is used
func getActiveVersion(commandOutput []byte, instanceID string) (string, error) { | |
func getActiveVersion(commandOutput []byte) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. Thanks!
updater/aws.go
Outdated
type updateCheckResult struct { | ||
UpdateState string `json:"update_state"` | ||
} | ||
func getUpdateState(commandOutput []byte) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: getUpdateState
implies that the function would return some sort of state indicator, but instead it's just returning whether an update is available
func getUpdateState(commandOutput []byte) bool { | |
func isUpdateAvailable(commandOutput []byte) bool { |
updater/main.go
Outdated
@@ -96,6 +104,56 @@ func _main() error { | |||
continue | |||
} | |||
log.Printf("Instance %s drained", ec2ID) | |||
|
|||
ec2IDs := []string{ec2ID} | |||
_, err = u.sendCommand(ec2IDs, "apiclient update apply -r") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: short flags are present for convenience when typing into a terminal, but when scripting it's more clear to use long flags
_, err = u.sendCommand(ec2IDs, "apiclient update apply -r") | |
_, err = u.sendCommand(ec2IDs, "apiclient update apply --reboot") |
bd290af
to
258ac82
Compare
updater/main.go
Outdated
if isUpdateAvailable(updateResult) { | ||
log.Printf("Instance %s did not update. Manual update advised.", ec2ID) | ||
} else { | ||
log.Printf("Instance %s updated successfully", ec2ID) | ||
} | ||
|
||
updatedVersion, err := getActiveVersion(updateResult) | ||
if err != nil { | ||
log.Printf("%#v", err) | ||
} | ||
if len(updatedVersion) != 0 { | ||
log.Printf("Instance %s updated to Bottlerocket: %s", ec2ID, updatedVersion) | ||
} else { | ||
log.Printf("Unable to verify updated version. Manual verification of %s required.", ec2ID) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic enables the following scenario to occur:
- Instance updates
- Update is still available (perhaps because another update's wave is now visible to the instance)
- Log "Instance %s did not update. Manual update advised."
- Read the active version
- Log "Instance %s updated to Bottlerocket: %s"
Or this scenario:
- Instance fails to update
- Update is still available
- Log "Instance %s did not update. Manual update advised."
- Read the active version
- Log "Instance %s updated to Bottlerocket: %s"
In the first case, even though the instance did update the log claims it didn't. In the second case, even though the instance didn't update, the last log line claims it did.
These should probably not happen. Either the instance updated or it didn't; we shouldn't be able to claim both at the same time. There are a couple ways this could be changed:
- Remove the check for
isUpdateAvailable
, since the question is really "did the version change". Instead, store the original version and compare to the new version. - Adjust the logic such that if a "manual update advised" we skip the second check about the version.
- Adjust the language in the second change so it doesn't say "updated to" and instead says something more like "Instance %s is running Bottlerocket %s".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment! I think PR #47 will enable option 1 nicely (which I think is the right option long term) as we could cast the version to the struct introduced by @srgothi92 in that pull request and use that to compare with during the update process. I could implement the version check in a different way, but unless I'm mistaken, it would make sense to move it into that struct when its merged anyways.
I considered waiting until that PR was merged and revising my PR off that, but in the interest of moving forward I'm going to implement the second and third options you lay out, add a TODO and open an issue to put the version comparison in a subsequent PR.
updater/aws.go
Outdated
if err != nil { | ||
log.Printf("failed to unmarshal command invocation output: %#v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the error is ignored here.
if err != nil { | |
log.Printf("failed to unmarshal command invocation output: %#v", err) | |
} | |
if err != nil { | |
log.Printf("failed to unmarshal command invocation output: %#v", err) | |
return "", err | |
} |
updater/aws.go
Outdated
switch updateState.UpdateState { | ||
case "Available": | ||
return true | ||
} | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be simplified
switch updateState.UpdateState { | |
case "Available": | |
return true | |
} | |
return false | |
return updateState.UpdateState == "Available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats awesome, thanks for pointing that out. Didn't know that'd return bool
.
a0a5cd9
to
e755b66
Compare
updater/main.go
Outdated
updateState, empty := isUpdateAvailable(updateResult) | ||
if updateState { | ||
log.Printf("Instance %s did not update. Manual update advised.", ec2ID) | ||
continue | ||
} else if empty != "" { | ||
log.Printf("Unable to determine update result. Manual verification of %s required", ec2ID) | ||
continue | ||
} else { | ||
log.Printf("Instance %s updated successfully", ec2ID) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing my changes to address @samuelkarp's comments I encountered a scenario where an empty Json would trigger a success result. I added this in to prevent that scenario from giving false positives.
updater/aws.go
Outdated
type updateCheckResult struct { | ||
UpdateState string `json:"update_state"` | ||
} | ||
func isUpdateAvailable(commandOutput []byte) (bool, string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A string result here is very strange. An error
is a common mechanism for communicating failure.
Alternately, success and failure can be communicated with a bool
(conventionally, an ok
variable).
func isUpdateAvailable(commandOutput []byte) (bool, string) { | |
func isUpdateAvailable(commandOutput []byte) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally used error
but was concerned using it in an elif
statement as I had intended would be anti-pattern. I reckon I overthought that a bit as I don't think I even need to use it in an elif for that to work. Thanks for the comment!
updater/aws.go
Outdated
updateCandidates = append(updateCandidates, v) | ||
} | ||
// getActiveVersion unmarshals GetCommandInvocation output to determine the active version of a Bottlerocket instance. | ||
// Takes an SSM Command ID and EC2ID as parameters and returns the active version in use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this comment is out of date
Adds functionality to perform Bottlerocket version upgrades and reactivates drained instances after successful upgrade. [Issue: #9]
e755b66
to
930fb22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Issue number:
Addresses #9
Description of changes:
Adds functionality to update drained Bottlerocket instances, reboot updated instances and change state to "ACTIVE" after successful upgrade.
Testing done:
Executed changes against ECS test cluster containing a mix of AL2 instances and 4 Bottlerocket version 1.0.5 hosts with service and non-service tasks present in the cluster:
Output (truncated for brevity):
3 Bottlerocket instances running a Service and 1 with no tasks:
The previous, updated, instances were then replaced with 1.0.5 instances and a non-service tasks placed into the cluster, running on BR-1:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.