Add instance update and reactivation #38

WilboMo · 2021-04-13T04:33:26Z

Issue number:
Addresses #9

Description of changes:
Adds functionality to update drained Bottlerocket instances, reboot updated instances and change state to "ACTIVE" after successful upgrade.

Testing done:
Executed changes against ECS test cluster containing a mix of AL2 instances and 4 Bottlerocket version 1.0.5 hosts with service and non-service tasks present in the cluster:

Output (truncated for brevity):

---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: application/x-amz-json-1.1
Date: Thu, 15 Apr 2021 16:48:30 GMT
Server: Server
X-Amzn-Requestid: c3f71a22-c586-4c1f-b62b-d9724aefd926


-----------------------------------------------------
2021/04/15 09:48:30 {"CloudWatchOutputConfig":{"CloudWatchLogGroupName":"","CloudWatchOutputEnabled":false},"CommandId":"8865c81a-83ae-4c2e-958d-05dc107ef47e","Comment":"","DocumentName":"AWS-RunShellScript","DocumentVersion":"$DEFAULT","ExecutionElapsedTime":"PT0.484S","ExecutionEndDateTime":"2021-04-15T16:48:25.429Z","ExecutionStartDateTime":"2021-04-15T16:48:25.429Z","InstanceId":"i-##########","PluginName":"aws:runShellScript","ResponseCode":0,"StandardErrorContent":"16:48:25 \u001B[0m\u001B[34m[INFO] \u001B[0mRefreshing updates...\n","StandardErrorUrl":"","StandardOutputContent":"{\n  \"active_partition\": {\n    \"image\": {\n      \"arch\": \"x86_64\",\n      \"variant\": \"aws-ecs-1\",\n      \"version\": \"1.0.7\"\n    },\n    \"next_to_boot\": true\n  },\n  \"available_updates\": [\n    \"1.0.7\",\n    \"1.0.6\",\n    \"1.0.5\",\n    \"1.0.4\",\n    \"1.0.3\",\n    \"1.0.2\",\n    \"1.0.1\",\n    \"1.0.0\"\n  ],\n  \"chosen_update\": null,\n  \"most_recent_command\": {\n    \"cmd_status\": \"Success\",\n    \"cmd_type\": \"refresh\",\n    \"exit_status\": 0,\n    \"stderr\": \"\",\n    \"timestamp\": \"2021-04-15T16:48:25.835131261Z\"\n  },\n  \"staging_partition\": null,\n  \"update_state\": \"Idle\"\n}\n","StandardOutputUrl":"","Status":"Success","StatusDetails":"Success"}
2021/04/15 09:48:30 Instance i-########## updated to Bottlerocket 1.0.7

3 Bottlerocket instances running a Service and 1 with no tasks:

BR - 1
$ apiclient update check
04:27:15 [INFO] Refreshing updates...
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.7"
    },

BR - 2
$ apiclient update check
04:27:15 [INFO] Refreshing updates...
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.7"
    },

BR - 3

$ apiclient update check
04:27:59 [INFO] Refreshing updates...
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.7"
    },
 
BR - 4

$ apiclient update check
04:28:31 [INFO] Refreshing updates...
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.7"
    },

The previous, updated, instances were then replaced with 1.0.5 instances and a non-service tasks placed into the cluster, running on BR-1:

BR-1 
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.5"
    },

BR-2
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.8"
    },
    "next_to_boot": true
  },

 BR-3
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.8"
    },

BR-4
{
  "active_partition": {
    "image": {
      "arch": "x86_64",
      "variant": "aws-ecs-1",
      "version": "1.0.7"
    },

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

webern

Looks good. Some stream of consciousness while reading.

webern · 2021-04-15T22:05:42Z

updater/aws.go

 	}

 	err := changeInstanceState(u.ecs, aws.String(containerInstance), &u.cluster, "DRAINING")
 	if err != nil {
 		err2 := changeInstanceState(u.ecs, aws.String(containerInstance), &u.cluster, "ACTIVE")
 		if err2 != nil {
-			return fmt.Errorf("failed to undrain: %w", err2.Error())
+			return "", fmt.Errorf("failed to undrain: %w", err2)


The saddest outcome.

webern · 2021-04-15T22:06:41Z

updater/aws.go

@@ -62,20 +62,20 @@ func containsAttribute(attrs []*ecs.Attribute, searchString string) bool {

 // checkAndDrain drains eligible container instances. Container instances are eligible if all running
 // tasks were started by a service, or if there are no running tasks.
-func (u *updater) checkAndDrain(containerInstance string) error {
+func (u *updater) checkAndDrain(ec2ID string, containerInstance string) (string, error) {


Not sure why this function is taking an ec2ID and then returning it.

The EC2ID goes through some filtering to make sure its eligible for update. So the EC2ID It returns is one thats been vetted and cleared to proceed.

But, I don't see ec2ID being used anywhere in the method ? If the function does not return an error doesn't that mean that ec2ID is ready for update ?

webern · 2021-04-15T22:08:57Z

updater/aws.go

+	}
+	log.Printf("Response: %s", *resp.Status)
+	if *resp.Status == "Success" {
+		log.Printf("Updated %s succeessfully", instanceID)


Is this true? Or does resp.Status only mean that the SSM Command invocation succeeded? (Typo succeessfully)

Ah, my mistake. I have another idea that may do the trick.

webern · 2021-04-15T22:10:22Z

updater/aws.go

@@ -218,3 +218,53 @@ func (u *updater) checkCommandOutput(commandID string, instanceIDs []string) ([]
 	}
 	return updateCandidates, nil
 }
+
+func (u *updater) checkUpdateResult(commandID string, instanceID string) error {


Is checkSsmCommandResult a better name, or will this function eventually check more deeply that the update actually occurred?

webern · 2021-04-15T22:13:54Z

updater/aws.go

+	return nil
+}
+
+func (u *updater) getVersion(commandID string, instanceIDs []string) (map[string]string, error) {


Maybe getActiveVersions is a better name? Also maybe explain these functions with doc comments.

webern · 2021-04-15T22:16:04Z

updater/main.go

@@ -78,7 +78,7 @@ func _main() error {
 		return err
 	}

-	instancesToUpdate, err := up.checkCommandOutput(commandID, instances)
+	instancesToUpdate, err := up.checkUpdateStatus(commandID, instances)


The name is still a little weird. This might read better:

instancesToUpdate, err := up.listInstancesToUpdate(commandID, instances)

webern · 2021-04-15T22:17:10Z

updater/main.go

+		if checkedEC2ID == "" {
+			return nil
+		}


I think using this empty string to carry some meaning is a little strange. You could use the ok bool pattern instead.

webern · 2021-04-15T22:19:31Z

updater/main.go

+			}
+
+			updatedVersions, err := up.getVersion(updateStatus, checkedEC2IDAsSlice)
+			for _ec2ID, version := range updatedVersions {


_ec2ID: the underscore isn't needed since you are using the variable.

srgothi92 · 2021-04-15T22:29:36Z

updater/aws.go

@@ -62,20 +62,20 @@ func containsAttribute(attrs []*ecs.Attribute, searchString string) bool {

 // checkAndDrain drains eligible container instances. Container instances are eligible if all running
 // tasks were started by a service, or if there are no running tasks.
-func (u *updater) checkAndDrain(containerInstance string) error {
+func (u *updater) checkAndDrain(ec2ID string, containerInstance string) (string, error) {


But, I don't see ec2ID being used anywhere in the method ? If the function does not return an error doesn't that mean that ec2ID is ready for update ?

srgothi92 · 2021-04-15T22:50:03Z

updater/main.go

+			if err != nil {
+				return err
+			}
+
+			err = up.checkUpdateResult(updateCommandID, checkedEC2ID)
+			if err != nil {
+				return err
+			}
+
+			err = changeInstanceState(up.ecs, aws.String(containerARN), flagCluster, "ACTIVE")
+			if err != nil {
+				return err
+			}
+
+			updateStatus, err := up.sendCommand(checkedEC2IDAsSlice, "apiclient update check")
+			if err != nil {
+				return err
+			}


I am just worried about all these return on err. Do we really like to return if anything failed in between ?
I think we should just:
1 . make sure to mark Instance "Active".
2. continue updating next instance and worry about updating this instance in next program iteration.
Similar things for lot many errors, I think we need to scan all the errors and proceed for non fatal error.

Oh, you're right (regarding checkAndDrain's handling of the EC2ID)!

On your notes regarding returning err

Is handled by the functions/methods themselves.

I like the idea of scanning for fatals, I'll create a new issue to add that in. For now, what about capturing the error output and just printing it to log but not returning an error? This way the information about problems is still provided but it doesn't kill the operation?

I'll create a new issue to add that in.

I will do this, because I would not like to rush on how each error is handled. We would have to think of all the cases that can cause an error and then make a decision accordingly. For now happy path works, so we can get this merged and not block on this.

updater/aws.go

srgothi92 · 2021-04-15T23:19:31Z

updater/main.go

+			return nil
+		}
+
+		checkedEC2IDAsSlice := []string{checkedEC2ID}


Any reason why we are adding checkedEC2ID to a list ? Due this we had to run a loop inside getVersion. We are updating only one instance at a time why do we need a list ?

I did this because some functions like sendCommand expect a slice as an argument. This ensures those functions can use the EC2ID we're operating on throughout the process.

In that case we can just pass checkedEC2IDAsSlice only to sendCommand and we can directly use ec2ID for getVersion. So that we don't have to unnecessarily run a loop when we know there is only one item.

srgothi92 · 2021-04-16T13:56:26Z

updater/aws.go

+		if desiredState == "DRAINING" {
+			err := waitUntilDrained(ecsClient, *containerInstance, cluster)
+			if err != nil {
+				return err
+			}
 		}


formatting issue ?

I'll run format again, but I have it on save now and I did a go fmt before pushing 🤔 .

srgothi92 · 2021-04-16T15:12:32Z

updater/main.go

+			return nil
+		}
+
+		checkedEC2IDAsSlice := []string{checkedEC2ID}


In that case we can just pass checkedEC2IDAsSlice only to sendCommand and we can directly use ec2ID for getVersion. So that we don't have to unnecessarily run a loop when we know there is only one item.

srgothi92

Mostly looks good.

updater/main.go

webern · 2021-04-16T21:06:15Z

updater/aws.go

@@ -236,3 +237,49 @@ func (u *updater) checkSSMCommandOutput(commandID string, instanceIDs []string)
 	}
 	return updateCandidates, nil
 }
+
+// getActiveVersion queries a Bottlerocket instnace to determine the active version in use by the instance.


Suggested change

// getActiveVersion queries a Bottlerocket instnace to determine the active version in use by the instance.

// getActiveVersion queries a Bottlerocket instance to determine the active version in use by the instance.

Thanks for catching that!

samuelkarp · 2021-04-25T21:17:43Z

updater/main.go

+		ec2IDAsSlice := []string{ec2ID}
+		if len(ec2IDAsSlice) != 0 {


This logic here is a bit weird. Since line 103 constructs a slice with exactly one element, line 104 will always evaluate to true.

Aside from removing the if, it's also not necessary to say "AsSlice" to record the type of the variable in the name.

Suggested change

ec2IDAsSlice := []string{ec2ID}

if len(ec2IDAsSlice) != 0 {

ec2IDs := []string{ec2ID}

Thanks. I had considered doing away with the if len(ec2IDAsSlice) for the reasons you pointed out, but was concerned I may not have been thinking of a situation where it would evaluate false and left it for safety. I'll make the suggested change!

samuelkarp · 2021-04-25T21:57:29Z

updater/main.go

+			if err != nil {
+				log.Printf("%#v", err)
+				continue
+			}


There's a pattern here of continue-on-error, which I think is probably not the right pattern. If there are errors in updating, we risk taking out the entire cluster and leaving no remaining capacity.

What can we do in terms of recovery if an error occurs?

If no recovery is possible, should we abort entirely?

If no recovery is possible, how do we notify the cluster administrator that some action needs to be taken, and what that action is?

What can we do in terms of recovery if an error occurs?

I'm not yet sure on what recovery actions we can take for some of these failures. The apiclient update apply -r could be re-attempted on the next pass and it may work just fine. But things like reboot failures or an instance entering a non Ok state may be beyond the scope of what Updater should act on. We could add in logic to attempt restarting it again, describe the instance or take some other action but that would require an escalation in IAM permissions to EC2 that I don't think, off the cuff, is a good idea.

If no recovery is possible, should we abort entirely?

In my first iteration of this bit I had it returning err rather than logging it so that it would abort the entire operation. However, it was pointed out by another commenter that we wanted it to proceed with the other instances to be updated as one failure doesn't necessarily disqualify the rest and that we could just let Updater try again on the next pass.

However, if they were all to fail taking down an entire cluster would be horrible as you rightly point out. I'll think on this but I think it may be the safest option, until better error handling is added later on, that we abort on a failure to preserve the integrity of the cluster as a whole.

If no recovery is possible, how do we notify the cluster administrator that some action needs to be taken, and what that action is?

I had debated adding in additional log statements here in tandem with the functions already defined error message reporting to provide more context on the problem. However, I ended up removing that before posting the PR because I was concerned the messages may be too verbose.

Would it better to remove the log messages from their functions and allow them to be defined in main.go when the functions are called and go through the if err != nil checks? This would allow the error message to be tailor made for the operation and give more clear indication to users where things failed and why.

I'm not yet sure on what recovery actions we can take for some of these failures.

Depending on the type of failure, it may be reasonable to put the container instance back to ACTIVE and continue on, or to continue on directly. For example:

If our call to drain the container instance fails due to something like a permission issue and the instance stays ACTIVE: it's safe to continue forward since no capacity was removed

If our call to check update status fails: it's safe to undrain the instance and continue forward, unless the undrain fails

However, if they were all to fail taking down an entire cluster would be horrible as you rightly point out. I'll think on this but I think it may be the safest option, until better error handling is added later on, that we abort on a failure to preserve the integrity of the cluster as a whole.

I think it's worth thinking about it this way: any failure which results in a reduction in cluster capacity that cannot be successfully restored (undrained) should probably result in us aborting with a clear error message to the administrator.

Would it better to remove the log messages from their functions and allow them to be defined in main.go when the functions are called and go through the if err != nil checks? This would allow the error message to be tailor made for the operation and give more clear indication to users where things failed and why.

It's definitely worth revisiting the logging that we have before the updater is really ready to be used, but it doesn't have to be in the context of this PR.

In general: it's good to embed context into errors. Newer versions of Go (since 1.13 if I'm remembering correctly) have a native error-wrapping feature that allows you to add context. So you can do something like this:

if err != nil { return fmt.Errorf("failed to undrain: %w", err) }

and then have that error bubble all the way up to the top.

However, it can also be useful to add local logs with additional local context. Structured logging implementations (like logrus) enable contextual fields to be added and automatically included:

myLog := logrus.WithField("cluster", cluster) // ... omitted ... if err != nil { myLog.WithError(err).Error("failed to undrain") return fmt.Errorf("failed to undrain: %w", err) }

samuelkarp · 2021-04-25T21:59:15Z

updater/main.go

+			if len(updateResult) == 0 {
+				log.Printf("Update successfully applied")
+			}


What if len(updateResult) != 0? What should happen then?

samuelkarp · 2021-04-25T22:00:32Z

updater/main.go

+			updateResult, _ := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice)
+			if err != nil {


It looks like the error is unintentionally ignored here.

Suggested change

updateResult, _ := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice)

if err != nil {

updateResult, err := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice)

if err != nil {

samuelkarp · 2021-04-25T22:01:16Z

updater/main.go

+			if len(updatedVersion) != 0 {
+				log.Printf("Instance %s updated to Bottlerocket: %s", ec2ID, updatedVersion)
+			}


What should happen if len(updatedVersion) == 0?

samuelkarp · 2021-04-25T22:10:04Z

updater/aws.go

+func (u *updater) getActiveVersion(commandID string, instanceID string) (string, error) {
+	resp, err := u.ssm.GetCommandInvocation(&ssm.GetCommandInvocationInput{
+		CommandId:  aws.String(commandID),
+		InstanceId: aws.String(instanceID),
+	})


It looks like GetCommandInvocation is called twice for the same command ID; once here and once in checkSSMCommandResult. In checkSSMCommandResult, there is further logic applied to map the result into a set of candidates. That looks like a bit of an anti-pattern, since we can instead make a single remote call and have two separate ways of interpreting the results.

There are a couple ways to restructure the code to achieve that, for example:

Instead of having two calls to GetCommandInvocation, have one single place where that is called and have the raw bytes returned such that they can be unmarshalled in two separate places with different interpretation

Instead of having two functions that unmarshal the result differently, have a single function that calls GetCommandInvocation, unmarshalls the results into a struct, and then returns that struct. Then you can interpret the struct appropriately in based on your needs.

This would be a bit of refactoring, but is appropriate to do.

samuelkarp · 2021-04-29T17:17:42Z

updater/main.go

+		err = u.activateInstance(aws.String(containerInstance))
+		if err != nil {
+			log.Printf("instance %s failed to return to ACTIVE after reboot. Aborting update operations.", ec2ID)
+			return fmt.Errorf("%#v", err)


nit: there's no additional error message so there's no need for fmt.Errorf

Suggested change

return fmt.Errorf("%#v", err)

return err

samuelkarp · 2021-04-29T17:20:19Z

updater/main.go

+		updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check")
+		if err != nil {
+			log.Printf("%#v", err)
+		}
+
+		updateResult, err := u.getCommandResult(updateStatus, ec2ID)
+		if err != nil {
+			log.Printf("%#v", err)
+		}


If sending the command "apiclient update check" fails, getCommandResult will probably also fail. Since we've already returned the instance to "ACTIVE", we can continue here.

Suggested change

updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check")

if err != nil {

log.Printf("%#v", err)

}

updateResult, err := u.getCommandResult(updateStatus, ec2ID)

if err != nil {

log.Printf("%#v", err)

}

updateStatus, err := u.sendCommand(ec2IDs, "apiclient update check")

if err != nil {

log.Printf("%#v", err)

continue

}

updateResult, err := u.getCommandResult(updateStatus, ec2ID)

if err != nil {

log.Printf("%#v", err)

continue

}

samuelkarp · 2021-04-29T17:35:58Z

updater/aws.go

-		}
+// getActiveVersion unmarshals GetCommandInvocation output to determine the active version of a Bottlerocket instance.
+// Takes an SSM Command ID and EC2ID as parameters and returns the active version in use.
+func getActiveVersion(commandOutput []byte, instanceID string) (string, error) {


It doesn't look like the instanceID parameter is used

Suggested change

func getActiveVersion(commandOutput []byte, instanceID string) (string, error) {

func getActiveVersion(commandOutput []byte) (string, error) {

Ah, good catch. Thanks!

samuelkarp · 2021-04-29T17:38:55Z

updater/aws.go

-		type updateCheckResult struct {
-			UpdateState string `json:"update_state"`
-		}
+func getUpdateState(commandOutput []byte) bool {


nit: getUpdateState implies that the function would return some sort of state indicator, but instead it's just returning whether an update is available

Suggested change

func getUpdateState(commandOutput []byte) bool {

func isUpdateAvailable(commandOutput []byte) bool {

samuelkarp · 2021-04-29T17:40:00Z

updater/main.go

@@ -96,6 +104,56 @@ func _main() error {
 			continue
 		}
 		log.Printf("Instance %s drained", ec2ID)
+
+		ec2IDs := []string{ec2ID}
+		_, err = u.sendCommand(ec2IDs, "apiclient update apply -r")


nit: short flags are present for convenience when typing into a terminal, but when scripting it's more clear to use long flags

Suggested change

_, err = u.sendCommand(ec2IDs, "apiclient update apply -r")

_, err = u.sendCommand(ec2IDs, "apiclient update apply --reboot")

updater/main.go

samuelkarp · 2021-04-29T21:57:26Z

updater/main.go

+		if isUpdateAvailable(updateResult) {
+			log.Printf("Instance %s did not update. Manual update advised.", ec2ID)
+		} else {
+			log.Printf("Instance %s updated successfully", ec2ID)
+		}
+
+		updatedVersion, err := getActiveVersion(updateResult)
+		if err != nil {
+			log.Printf("%#v", err)
+		}
+		if len(updatedVersion) != 0 {
+			log.Printf("Instance %s updated to Bottlerocket: %s", ec2ID, updatedVersion)
+		} else {
+			log.Printf("Unable to verify updated version. Manual verification of %s required.", ec2ID)
+		}


This logic enables the following scenario to occur:

Instance updates

Update is still available (perhaps because another update's wave is now visible to the instance)

Log "Instance %s did not update. Manual update advised."

Read the active version

Log "Instance %s updated to Bottlerocket: %s"

Or this scenario:

Instance fails to update

Update is still available

Log "Instance %s did not update. Manual update advised."

Read the active version

Log "Instance %s updated to Bottlerocket: %s"

In the first case, even though the instance did update the log claims it didn't. In the second case, even though the instance didn't update, the last log line claims it did.

These should probably not happen. Either the instance updated or it didn't; we shouldn't be able to claim both at the same time. There are a couple ways this could be changed:

Remove the check for isUpdateAvailable, since the question is really "did the version change". Instead, store the original version and compare to the new version.

Adjust the logic such that if a "manual update advised" we skip the second check about the version.

Adjust the language in the second change so it doesn't say "updated to" and instead says something more like "Instance %s is running Bottlerocket %s".

Thanks for the comment! I think PR #47 will enable option 1 nicely (which I think is the right option long term) as we could cast the version to the struct introduced by @srgothi92 in that pull request and use that to compare with during the update process. I could implement the version check in a different way, but unless I'm mistaken, it would make sense to move it into that struct when its merged anyways.

I considered waiting until that PR was merged and revising my PR off that, but in the interest of moving forward I'm going to implement the second and third options you lay out, add a TODO and open an issue to put the version comparison in a subsequent PR.

samuelkarp · 2021-04-29T21:59:58Z

updater/aws.go

+	if err != nil {
+		log.Printf("failed to unmarshal command invocation output: %#v", err)
 	}


It looks like the error is ignored here.

Suggested change

if err != nil {

log.Printf("failed to unmarshal command invocation output: %#v", err)

}

if err != nil {

log.Printf("failed to unmarshal command invocation output: %#v", err)

return "", err

}

samuelkarp · 2021-04-29T22:00:47Z

updater/aws.go

+	switch updateState.UpdateState {
+	case "Available":
+		return true
+	}
+	return false


nit: This can be simplified

Suggested change

switch updateState.UpdateState {

case "Available":

return true

}

return false

return updateState.UpdateState == "Available"

Thats awesome, thanks for pointing that out. Didn't know that'd return bool.

WilboMo · 2021-04-30T01:50:35Z

updater/main.go

+		updateState, empty := isUpdateAvailable(updateResult)
+		if updateState {
+			log.Printf("Instance %s did not update. Manual update advised.", ec2ID)
+			continue
+		} else if empty != "" {
+			log.Printf("Unable to determine update result. Manual verification of %s required", ec2ID)
+			continue
+		} else {
+			log.Printf("Instance %s updated successfully", ec2ID)
+		}


While testing my changes to address @samuelkarp's comments I encountered a scenario where an empty Json would trigger a success result. I added this in to prevent that scenario from giving false positives.

samuelkarp · 2021-04-30T03:26:25Z

updater/aws.go

-		type updateCheckResult struct {
-			UpdateState string `json:"update_state"`
-		}
+func isUpdateAvailable(commandOutput []byte) (bool, string) {


A string result here is very strange. An error is a common mechanism for communicating failure.

Alternately, success and failure can be communicated with a bool (conventionally, an ok variable).

Suggested change

func isUpdateAvailable(commandOutput []byte) (bool, string) {

func isUpdateAvailable(commandOutput []byte) (bool, error) {

I originally used error but was concerned using it in an elif statement as I had intended would be anti-pattern. I reckon I overthought that a bit as I don't think I even need to use it in an elif for that to work. Thanks for the comment!

samuelkarp · 2021-04-30T03:33:30Z

updater/aws.go

-			updateCandidates = append(updateCandidates, v)
-		}
+// getActiveVersion unmarshals GetCommandInvocation output to determine the active version of a Bottlerocket instance.
+// Takes an SSM Command ID and EC2ID as parameters and returns the active version in use.


nit: this comment is out of date

Adds functionality to perform Bottlerocket version upgrades and reactivates drained instances after successful upgrade. [Issue: #9]

samuelkarp

LGTM, thanks!

WilboMo force-pushed the DrainInstances branch 2 times, most recently from e44ab4b to 430af79 Compare April 13, 2021 17:08

WilboMo force-pushed the UpgradeBRInstances branch 3 times, most recently from 7880e70 to 75bf459 Compare April 14, 2021 17:02

WilboMo force-pushed the DrainInstances branch from 430af79 to 97ef618 Compare April 14, 2021 21:56

WilboMo force-pushed the UpgradeBRInstances branch from 75bf459 to 8ffad26 Compare April 14, 2021 23:16

WilboMo force-pushed the DrainInstances branch 2 times, most recently from eb94b5e to 1f7371c Compare April 15, 2021 16:20

WilboMo force-pushed the UpgradeBRInstances branch from 8ffad26 to 43421cf Compare April 15, 2021 17:11

WilboMo force-pushed the DrainInstances branch 2 times, most recently from af45757 to e779565 Compare April 15, 2021 18:47

WilboMo force-pushed the UpgradeBRInstances branch from 43421cf to a9b2a7d Compare April 15, 2021 19:31

WilboMo marked this pull request as ready for review April 15, 2021 19:31

WilboMo requested review from samuelkarp and srgothi92 April 15, 2021 19:32

webern approved these changes Apr 15, 2021

View reviewed changes

WilboMo force-pushed the DrainInstances branch from e779565 to f321aa9 Compare April 15, 2021 22:34

srgothi92 reviewed Apr 15, 2021

View reviewed changes

This was referenced Apr 15, 2021

Implement managed package refactor #39

Open

Add better error handling #40

Closed

WilboMo force-pushed the UpgradeBRInstances branch from a9b2a7d to c7ada67 Compare April 16, 2021 00:34

srgothi92 reviewed Apr 16, 2021

View reviewed changes

updater/main.go Outdated Show resolved Hide resolved

WilboMo force-pushed the DrainInstances branch 3 times, most recently from c13d549 to b2d845c Compare April 16, 2021 20:34

WilboMo force-pushed the UpgradeBRInstances branch 2 times, most recently from b5e7a25 to 563c69f Compare April 16, 2021 21:05

webern reviewed Apr 16, 2021

View reviewed changes

WilboMo force-pushed the UpgradeBRInstances branch 2 times, most recently from 0fd5035 to c8ae8be Compare April 16, 2021 21:31

WilboMo force-pushed the DrainInstances branch from b2d845c to db7b592 Compare April 19, 2021 19:00

WilboMo force-pushed the UpgradeBRInstances branch from c8ae8be to 63004c6 Compare April 19, 2021 21:24

srgothi92 approved these changes Apr 19, 2021

View reviewed changes

WilboMo force-pushed the DrainInstances branch from db7b592 to e393a8f Compare April 20, 2021 19:30

WilboMo force-pushed the UpgradeBRInstances branch from 63004c6 to fa949bb Compare April 20, 2021 19:46

WilboMo force-pushed the DrainInstances branch 2 times, most recently from de74fef to 5737773 Compare April 20, 2021 21:08

WilboMo force-pushed the UpgradeBRInstances branch from fa949bb to f6e3ec6 Compare April 20, 2021 21:17

WilboMo force-pushed the DrainInstances branch from 5737773 to c4d0474 Compare April 20, 2021 21:17

WilboMo force-pushed the UpgradeBRInstances branch from f6e3ec6 to b5cbd69 Compare April 20, 2021 21:18

Base automatically changed from DrainInstances to develop April 23, 2021 16:40

WilboMo force-pushed the UpgradeBRInstances branch from b5cbd69 to 6ee06c6 Compare April 23, 2021 21:10

samuelkarp reviewed Apr 25, 2021

View reviewed changes

WilboMo force-pushed the UpgradeBRInstances branch 3 times, most recently from 02929ef to bd290af Compare April 28, 2021 19:32

samuelkarp reviewed Apr 29, 2021

View reviewed changes

WilboMo force-pushed the UpgradeBRInstances branch from bd290af to 258ac82 Compare April 29, 2021 19:25

samuelkarp reviewed Apr 29, 2021

View reviewed changes

WilboMo force-pushed the UpgradeBRInstances branch 2 times, most recently from a0a5cd9 to e755b66 Compare April 30, 2021 01:48

WilboMo commented Apr 30, 2021

View reviewed changes

samuelkarp reviewed Apr 30, 2021

View reviewed changes

Add instance update and reactivation

930fb22

Adds functionality to perform Bottlerocket version upgrades and reactivates drained instances after successful upgrade. [Issue: #9]

WilboMo force-pushed the UpgradeBRInstances branch from e755b66 to 930fb22 Compare April 30, 2021 17:05

samuelkarp approved these changes May 3, 2021

View reviewed changes

WilboMo merged commit b462c32 into develop May 3, 2021

WilboMo deleted the UpgradeBRInstances branch May 3, 2021 17:19

	// getActiveVersion queries a Bottlerocket instnace to determine the active version in use by the instance.
	// getActiveVersion queries a Bottlerocket instance to determine the active version in use by the instance.

	ec2IDAsSlice := []string{ec2ID}
	if len(ec2IDAsSlice) != 0 {
	ec2IDs := []string{ec2ID}

		updateResult, _ := u.checkSSMCommandResult(updateStatus, ec2IDAsSlice)
		if err != nil {

	func getActiveVersion(commandOutput []byte, instanceID string) (string, error) {
	func getActiveVersion(commandOutput []byte) (string, error) {

	func getUpdateState(commandOutput []byte) bool {
	func isUpdateAvailable(commandOutput []byte) bool {

	_, err = u.sendCommand(ec2IDs, "apiclient update apply -r")
	_, err = u.sendCommand(ec2IDs, "apiclient update apply --reboot")

Add instance update and reactivation #38

Add instance update and reactivation #38

Conversation

WilboMo commented Apr 13, 2021 • edited Loading

webern left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

srgothi92 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WilboMo Apr 26, 2021 • edited Loading

Choose a reason for hiding this comment

samuelkarp Apr 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WilboMo Apr 30, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

samuelkarp left a comment

Choose a reason for hiding this comment

WilboMo commented Apr 13, 2021 •

edited

Loading

WilboMo Apr 26, 2021 •

edited

Loading

samuelkarp Apr 27, 2021 •

edited

Loading

WilboMo Apr 30, 2021 •

edited

Loading