fix: fail deployment when it's aborted #115

danielskinstad · 2024-11-07T09:41:27Z

I didn't know if we wanted to fail on everything except 204, so I added MENDER_ABORTED. If we can go to failure on everything except a 204, then we could just check for MENDER_FAIL.
I also added a check in the reboot state, in case the deployment is aborted right after the check in the install state.
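
The distinction described above could be sketched as a mapping from the HTTP status of the status-publish request to a result code. This is only an illustration: the enum and both function names are hypothetical, and treating 409 as the "aborted" response is an assumption about the server API that this PR does not state.

```c
#include <stdbool.h>

/* Hypothetical result codes, modelled on the names mentioned in this PR. */
typedef enum {
    RESULT_OK,      /* status accepted (HTTP 204) */
    RESULT_ABORTED, /* deployment was aborted on the server */
    RESULT_FAIL     /* any other response */
} result_t;

/* Assumption: 409 is what the server answers once a deployment is aborted. */
result_t
result_from_status_response(int http_status) {
    switch (http_status) {
        case 204:
            return RESULT_OK;
        case 409:
            return RESULT_ABORTED;
        default:
            return RESULT_FAIL;
    }
}

/* Alternative raised above: treat everything except 204 as a failure. */
bool
should_fail_deployment(int http_status) {
    return result_from_status_response(http_status) != RESULT_OK;
}
```

Keeping a separate "aborted" code lets the caller tell a server-side abort apart from a transport or server error, while the second helper shows the simpler "anything but 204 fails" variant.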

lluiscampos

See my comment below. Otherwise looks good.

> I didn't know if we wanted to fail on everything except 204, so I added MENDER_ABORTED.

I think this is a good idea.

core/src/mender-client.c

vpodzime · 2024-11-26T08:49:21Z

core/src/mender-client.c

```diff
@@ -1069,6 +1081,9 @@ mender_client_update_work_function(void) {
                 if (!mender_update_module->supports_rollback) {
                     mender_log_warning("Rollback not supported for artifacts of type '%s'", mender_update_module->artifact_type);
                     ret = MENDER_FAIL;
+                } else if (aborted_deployment) {
+                    /* Don't rollback if deployment is aborted */
```

Wouldn't you as a user expect a rollback if you abort a deployment that isn't committed yet?

danielskinstad · 2024-11-26

Right, but the code doesn't support an abort after a reboot: nothing publishes the status between reboot and commit, and we've said that commit is too late. My idea is that before we have rebooted, we should not roll back into the other partition but go straight to failure. Otherwise we end up in the wrong partition if, for example, the deployment is aborted after the install state has published its status and set the pending image.

vpodzime · 2024-11-26

I don't understand, sorry. The code is in a place where we are in the ROLLBACK state, i.e. we are supposed to roll back. And at that place we don't roll back if the deployment was aborted. To me that's wrong. We should either not get into ROLLBACK when there's nothing to roll back, or we should do a rollback. So, if I understand correctly what you describe, the proper behavior is to not end up in this state in such a case. IOW, either there's something to roll back and we should do a rollback if we get here and the deployment is aborted, or there's nothing to roll back and we should not end up here. Or am I still missing something? If so, please describe the sequence of states that results in a problematic case.

danielskinstad · 2024-11-26

Assume we have MENDER_UPDATE_STATE_DOWNLOAD -> MENDER_UPDATE_STATE_INSTALL -> MENDER_UPDATE_STATE_REBOOT.
In each of these states we publish the status and check whether the deployment is aborted. If it's aborted when we enter MENDER_UPDATE_STATE_REBOOT, then what do we want to do? We haven't rebooted yet as this is caught before the reboot callback is called. We can either go directly to failure, which deviates from the normal flow but makes sense since we haven't actually rebooted (this is what I originally did), or follow the normal state transitions and take the failure route, which transitions to rollback. In the second case we have nothing to roll back, which is why I checked for an aborted deployment and set ret = MENDER_FAIL in MENDER_UPDATE_STATE_ROLLBACK.

If we checked whether the status is aborted in MENDER_UPDATE_STATE_COMMIT (which Lluis said was too late, though I don't quite know why), then a proper rollback would make sense: we would have rebooted into the new partition and would need to roll back.
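
The two options for an abort caught on entering the reboot state can be compared in a small sketch. It is illustrative only: the shortened state names, `leave_reboot_state()`, `rollback_should_run()`, and the `skip_rollback_on_abort` flag are hypothetical, and the real transitions live in `mender_client_update_work_function`.

```c
#include <stdbool.h>

typedef enum { ST_REBOOT, ST_COMMIT, ST_ROLLBACK, ST_FAILURE } state_t;

/* Where to go from ST_REBOOT. The abort check runs before the reboot
 * callback, so on abort the device has not rebooted yet. */
state_t
leave_reboot_state(bool aborted, bool skip_rollback_on_abort) {
    if (!aborted) {
        /* Simplification: the reboot happened and the flow continues. */
        return ST_COMMIT;
    }
    /* Option 1: jump straight to failure.
     * Option 2: take the normal failure route through rollback. */
    return skip_rollback_on_abort ? ST_FAILURE : ST_ROLLBACK;
}

/* Option 2 as first proposed: ROLLBACK refuses to act on an abort. */
bool
rollback_should_run(bool supports_rollback, bool aborted) {
    return supports_rollback && !aborted;
}
```

With option 2, `leave_reboot_state(true, false)` lands in `ST_ROLLBACK`, and `rollback_should_run(true, true)` is what keeps the device from switching partitions there.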

vpodzime · 2024-11-27

> If it's aborted when we enter MENDER_UPDATE_STATE_REBOOT, then what do we want to do? We haven't rebooted yet as this is caught before the reboot callback is called

That brings us to my first reaction to this: how does it differ from the case where the reboot callback fails? Then we also haven't rebooted yet, and that goes to rollback because that's the failure path, right? I believe the ROLLBACK state is responsible for recognizing that it should do nothing, not by checking `if (aborted_deployment)` but by checking whether anything needs to be done to roll back. If not, it should be a no-op.

danielskinstad · 2024-11-27

Right, but as of now it will roll back if I don't check it explicitly, and changing how rollback works seems out of scope for this PR. I can remove the check for aborted in rollback, and then we should probably create a separate ticket for a no-op rollback?

vpodzime · 2024-11-27

Works for me! We still have this in the default update module:

/* no need for a rollback callback because a reboot without image confirmation is a rollback */

so I guess it should just work? Better to double-check, but my understanding is that ROLLBACK does nothing, then it goes to ROLLBACK_REBOOT, which reboots the device, then to VERIFY_ROLLBACK_REBOOT and finally to FAILURE. So if the deployment is aborted as you described above, the device will simply reboot and follow the failure path. No big harm done, except that the reboot could be skipped, which is what we can fix/improve.
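
The failure path described here, with ROLLBACK acting as a no-op when the update module provides no rollback callback, might look roughly like this. The struct, the callback signatures, `fake_reboot()`, and `run_failure_path()` are hypothetical and only mirror the state names used in this thread.

```c
#include <stddef.h>

typedef enum { ROLLBACK, ROLLBACK_REBOOT, VERIFY_ROLLBACK_REBOOT, FAILURE, DONE } state_t;

typedef struct {
    int (*rollback)(void); /* NULL: a reboot without confirmation is the rollback */
    int (*reboot)(void);
} update_module_t;

/* Stand-in reboot callback for illustration; it does not reboot anything. */
int
fake_reboot(void) {
    return 0;
}

/* Walks the failure path and returns how many callbacks were invoked. */
int
run_failure_path(const update_module_t *module) {
    int     calls = 0;
    state_t state = ROLLBACK;
    while (state != DONE) {
        switch (state) {
            case ROLLBACK:
                /* No callback means nothing to do here: a no-op. */
                if (module->rollback != NULL) {
                    module->rollback();
                    calls++;
                }
                state = ROLLBACK_REBOOT;
                break;
            case ROLLBACK_REBOOT:
                if (module->reboot != NULL) {
                    module->reboot();
                    calls++;
                }
                state = VERIFY_ROLLBACK_REBOOT;
                break;
            case VERIFY_ROLLBACK_REBOOT:
                state = FAILURE;
                break;
            case FAILURE:
            default:
                state = DONE;
                break;
        }
    }
    return calls;
}
```

A module with only a reboot callback goes through every state but triggers a single callback, the reboot, which is the step the thread suggests could later be skipped when there is nothing to roll back.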

vpodzime

Just one last nitpick. Thanks for your patience here! It will be interesting to see the integration tests for this 😉 😁

core/src/mender-client.c

mender-test-bot · 2024-11-29T10:01:40Z

Merging these commits will result in the following changelog entries:

Changelogs

mender-mcu (abort-deployment)

New changes in mender-mcu since main:

Bug Fixes

fail deployment when it's aborted (MEN-7693)

danielskinstad requested review from lluiscampos and vpodzime November 7, 2024 09:41

lluiscampos requested changes Nov 7, 2024

vpodzime reviewed Nov 7, 2024

danielskinstad force-pushed the abort-deployment branch from 20e2f05 to 65f9994 Compare November 7, 2024 14:07

danielskinstad requested review from lluiscampos and vpodzime November 7, 2024 14:10

lluiscampos approved these changes Nov 8, 2024

vpodzime requested changes Nov 11, 2024

danielskinstad force-pushed the abort-deployment branch from 65f9994 to 9703d03 Compare November 11, 2024 10:07

danielskinstad force-pushed the abort-deployment branch 2 times, most recently from bc3ce0d to d37607b Compare November 25, 2024 19:29

danielskinstad requested a review from vpodzime November 26, 2024 08:34

vpodzime reviewed Nov 26, 2024

fix: double free in mender_flash_abort_deployment

da6f3d8

A double free can occur in `mender_flash_abort_deployment`, so call `FREE_AND_NULL`

Changelog: None
Ticket: None
Signed-off-by: Daniel Skinstad Drabitzius <[email protected]>
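
For readers unfamiliar with the pattern, a free-and-null helper could be written as below. This is a sketch built on the standard `free()`; the project's actual `FREE_AND_NULL` definition is not shown in this PR and may differ, and `flash_handle_t` and `abort_deployment()` are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Free a pointer and reset it, so that a second call becomes free(NULL),
 * which the C standard defines as a no-op. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)

/* Hypothetical handle owning one heap buffer. */
typedef struct {
    char *buffer;
} flash_handle_t;

/* Safe to call more than once on the same handle. */
void
abort_deployment(flash_handle_t *handle) {
    FREE_AND_NULL(handle->buffer);
}
```

Resetting the pointer only protects this one copy of it; any other alias to the same allocation would still dangle.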

danielskinstad force-pushed the abort-deployment branch 2 times, most recently from dd862e0 to c2c628d Compare November 29, 2024 09:20

danielskinstad requested a review from vpodzime November 29, 2024 09:20

vpodzime approved these changes Nov 29, 2024

fix: fail deployment when it's aborted

69445a0

Changelog: Title
Ticket: MEN-7693
Signed-off-by: Daniel Skinstad Drabitzius <[email protected]>

danielskinstad force-pushed the abort-deployment branch from c2c628d to 69445a0 Compare November 29, 2024 10:01

danielskinstad merged commit fd65e00 into mendersoftware:main Nov 29, 2024
2 checks passed

danielskinstad deleted the abort-deployment branch November 29, 2024 10:05
