auto-provisioning: improve behavior with power cuts #217

jsrc27 · 2025-01-30T01:36:08Z

Some users when auto-provisioning, power down the system once they see the auto-provisioning service exit successfully. However, in some cases this can cause the files created from the provisioning process to not be fully written to disk, resulting in 0-byte files upon next boot.

Therefore, add an explicit sync after the files are written to improve the behavior of the auto-provisioning in these cases.

Related-to: TOR-3743

EdTheBearded · 2025-01-30T12:31:28Z

Hey @jsrc27!
It seems that this sync would help on this scenario where the power cut happens because the operator is waiting for a prompt to power off the device.
But thinking broadly on improving this for power cuts, couldn't we add some hash verification on those files on check_provisioning_status? So that we know that the files are there and are correct? Maybe even add a 'flag' file after we confirm this (on the next boot), just so we're not comparing hashes everytime?

Something like:

for i in $(unzip -Z1 credentials.zip); do
    orig=$(unzip -p credentials.zip $i | md5sum | cut -d' ' -f1)
    [ -f "$i" ] && file=$(md5sum $i | cut -d' ' -f1) || file=""
    [ "$orig" != "$file" ] && return
done

rborn-tx · 2025-01-30T15:42:45Z

Related to @EdTheBearded's comment on making the whole operation generally more robust, another approach we could consider is relying on the atomicity of the rename operation. Basically, we could simply create a new version of the /var/sota/import directory, say a /var/sota/import.tmp, do a sync and then (forcefully) rename /var/sota/import.tmp -> /var/sota/import maybe followed by another sync.

In practice this would probably mean just rewriting write_credentials() to something like this:

write_credentials() {
    log "Updating device credentials"

    rm -Rf ${SOTA_CRED_DIR}.tmp && mkdir -p ${SOTA_CRED_DIR}.tmp
    if ! unzip device.zip -d ${SOTA_CRED_DIR}.tmp >/dev/null; then
        exit_error "Failed extracting credentials file"
    fi
    sync
    mv -f ${SOTA_CRED_DIR}.tmp ${SOTA_CRED_DIR}

    rm -rf $SOTA_BASE_DIR/sql.db
    rm -rf $CONFIG_FILE
    sync
}

An even more robust method would be to do the same with the whole /var/sota directory (I mean create a new version of the directory and do the rename as the last step), but I don't think we need to go that far.

jsrc27 · 2025-01-30T17:55:22Z

@EdTheBearded Not sure if I fully understood your feedback. I have some questions for clarification.

First of all did you mean device.zip instead of credentials.zip? We don't use any credentials.zip anywhere in this script.

Also did you mean to suggest adding this logic to write_credentials instead of check_provisioning_status? Because in check_provisioning_status the device.zip hasn't been downloaded yet from the server. Or did I misunderstand what you meant?

EdTheBearded · 2025-01-30T17:59:37Z

@EdTheBearded Not sure if I fully understood your feedback. I have some questions for clarification.

First of all did you mean device.zip instead of credentials.zip? We don't use any credentials.zip anywhere in this script.

Also did you mean to suggest adding this logic to write_credentials instead of check_provisioning_status? Because in check_provisioning_status the device.zip hasn't been downloaded yet from the server. Or did I misunderstand what you meant?

Yeah, I meant device.zip. I wrote with credentials.zip because that was a zip file I had in hand to test the output.

No, I meant on check_provisioning_status, since this is what decides whether the script will run or not. In the case that we have the zip file and the hashes match, we exit early, if not we just keep running and (re)do the provisioning.

But as @rborn-tx suggested, maybe doing all this in a .tmp folder and as a last step renaming it is a bit cleaner.

jsrc27 · 2025-01-30T18:24:13Z

No, I meant on check_provisioning_status, since this is what decides whether the script will run or not. In the case that we have the zip file and the hashes match, we exit early, if not we just keep running and (re)do the provisioning.

I don't think that would ever realistically be the case no? The device.zip gets downloaded during register_device into a temp directory in /tmp. If the device rebooted then /tmp would have been cleared and the zip file would be lost.

For device.zip to exist during check_provisioning_status, the service would need to have been manually started/restarted while device.zip is still in /tmp somehow. Which probably means someone is just messing with the process while it's occurring, which is difficult to account for.

So we'd be checking for the case where device.zip somehow exists and auto-provisioning.json still exists and was not removed from a prior successful run of auto-provisioning. Which would be weird cause how do we have existing certs and auto-provisioning.json still exists, but we have cert files installed on the device that do not match what we have in device.zip. I just have a hard time imagining the scenario this would be checking for.

It would make sense if we want to do the hash verification after we unzip just for extra robustness I guess.

EdTheBearded · 2025-01-30T18:52:19Z

No, I meant on check_provisioning_status, since this is what decides whether the script will run or not. In the case that we have the zip file and the hashes match, we exit early, if not we just keep running and (re)do the provisioning.

I don't think that would ever realistically be the case no? The device.zip gets downloaded during register_device into a temp directory in /tmp. If the device rebooted then /tmp would have been cleared and the zip file would be lost.

For device.zip to exist during check_provisioning_status, the service would need to have been manually started/restarted while device.zip is still in /tmp somehow. Which probably means someone is just messing with the process while it's occurring, which is difficult to account for.

So we'd be checking for the case where device.zip somehow exists and auto-provisioning.json still exists and was not removed from a prior successful run of auto-provisioning. Which would be weird cause how do we have existing certs and auto-provisioning.json still exists, but we have cert files installed on the device that do not match what we have in device.zip. I just have a hard time imagining the scenario this would be checking for.

It would make sense if we want to do the hash verification after we unzip just for extra robustness I guess.

Or we download device.zip somewhere else. I did not provide a full solution on my comment, just an idea and an example code of hash verification.
I'm certain a bunch of other details would need to be sorted as well.

But either way, @rborn-tx suggestion makes more sense.

jsrc27 · 2025-01-31T01:29:16Z

V2 submitted

rborn-tx

Raising some other points.

recipes-sota/auto-provisioning/auto-provisioning/auto-provisioning.sh

jsrc27 · 2025-01-31T23:47:55Z

V3 submitted

rborn-tx · 2025-02-03T21:07:50Z

@jsrc27 Would you like to update the commit message before merging this?

Some users when auto-provisioning, power down the system once they see the auto-provisioning service exit successfully. However, in some cases this can cause the files created from the provisioning process to not be fully written to disk, resulting in 0-byte files upon next boot. Therefore, add some explicit syncs to improve this behavior. Also make use of a temp directory to improve the overall atomicity of the process. Related-to: TOR-3743 Signed-off-by: Jeremias Cordoba <[email protected]>

jsrc27 · 2025-02-04T20:34:05Z

Commit message updated

jsrc27 requested review from EdTheBearded and rborn-tx January 30, 2025 01:36

jsrc27 force-pushed the TOR-3743 branch 2 times, most recently from b7e59ab to 46ceffb Compare January 31, 2025 01:26

jsrc27 force-pushed the TOR-3743 branch from 46ceffb to 2beed82 Compare January 31, 2025 01:31

EdTheBearded previously approved these changes Jan 31, 2025

View reviewed changes

rborn-tx reviewed Jan 31, 2025

View reviewed changes

recipes-sota/auto-provisioning/auto-provisioning/auto-provisioning.sh Show resolved Hide resolved

recipes-sota/auto-provisioning/auto-provisioning/auto-provisioning.sh Outdated Show resolved Hide resolved

jsrc27 dismissed EdTheBearded’s stale review via 7ee5aad January 31, 2025 23:45

jsrc27 force-pushed the TOR-3743 branch from 2beed82 to 7ee5aad Compare January 31, 2025 23:45

rborn-tx approved these changes Feb 1, 2025

View reviewed changes

jsrc27 force-pushed the TOR-3743 branch from 7ee5aad to 1f2c87f Compare February 4, 2025 20:33

rborn-tx merged commit a88fec4 into torizon:scarthgap-7.x.y Feb 5, 2025

This was referenced Feb 5, 2025

auto-provisioning: improve behavior with power cuts (kirkstone) #225

Open

auto-provisioning: improve behavior with power cuts (master) #226

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

auto-provisioning: improve behavior with power cuts #217

auto-provisioning: improve behavior with power cuts #217

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

rborn-tx commented Jan 30, 2025

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

jsrc27 commented Jan 31, 2025

rborn-tx left a comment

jsrc27 commented Jan 31, 2025

rborn-tx commented Feb 3, 2025

jsrc27 commented Feb 4, 2025

auto-provisioning: improve behavior with power cuts #217

auto-provisioning: improve behavior with power cuts #217

Conversation

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

rborn-tx commented Jan 30, 2025

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

jsrc27 commented Jan 30, 2025

EdTheBearded commented Jan 30, 2025

jsrc27 commented Jan 31, 2025

rborn-tx left a comment

Choose a reason for hiding this comment

jsrc27 commented Jan 31, 2025

rborn-tx commented Feb 3, 2025

jsrc27 commented Feb 4, 2025