Put device in maintenance mode if no edge node certs #5328
@eriknordmark, shouldn't we restore the certs from the TPM instead of recreating them? Or is this PR only for devices without a TPM?
These three certs are not stored in the TPM NVRAM, and I don't know if we have space to add them across all possible TPMs. In any case, the supported way to reinstall EVE is to do a TPM clear. This fix makes it obvious when that wasn't done by flagging the issue.
@eriknordmark, please rebase on top of master in order to get the Yetus fixes.
Done.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #5328      +/-   ##
==========================================
+ Coverage   19.52%   20.39%   +0.86%
==========================================
  Files          19       19
  Lines        3021     2314     -707
==========================================
- Hits          590      472     -118
+ Misses       2310     1721     -589
  Partials      121      121
@rene, for some reason yetus/golangci-lint complains about files that this PR does not touch. Is that due to the new version of golangci-lint?
Yes, @eriknordmark. I'm still not sure why unchanged files were checked, but for now we can ignore these warnings.
rene left a comment:
LGTM
We got a job failure with "No space left on device"; I thought this was fixed...
The Go Tests action doesn't run on our runners yet; that's why we cannot fix the out-of-disk-space failures.
To get MaintenanceModeReason_MAINTENANCE_MODE_REASON_EDGE_NODE_CERTS_REFUSED
Signed-off-by: eriknordmark <[email protected]>
After a device has been reinstalled while reusing the device certificate, its attempts to publish EdgeNodeCerts will be rejected by the controller for security reasons, since those certificates have changed. (They are stored in /persist/certs, hence they are lost and recreated on a device reinstall.) This makes that unusual condition visible to the user by putting the device in maintenance mode.
Signed-off-by: eriknordmark <[email protected]>
For debug->recovertpm dependency
Signed-off-by: eriknordmark <[email protected]>
Description
After a device has been reinstalled while reusing the device certificate, its attempts to publish EdgeNodeCerts will be rejected by the controller for security reasons, since those certificates have changed. (They are stored in /persist/certs, hence they are lost and recreated on a device reinstall.) This makes that unusual condition visible to the user by putting the device in maintenance mode.
PR dependencies
lf-edge/eve-api#123
How to test and validate this PR
This can be tested using either the QEMU setup (make live run) or a physical device.
For a physical device, reasonable steps are:
Without the fix, the device will show as Online after it has rebooted (but some subtle functionality will not work).
With the fix in place, the device will appear in maintenance mode in the controller (but with an unknown reason, since the controller does not yet know about the new enum value).
Once testing has completed, you can ssh into EVE and restore the certificates using:
mv /persist/*.cert.pem /persist/certs/; sync; reboot
Changelog notes
If a device loses its /persist partition, it will automatically recreate that partition on the next boot. However, that results in recreating some certificates in /persist/certs, and the controller might not accept the new certificates. That typically results in the device being marked as SUSPECT in the controller, which is not very informative.
This fix ensures that this condition results in a visible maintenance mode instead of subtle failures to deploy applications with cloud-init user data.
PR Backports
N/A