Update the rollback mechanism #102

Open
say-paul opened this issue Jun 7, 2023 · 10 comments

Comments

@say-paul
Member

say-paul commented Jun 7, 2023

Currently, greenboot rollback depends on ostree-finalize-staged.service, which is triggered only on the first reboot after an update is deployed in ostree. A failure that occurs later therefore cannot trigger a rollback, which may hamper certain use cases. Updating the rollback mechanism would also integrate greenboot more closely with the ostree architecture.
It would also reduce the dependency on systemd service orchestration.

Example: if /usr/lib/greenboot/check.d/required.d/02_watchdog.sh fails at any point after the first reboot, no rollback is triggered, which can happen in an edge scenario.

@say-paul
Member Author

say-paul commented Jul 3, 2023

We can leverage the result of rpm-ostree status --json to get the time stamp of the deployments and ordering.
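A minimal sketch of how that might look, assuming the JSON output carries a top-level `deployments` array whose entries include `checksum`, `timestamp` (seconds since the epoch), `booted`, and `staged` fields. The sample payload and the helper name `deployments_by_age` are made up for illustration; on a real system the text would come from running `rpm-ostree status --json`.

```python
import json

# Trimmed, made-up payload in the assumed shape of `rpm-ostree status --json`.
SAMPLE_STATUS = """
{
  "deployments": [
    {"checksum": "638741c8", "timestamp": 1688353295, "booted": true,  "staged": false},
    {"checksum": "1449077d", "timestamp": 1688257620, "booted": false, "staged": false}
  ]
}
"""


def deployments_by_age(status_json: str) -> list:
    """Return the deployments sorted oldest first by their `timestamp` field."""
    status = json.loads(status_json)
    return sorted(status.get("deployments", []), key=lambda d: d["timestamp"])


ordered = deployments_by_age(SAMPLE_STATUS)
for dep in ordered:
    marker = "booted" if dep.get("booted") else "      "
    print(marker, dep["checksum"], dep["timestamp"])
```

Sorting on the timestamp gives an explicit ordering rather than relying on the position of each entry in the array.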

@say-paul
Member Author

say-paul commented Jul 3, 2023

There is some confusion, though, about how to determine when an update is actually deployed, as the JSON seems to contain only the timestamp of when the update was staged.

@jmarrero

jmarrero commented Jul 3, 2023

I think the deployment time shown in the status output is meant to be the time when the commit was added to the system; however, I understand that you might need the actual deployment (finalization) time. I did not find any rpm-ostree or ostree output that shows it, though I might be overlooking something obvious. However, you can take the time when the latest deployment was added to /ostree/deploy/fedora/deploy/ (replace fedora with your distro). For example:
ls -la /ostree/deploy/fedora/deploy/

[jmarrero@silverblue deploy]$ ls -la /ostree/deploy/fedora/deploy/
total 16
drwxr-xr-x. 1 root root 1112 Jul  3 11:09 .
drwxr-xr-x. 1 root root   18 Oct  8  2021 ..
drwxr-xr-x. 1 root root  158 Jul  1 20:27 1449077d3cf7a324a331e1a26665e0517d135024c332e48f07c715772fe3809e.0
-rw-r--r--. 1 root root  113 Jul  1 20:34 1449077d3cf7a324a331e1a26665e0517d135024c332e48f07c715772fe3809e.0.origin
drwxr-xr-x. 1 root root  158 May 11 17:59 62c79b40b17284f9897b00aae1f858a56990ccde51997d870765fd2b6a040fab.0
-rw-r--r--. 1 root root  148 May 11 21:45 62c79b40b17284f9897b00aae1f858a56990ccde51997d870765fd2b6a040fab.0.origin
drwxr-xr-x. 1 root root  158 Jul  2 23:01 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
-rw-r--r--. 1 root root  113 Jul  3 11:09 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0.origin
drwxr-xr-x. 1 root root  158 May 11 08:07 65c0a202abe2e80bd09814bd38c71a996fee1ace0ab14f86c0666f8c3de111a5.0
-rw-r--r--. 1 root root  148 May 11 17:15 65c0a202abe2e80bd09814bd38c71a996fee1ace0ab14f86c0666f8c3de111a5.0.origin

Then running stat on the newest deployment:

[jmarrero@silverblue deploy]$ stat 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
  File: 638741c80aac72e9febb76da08cd5dfa0ca27656827305331b2388c6440dba31.0
  Size: 158       	Blocks: 0          IO Block: 4096   directory
Device: 0,37	Inode: 68033679    Links: 1
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:root_t:s0
Access: 2023-07-03 11:51:57.117719748 -0400
Modify: 2023-07-02 23:01:38.541359148 -0400
Change: 2023-07-03 11:09:52.043005424 -0400
 Birth: 2023-07-02 23:01:35.357372660 -0400

You can see that Birth and Modify are nearly the same, but Change is when it was added to this directory. I think you could use the Change timestamp of the latest deployment you find in the /ostree directory.
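As a rough sketch of that idea, the snippet below picks the deployment directory with the latest inode change time, which is what `stat` reports as Change and what Python exposes as `st_ctime` on Linux. The helper name `newest_deployment_ctime` is hypothetical, and a temporary directory stands in for the real deploy directory so the example can run anywhere.

```python
import os
import tempfile
from pathlib import Path


def newest_deployment_ctime(deploy_root: str) -> tuple:
    """Return (name, ctime) of the directory under deploy_root with the latest ctime."""
    candidates = [
        (entry.name, entry.stat().st_ctime)
        for entry in os.scandir(deploy_root)
        # Skip the `.origin` files; only deployment directories are of interest.
        if entry.is_dir(follow_symlinks=False)
    ]
    if not candidates:
        raise FileNotFoundError(f"no deployment directories under {deploy_root}")
    return max(candidates, key=lambda item: item[1])


# Stand-in layout: one deployment directory plus its origin file.
with tempfile.TemporaryDirectory() as root:
    Path(root, "abc123.0").mkdir()
    Path(root, "abc123.0.origin").write_text("[origin]\n")
    name, ctime = newest_deployment_ctime(root)
    print(name, ctime)
```

The change time is updated whenever the inode's metadata changes, so this value is best treated as an approximation of when the deployment was added.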

However, maybe @cgwalters knows a better way or sees something I am overlooking.

@cgwalters
Contributor

I am not fully following, but if the goal is to know a timestamp for when a deployment was created, then it's closer to the birth time, right? In theory the deployment directory inode could be modified for other reasons, although in practice it usually isn't. Note, though, that I believe the birth time may not be available on all Linux filesystems, but it may be on the ones we care about.

greenboot perhaps could add xattrs on the deployment directory? Though doing so would require temporarily lifting the immutable bit, which is a bit racy unfortunately...

@cgwalters
Contributor

There is also the origin file which is arbitrary metadata associated with a deployment.

@say-paul
Member Author

say-paul commented Jul 4, 2023

@cgwalters The goal is actually to calculate the grace period after which the update is marked as successful; once it has elapsed, no rollback will be triggered even if the health check fails. The time needs to be calculated from the moment the system restarts after a commit is staged. Since there can be a gap between rpm-ostree upgrade and the reboot, I am looking for options to resolve this.
I was looking at systemd-update-done.service and ostree-finalize-staged.service, but that would just mean parsing through the journal, which, as you suggested, is not a great idea.
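The grace-period check described here could be sketched as follows. The function name `rollback_allowed`, the 24-hour grace period, and the example times are all hypothetical; how greenboot would actually obtain the first-boot time is the open question in this thread.

```python
from datetime import datetime, timedelta, timezone


def rollback_allowed(first_boot: datetime, now: datetime, grace: timedelta) -> bool:
    """True while `now` is still inside the grace period that began at first boot."""
    return first_boot <= now < first_boot + grace


# Made-up values: the update first booted at this time, with a 24 h grace period.
first_boot = datetime(2023, 7, 3, 15, 9, 52, tzinfo=timezone.utc)
grace = timedelta(hours=24)

# A health-check failure 2 h after first boot may still trigger a rollback.
within = rollback_allowed(first_boot, first_boot + timedelta(hours=2), grace)
# A failure 30 h after first boot is past the grace period.
expired = rollback_allowed(first_boot, first_boot + timedelta(hours=30), grace)
print(within, expired)
```

Measuring from the first boot rather than from the staging time keeps the gap between rpm-ostree upgrade and the reboot from eating into the grace period.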

@say-paul
Member Author

say-paul commented Jul 4, 2023

@jmarrero I looked into your suggested method: I ran ostree admin unlock --hotfix and saw that the timestamp got updated. I don't see any practical implication of doing that, but it echoes @cgwalters' statement:

deployment directory inode could be modified for other reasons

This might require some investigation into which cases can modify the timestamp, and into ways to make sure that won't hurt greenboot's functionality.

@jmarrero

jmarrero commented Jul 6, 2023

Does it need to be on first boot, can't it be when finalization finishes? If so maybe looking at the /boot/ostree entries?
But if greenboot can't add more xattrs maybe we can extend the origin file or deployment metadata to add another entry? Like first-boot-time:
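To illustrate the origin-file idea, here is a sketch that reads such an entry from a keyfile-style origin file. The `[greenboot]` section, the `first-boot-time` key, and its value are purely hypothetical; nothing in ostree or greenboot writes them today, and the helper name `read_first_boot_time` is made up as well.

```python
import configparser
from typing import Optional

# Made-up origin file content; only `[origin]` and `refspec` reflect the usual layout.
SAMPLE_ORIGIN = """\
[origin]
refspec=fedora:fedora/38/x86_64/silverblue

[greenboot]
first-boot-time=1688396992
"""


def read_first_boot_time(origin_text: str) -> Optional[int]:
    """Return the hypothetical first-boot-time entry, or None if it is absent."""
    # Restrict delimiters to "=" so colons inside values are never treated as separators.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(origin_text)
    if not parser.has_option("greenboot", "first-boot-time"):
        return None
    return parser.getint("greenboot", "first-boot-time")


print(read_first_boot_time(SAMPLE_ORIGIN))
```

Returning `None` when the entry is missing lets a caller distinguish a deployment that has never booted from one that has.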

@cgwalters
Contributor

I'd be fine with adding an xattr upstream in ostree for when a deployment is first booted. I think it'd be a pretty easy change because we have recently started running a systemd unit on boot.

@say-paul
Member Author

@jmarrero @cgwalters
POC PR: say-paul#1

Consolidated challenges, or information that would be useful:

  • The timestamp of when an update is deployed, so that the rollback grace period can start from there.
  • Which commit is older, so that we don't get into a race condition of rollbacks (applicable when a system is updated from an unhealthy state).

3 participants