- 
                Notifications
    You must be signed in to change notification settings 
- Fork 144
Composefs deleting deployment #1685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
de10dec    to
    883110d      
    Compare
  
            
          
                crates/lib/src/cli.rs
              
                Outdated
          
        
      | Opt::ConfigDiff => get_etc_diff().await, | ||
|  | ||
| #[cfg(feature = "composefs-backend")] | ||
| Opt::DeleteDeployment { depl_id, delete } => { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this is also an RFE here #1276
| pub bootloader: Bootloader, | ||
| /// The sha256sum of vmlinuz + initrd | ||
| /// Only `Some` for Type1 boot entries | ||
| pub boot_digest: Option<String>, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of String let's use say https://docs.rs/oci-spec/latest/oci_spec/image/struct.Sha256Digest.html for stuff like this?
Though that might be a bit harder with the jsonschema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the jsonschema it's a bit tricky
| let Some(depl_to_del) = depl_to_del else { | ||
| anyhow::bail!("Deployment {deployment_id} not found"); | ||
| }; | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a missing check here that refuses to delete the booted one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a delete param passed in to this function for debugging purposes. Might be helpful to just see what all will be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the param, as it doesn't really make much sense
| grub_dir | ||
| .atomic_write(USER_CFG_STAGED, buffer) | ||
| .with_context(|| format!("Writing to {USER_CFG_STAGED}"))?; | ||
|  | ||
| rustix::fs::fsync(grub_dir.reopen_as_ownedfd().context("Reopening")?).context("fsync")?; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make an atomic_write_fsync helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also btw there's https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.atomic_replace_with that supports writing-as-you-go instead of building up the in memory buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use atomic_replace_with
We could make an atomic_write_fsync helper
looking at it again, we don't have any one show write + fsync. It's mostly a couple of writes then a final fsync
| .remove_dir_all(&state_dir) | ||
| .with_context(|| format!("Removing dir {state_dir:?}"))?; | ||
|  | ||
| for sha in diff { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be an api in composefs-rs right?
Also bigger picture...I think we want to very clearly separate two things (as libostree does): managing the "GC roots" vs a GC operation.
Deleting a deployment starts with unlinking its GC root: the bootloader entry.
But thereafter we should just invoke a generic "GC" operation which traverses all active roots.
The idea here is we must support being interrupted - and we want to be idempotent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There are a few failure cases that need to be handled
This should really be an api in composefs-rs right?
The deletion of unreferenced objects? It does make sense for it to be in composefs-rs. There's even a command for it, but right now it's basically a no-op.
Add a command to delete a composefs native deployment Deleting a deployment would mean, deleting the EROFS image, the bootloader entries for that deployment and deleting any objects in the composefs repository that are only referenced by said deployment. Also refactor some functions and add error contexts in some places Signed-off-by: Pragyan Poudyal <[email protected]> composefs-backend: Deleting staged deployment Signed-off-by: Pragyan Poudyal <[email protected]>
Delete the boot entries first, the image second and everything else afterwards. If we fail to delete the boot entry, then there's no point in deleting the image as the boot entry will still show, but there will be no image. We delete the objects at the end, as when we later perform a gc operation and don't find the image that references these objects, we can remove them then. The state directory shouldn't have any effect on boot if the image associated to it doesn't exist. If the staged file /run/composefs/staged-deployment does exist, but we have already deleted the staged image, the finalize service would fail but that wouldn't break anything Signed-off-by: Pragyan Poudyal <[email protected]>
883110d    to
    66d163e      
    Compare
  
    Update the deletion of deployment to only simply delete the bootloader entries related to the deployment and then call a `gc` function, which will just get the difference between the states represented by the bootloader entries and the repository then try to reconcile everything by performing GC operation on the repository. Signed-off-by: Pragyan Poudyal <[email protected]>
66d163e    to
    90b88f9      
    Compare
  
    Add debug logs for whatever is being deleted Remove the `delete` param from `delete_deployment` function Use `atomic_replace_with` instead of writing to a buffer then writing Signed-off-by: Pragyan Poudyal <[email protected]>
Signed-off-by: Pragyan Poudyal <[email protected]>
fce95be    to
    152ef25      
    Compare
  
    | Had some discussions with @allisonkarlitskaya on Matrix on how we'd want to handle gc for a composefs repo. The  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally OK, nothing blocking.
| "" | ||
| }; | ||
|  | ||
| tracing::info!("Deleting {kind}deployment '{deployment_id}'"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but let's use structured logging for this instead like tracing::info!("Deleting deployment: {deployment_id}", kind = kind) or so
|  | ||
| delete_depl_boot_entries(&depl_to_del, deleting_staged)?; | ||
|  | ||
| composefs_gc().await?; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the GC should be a clearly separate step in general so we can amortize its cost (e.g. "delete 3 deployments, then gc") and also skip the gc ("lazy prune storage later")
| #[fn_error_context::context("Listing EROFS images")] | ||
| fn list_erofs_images(sysroot: &Dir) -> Result<Vec<String>> { | ||
| let images_dir = sysroot | ||
| .open_dir("composefs/images") | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm there isn't an API for this in Repository?
I think what would help here is if this API actually took a Repository instance for now at least - we can reference its fd and read things directly. Same elsewehere
| Hmm it looks like we had a timeout during install in this run only for c9s. https://github.com/bootc-dev/bootc/actions/runs/18873507377/job/53857419924?pr=1685 | 
Add a command to delete a composefs native deployment
Deleting a deployment would mean, deleting the EROFS image, the
bootloader entries for that deployment and deleting any objects in the
composefs repository that are only referenced by said deployment.
Also refactor some functions and add error contexts in some places
Draft PR for now as it requires some more adjustments