Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subvolume snapshot broken? #8

Open
akrantz01 opened this issue Sep 1, 2020 · 5 comments
Open

subvolume snapshot broken? #8

akrantz01 opened this issue Sep 1, 2020 · 5 comments
Labels
bug Something isn't working
Milestone

Comments

@akrantz01
Copy link

akrantz01 commented Sep 1, 2020

For some reason, I can't take a snapshot of a volume using Subvolume::snapshot. I keep getting a SearchFalied response no matter the path I provide as the destination. Here is the code I am using to take the snapshot:

Subvolume::get("/mnt/images/alpine")
  .unwrap()
  .snapshot("/mnt/containers/test/rootfs", None, None)?;

/mnt is a the root of a btrfs filesystem with a subvolume at images/alpine. When I try it using the btrfs command line, it works with no issue. Even when I call it using std::process::Command it works properly. Am I doing something wrong, or is it within the library?

Edit: I should also add that I am running the program as root, so permissions shouldn't be an issue?

@cezarmathe cezarmathe added the bug Something isn't working label Sep 1, 2020
@cezarmathe
Copy link
Owner

cezarmathe commented Sep 1, 2020

I think this is most likely a library issue. Right now, after issuing the snapshot creation we try to get the subvolume at that path

Ok(Self::from_path(path)?)
and that may be too fast. I think this could be fixed by using an async transaction id here
std::ptr::null_mut(), // should be changed in the future for async support
and then waiting on it(or don't wait, but that will be in the async version of the library).

@cezarmathe cezarmathe added this to the 0.2.0 milestone Sep 12, 2020
@ypoluektovich
Copy link

I managed to reproduce the error (though it might be a different one) and figure out the cause.

First, version 0.1.0 was failing with the error "Could not search B-tree". I don't remember what exactly the reason was, but eventually I had a thought to upgrade to the current master (003fa612bbacad1ef4c54abc6d817f0eab4ec8c2) and that error went away.

Another one appeared instead: "Could not create subvolume", with errno (which I had to extract manually using nix, by the way; it might be a good idea to expose it somehow in this library's errors) having been set to EOPNOTSUPP. This one I managed to trace all the way down to the implementation of the ioctl function used to create snapshots.

The function btrfs_util_create_snapshot from libbtrfsutil accepts a *mut u64 and is supposed to write into it the BTRFS transaction id of the transaction in which the snapshot was created. However...
https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L40-L41
... the ioctl implementation in the kernel no longer supports that! The feature has been deprecated since 5.7, as evidenced by the linked comment. And, indeed, when I tried to call the generated binding (from btrfsutil-sys 1.2.1) directly without specifying the async, it worked.

@cezarmathe
Copy link
Owner

cezarmathe commented Sep 23, 2021

First, version 0.1.0 was failing with the error "Could not search B-tree". I don't remember what exactly the reason was, but eventually I had a thought to upgrade to the current master (003fa612bbacad1ef4c54abc6d817f0eab4ec8c2) and that error went away.

AFAIK that error was caused by attempting to always use a function that required superuser permissions

Ok(Self::get(path)?)

which uses
errcode = btrfs_util_subvolume_id(path_cstr.as_ptr(), id);

I think this was fixed on master, but there isn't any published version with a fix for this.

The function btrfs_util_create_snapshot from libbtrfsutil accepts a *mut u64 and is supposed to write into it the BTRFS transaction id of the transaction in which the snapshot was created. However...
https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L40-L41
... the ioctl implementation in the kernel no longer supports that! The feature has been deprecated since 5.7, as evidenced by the linked comment. And, indeed, when I tried to call the generated binding (from btrfsutil-sys 1.2.1) directly without specifying the async, it worked.

This is (or was?) the libbtrfsutil version supported by this library: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L28-L30

Here there is no sign that async transaction IDs are not supported: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L415-L431. Are you saying that support for async transactions has been dropped?

By the way, I looked over the implementation and I might have found an issue.

let transid: u64 = {
let mut transid: u64 = 0;
unsafe_wrapper!({
btrfs_util_create_snapshot(
path_src_cstr.as_ptr(),
path_dest_cstr.as_ptr(),
flags_val,
&mut transid,
qgroup_ptr,
)
})?;
transid
};

unsafe_wrapper!({ btrfs_util_wait_sync(path_dest_cstr.as_ptr(), transid) })?;

The transaction is set to 0 - essentially NULL (might've been better to use std::ptr::null() tbh). After that, the library uses btrfs_util_wait_sync to wait on a NULL async transaction id.

I haven't tested this, but I feel like there's something wrong with that.

Another one appeared instead: "Could not create subvolume", with errno (which I had to extract manually using nix, by the way; it might be a good idea to expose it somehow in this library's errors) having been set to EOPNOTSUPP.

At the moment the library returns the error as reported by libbtrfsutil (enum btrfs_util_error). How do you suggest integrating the errno as well in the reported errors?

Thanks for sharing your findings!

@ypoluektovich
Copy link

ypoluektovich commented Sep 23, 2021

This is (or was?) the libbtrfsutil version supported by this library: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L28-L30

Here there is no sign that async transaction IDs are not supported: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L415-L431. Are you saying that support for async transactions has been dropped?

libbtrfsutil is a wrapper around the actual btrfs code. To do the actual creation of the snapshot, it invokes the kernel code through ioctl: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1174

It's that kernel code that's supposed to create the snapshot and return the transaction id by setting a field in the args struct, which is defined here: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/ioctl.h#L98. That field is also present in the parallel definition in the btrfs-devel repo: https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L130

However, if you trace the ioctl invocation:

You'll find the place where EOPNOTSUPP is returned. It seems that libbtrfsutil passes an unsupported flag — namely, the one which asks the kernel code to return the transaction. It's set here: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1156-L1157. That flag is no longer supported though (its definition is commented out in btrfs-devel), and you won't find any usages of transid in the ioctl implementation.

The transaction is set to 0 - essentially NULL (might've been better to use std::ptr::null() tbh). After that, the library uses btrfs_util_wait_sync to wait on a NULL async transaction id.

That's not quire correct. You declare a variable and set its value to 0. Then you send the address of that variable to libbtrfsutil — and the address is not zero/nullptr. The libbtrfs function detects that, adds the flag for ioctl and, if the ioctl invocation is successful, uses the pointer to write the transaction id into your variable, thus overwriting the zero that you initialized it with: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1178-L1179. After that, it'd be correct to use the newly written value of the variable to wait for the transaction.
So, to summarize, your usage of the feature is correct; it's just that the feature is no longer supported by the new kernel code, even though libbtrfsutil still exposes it. It would work with an older version of the kernel (presumably up to version 5.7).

@ypoluektovich
Copy link

At the moment the library returns the error as reported by libbtrfsutil (enum btrfs_util_error). How do you suggest integrating the errno as well in the reported errors?

Option 1: add a cause to LibError and fill it with nix::errno::Errno (as returned by e. g. Errno::last()) (which conveniently implements Error). So it'd be a "btrfs error such-and-such" caused by "system error such-and-such".

Option 2: make LibError a composite type which has both a btrfs_util_error value and an Errno. Would require a nontrivial Debug implementation, but it shouldn't be too complex.

In any case, be sure to handle the case of errno 0 (aka "no error") :)

Repository owner deleted a comment from Nitesh639 Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants