[Contribution] qubes-incremental-backup-poc OR Wyng backup #858

Open
marmarek opened this issue Mar 8, 2015 · 98 comments
Labels
  • C: contrib package
  • community dev (This is being developed by a member of the community rather than a core Qubes developer.)
  • P: major (Priority: major. Between "default" and "critical" in severity.)
  • S: needs review (Status: needs review. Core devs must review contributed code for potential inclusion in Qubes OS.)
  • T: enhancement (Type: enhancement. A new feature that does not yet exist or improvement of existing functionality.)

Comments

@marmarek
Member

marmarek commented Mar 8, 2015

Community Devs: @v6ak, @tasket
@v6ak's PoC: https://github.com/v6ak/qubes-incremental-backup-poc
@tasket's PoC: https://github.com/tasket/wyng-backup | Status update as of 2022-08-16: #858 (comment)


Reported by joanna on 14 May 2014 10:38 UTC
None

Migrated-From: https://wiki.qubes-os.org/ticket/858


Note to any contributors who wish to work on this issue: Please either ask for details or propose a design before starting serious work on this.

@marmarek marmarek added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: artwork Artwork. You know, the pretty stuff. P: major Priority: major. Between "default" and "critical" in severity. labels Mar 8, 2015
@marmarek marmarek added C: core and removed C: artwork Artwork. You know, the pretty stuff. labels Mar 8, 2015
@marmarek marmarek added this to the Release 2.1 (post R2) milestone Mar 8, 2015
@marmarek marmarek added T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. and removed T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Mar 8, 2015
@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by joanna on 14 May 2014 10:42 UTC
Discussion here:

https://groups.google.com/d/msg/qubes-devel/Gcrb7KQVcMk/CK-saQU_1HYJ

@marmarek
Member Author

@Rudd-O in #1588 (comment)

Honestly, the backup tool should be replaced by something like Duplicity. They get it right, they do incremental backups, they do encryption, and they do arbitrary targets, so it would be extremely easy to accommodate backing up to another VM, or even to damn S3 if we wanted to.

Ideally, the way I would see that working is:

  1. Take snapshot of file system containing VMs. Mount the snapshot somewhere, read-only.
  2. Execute duplicity pointing it to the read-only VM data source, and the target storage destination (VM or local mountpoint)
  3. Destroy snapshot.

This would also allow for backup of VMs that are running, so no need to shut down VMs during backup. I highly recommend we research replacing qvm-backup with Duplicity.

It looks like Duplicity supports making an incremental backup even when only part of a file has changed (it includes a diff of the file, not the full changed file). So indeed it may be a good idea to somehow use it.

But mounting a VM filesystem in dom0 is a big NO-GO. On the other hand, it may be a good idea to simply run duplicity in the VM and collect its output. Take a look at the linked discussion for an idea of how to handle powered-off VMs (in short: launch a minimal "stubdomain"-like system with access to the VM's disk for the backup purpose).

Of course, all this requires evaluating whether duplicity handles encryption and validation correctly, so as not to make things worse than they currently are.

@Rudd-O

Rudd-O commented Jun 12, 2016

On 06/12/2016 07:04 PM, Marek Marczykowski-Górecki wrote:

@Rudd-O https://github.com/Rudd-O in #1588 (comment)
#1588 (comment)

Honestly, the backup tool should be replaced by something like
Duplicity. They get it right, they do /incremental/ backups, they
do /encryption/, and they do arbitrary targets, so it would be
extremely easy to accommodate backing up to another VM, or even to
damn S3 if we wanted to.

Ideally, the way I would see that working is:
1. Take snapshot of file system containing VMs. Mount the snapshot
somewhere, read-only.
2. Execute duplicity pointing it to the read-only VM data source,
and the target storage destination (VM or local mountpoint)
3. Destroy snapshot.

This would also allow for backup of VMs that are running, so no
need to shut down VMs during backup. I highly recommend we
research replacing qvm-backup with Duplicity.

It looks like Duplicity supports making an incremental backup even
when part of a file was changed (include diff of file, not full
changed files). So indeed it may be good idea to somehow use it.

But mounting VM filesystem in dom0 is a big NO-GO. On the other hand,
it may be good idea to simply run duplicity in the VM, and collect its
output. Take a look at the linked discussion for an idea how to handle
powered off VMs (in short: launch minimal "stubdomain" like system
with access to its disk for the backup purpose).

No need to mount any VM filesystem in dom0. That is also not how
Duplicity works either.

The way it would work with Duplicity, is a backend would need to be
written. This backend would have two main functions, really:

  1. let Duplicity retrieve the encrypted rdiff database from the backup
    VM, so that Duplicity can locally (in dom0) compute the differences and
    store only the differences.
  2. let Duplicity push the encrypted full / differential backup files, as
    well as the updated encrypted rdiff database, to the backup VM.

See this:

https://bazaar.launchpad.net/~duplicity-team/duplicity/0.7-series/files/head:/duplicity/backends/

Later today, or perhaps tomorrow, I will use my Qubes bombshell-client
to simulate an SSH connection to the backup VM, and see how Duplicity
behaves using this false SSH. That can serve as a good starting point.

Rudd-O
http://rudd-o.com/

@Rudd-O

Rudd-O commented Jun 12, 2016

Better still:

https://bazaar.launchpad.net/~duplicity-team/duplicity/0.7-series/view/head:/duplicity/backends/ssh_pexpect_backend.py

Remarkably much like my Qubes Ansible plugin.

Rudd-O
http://rudd-o.com/

@marmarek
Member Author

Can you explain more how exactly that would work?

  1. Where and at what level VM data would be retrieved (private.img file, individual files from within?)
  2. Where and at what level incremental data is computed?

Generally dom0 shouldn't be exposed to VM filesystem in any way (regardless of the form: mounting directly, accessing via sftp-like service etc). If incremental data needs to be computed at VM-file level, it should be done in separate VM and dom0 should treat the result as opaque blob. Also during restore.

@Rudd-O

Rudd-O commented Jun 12, 2016

On 06/12/2016 08:15 PM, Marek Marczykowski-Górecki wrote:

Can you explain more how exactly that would work?

  1. Where and at what level VM data would be retrieved (|private.img|
    file, individual files from within?)

Duplicity + hypotheticalplugin (from now on, HP) would never retrieve
any VM data. It would only retrieve an encrypted file from the backup
directory within the VM. This encrypted file contains a manifest of
what exists as a backup, as well as a set of rdifflib diff information.

  1. Where and at what level incremental data is computed?

Incremental data is computed in dom0 using both the rdifflib database
and the actual contents of the snapshotted VM images. This should be
safe because the database is stored encrypted, so the VM cannot tamper
with it and induce a code execution in dom0.
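
To make "store only the differences" concrete, here is a minimal sketch of the idea in Python. It is not what Duplicity does internally: Duplicity relies on rsync-style rolling checksums, whereas this illustration compares fixed-offset blocks only, and the names `signature`, `delta` and `BLOCK` are made up for the example. It also ignores the case where the new image is shorter than the old one.

```python
import hashlib

BLOCK = 4096  # hypothetical block size, chosen only for this illustration


def signature(data: bytes, block: int = BLOCK) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def delta(old_sig: list[str], new_data: bytes, block: int = BLOCK) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for every block that differs from old_sig."""
    changed = {}
    for index, offset in enumerate(range(0, len(new_data), block)):
        chunk = new_data[offset:offset + block]
        digest = hashlib.sha256(chunk).hexdigest()
        # A block is "changed" if it is new or its digest no longer matches.
        if index >= len(old_sig) or old_sig[index] != digest:
            changed[index] = chunk
    return changed
```

Only the signature list has to be fetched from the backup side; the image contents are read locally and just the changed blocks would be encrypted and uploaded.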

Generally dom0 shouldn't be exposed to VM filesystem in any way
(regardless of the form: mounting directly, accessing via sftp-like
service etc). /If/ incremental data needs to be computed at VM-file
level, it should be done in separate VM and dom0 should treat the
result as opaque blob. Also during restore.

That cannot work because that involves copying the IMG files into the
separate VM, and that would take forever.

Duplicity's backends are designed to make the storage opaque to
Duplicity. The backend retrieves files and uploads files, and those
files are encrypted by the time the data reaches the backend plugin.
The backend does not care about anything else.

Rudd-O
http://rudd-o.com/

@marmarek
Member Author

That cannot work because that involves copying the IMG files into the separate VM, and that would take forever.

Not necessarily: it can be attached with a qvm-block-like mechanism.

Anyway, IIUC you want to back up *.img files in dom0 using duplicity. This may work. My initial (maybe invalid) concern about input validation on restore still stands.
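
The restore-side concern boils down to "authenticate first, parse second". A rough sketch of that ordering in Python follows. Note the swap: Duplicity uses GnuPG, not HMAC, so this is only a stand-in to show the principle, and `seal` / `open_sealed` are hypothetical names. Whatever performs the verification still has to process untrusted bytes, which is exactly the part that needs review.

```python
import hashlib
import hmac
import json

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_sealed(key: bytes, blob: bytes) -> dict:
    """Verify the tag first; parse the payload only if it matches."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to contain a tag")
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison; nothing below runs on unauthenticated data.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; refusing to parse")
    return json.loads(payload.decode("utf-8"))
```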

@Rudd-O

Rudd-O commented Jun 13, 2016

Well, look at what I have got here:

(deleted cos buggy, see below for updated version)

I have a (shit) complete Qubes VM backend for duplicity which you can add to your dom0's duplicity backends directory, and then run something like this:

duplicity /var/lib/qubes qubesvm://backupvm/Backupsmountpoint/mydom0/qubesvms

Very early working prototype and building block. Enjoy!

@v6ak

v6ak commented Jun 19, 2016

I have some experience with Duplicity:

  • It works, but I remember some problems when backup was interrupted. This might have been fixed, though. Alternatively, it is possible to add some sanity checks.
  • It uses GPG for encryption. I am not sure about the defaults, but it is configurable. Both symmetric and asymmetric modes are supported. For some good reasons, I prefer the asymmetric mode for backups.
  • I believe it is authenticated somehow, but I have checked the options some time ago, so I am not 100% sure.
  • Provided that it is properly authenticated and authentication data are properly checked, I consider @Rudd-O's approach basically correct. (Maybe some extra validation/sanitization might be needed for filenames.) The GPG in dom0 seems to be already part of the TCB, so I hope that it is included in the “critical security updates” provided for dom0 even after EOL of the Fedora version used in dom0.
  • Data are stored in large blocks of roughly (or maybe exactly) the same (configurable) size, except the last one. So, this does not reveal the structure of files.
  • Some metadata are AFAIR stored separately, which theoretically allows guessing average file size. However, this can be considered as a tradeoff between maximum privacy and minimum bandwidth usage. This tradeoff is reasonable for me.
  • Incremental backups reveal the amount of data changed between backups, of course. Again, this is some tradeoff.
  • It uses compression and I am not sure if this is configurable. This introduces some potential side channel. (See CRIME/BREACH attacks for more details.) However, with some reasonable usage, the side channel is likely to be too noisy to be practically useful. Note that the current Qubes backup system seems to have a similar side channel, except that incremental backups make the side channel inherently less noisy.
  • Per-VM backup is a double-edged sword. On one hand, it eliminates some inter-VM attacks. On the other hand, it makes data-size based side channels less noisy. Maybe we could get advantages of both (perform per-VM compression first, then divide to blocks and encrypt), but this seems to be far-from-trivial to properly design and implement.
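
To illustrate the point about equally sized blocks: a minimal sketch, assuming the archive is a single byte stream cut into volumes of a configurable size. `volumes` and `VOLUME_SIZE` are names invented for this example and the size is arbitrary; this is not Duplicity's code.

```python
import io
from typing import BinaryIO, Iterator

VOLUME_SIZE = 25 * 1024 * 1024  # hypothetical volume size for this illustration


def volumes(stream: BinaryIO, size: int = VOLUME_SIZE) -> Iterator[bytes]:
    """Yield blocks of exactly `size` bytes; only the last one may be shorter."""
    while True:
        parts = []
        remaining = size
        # read() may return less than requested, so keep filling the block.
        while remaining > 0:
            piece = stream.read(remaining)
            if not piece:
                break
            parts.append(piece)
            remaining -= len(piece)
        if not parts:
            return
        yield b"".join(parts)
        if remaining > 0:
            return  # end of stream reached inside this block


if __name__ == "__main__":
    print([len(v) for v in volumes(io.BytesIO(b"x" * 25), size=10)])
```

An observer of the storage sees only the number of volumes and the size of the last one, not the boundaries of individual files.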

I would prefer using Duplicity per-VM, so one could exclude caches etc. from the backup. My idea is something like the following process invoked from dom0:

  1. Start a DVM (say BackupDVM) and optionally disable networking there.
  2. Attach the corresponding private.img to the BackupDVM.
  3. Send a per-VM key (stored in dom0 and backed-up separately) to the BackupDVM.
  4. Mount the private.img (well, rather /dev/xvdi) in the BackupDVM.
  5. Run Duplicity (or maybe another similar app) in the BackupDVM. The backup storage would be some custom backend that sends commands over QubesRPC to a BackupStorageVM. The BackupStorageVM would be just a proxy from QubesRPC to a real backup storage for multiple domains.

As you can see, this would require some data stored in dom0 to be also backed up, but this could be handled by some very similar mechanism. (It would probably run directly in dom0 rather than in a BackupDVM.)

Some implementation and security notes on those points:

  1. We need the VM's name for multiple purposes. This part seems to be harder than I thought, because there seems to be no proper way of starting a DVM and getting its name other than cloning and modifying /usr/lib/qubes/qfile-daemon-dvm.
  2. For standard VMs, this requires them to be off. For LVM-backed VMs, it is possible to make a CoW clone and backup the clone when running. There is one assumption for the CoW approach: I assume there is some consistency kept on unclean shutdown (e.g. you don't use data=writeback). You see the CoW made when VM is running would look exactly like a drive recovering from unclean shutdown.
  3. Currently not sure if this can be a raw key or if it has to be a password. But this is rather a minor concern.
  4. Sure, this exposes the BackupDVM to the VM's filesystem. However, the BackupDVM is used only for the one VM and then discarded. In case of some kernel vulnerabilities, a malicious VM could alter or remove its own old backups. It could not touch other VM's backups. Moreover, it would have access to the DVM, which might be some concern when multiple DVMs are implemented, especially for some untrusted Whonix-workstation-based VMs. In those cases, just another VM should be used here. (Maybe we could use something like qvm-trim-template, which seems to use a corresponding template instead of standard DVM template.)
  5. The BackupDVM would contain something similar to the implementation by @Rudd-O , except that this would be rewritten to RPC, because BackupStorageVM shouldn't trust the corresponding BackupDVMs. The BackupStorageVM would prepend some VM-specific path to all the files (maybe some random identifier from a table stored in dom0 in order to prevent leaking VM names to the backup storages), validate the filenames (maybe \A[a-z][a-z0-9.-]*\Z would be enough for Duplicity) and send it to some actual Duplicity backup storage backend (e.g. rsync, SFTP, Amazon S3, Google Drive, …).
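
The filename check and path prefixing from the last point could look roughly like this. It is a sketch under assumptions: `StorageProxy` and its methods are hypothetical, the ID table is kept in memory here although the comment places it in dom0, and the pattern is copied from the comment as-is. Real Duplicity volume names may contain characters outside that set (for example uppercase letters in timestamps), so the pattern would need checking against actual names before use.

```python
import re
import secrets

# Pattern quoted from the comment above. In Python, \Z matches only at the
# very end of the string, so a trailing newline is rejected too.
SAFE_NAME = re.compile(r"\A[a-z][a-z0-9.-]*\Z")


class StorageProxy:
    """Hypothetical BackupStorageVM-side helper mapping VM names to opaque IDs."""

    def __init__(self):
        self._ids = {}  # VM name -> random identifier

    def vm_id(self, vm_name: str) -> str:
        """Return a stable random identifier so VM names never reach storage."""
        if vm_name not in self._ids:
            self._ids[vm_name] = secrets.token_hex(8)
        return self._ids[vm_name]

    def storage_path(self, vm_name: str, filename: str) -> str:
        """Validate the filename, then prepend the VM-specific prefix."""
        if not SAFE_NAME.match(filename):
            raise ValueError("rejected filename: %r" % filename)
        return "%s/%s" % (self.vm_id(vm_name), filename)
```

Because the pattern allows no path separator and requires a leading letter, names such as `../x` or `.hidden` are rejected before they reach the real storage backend.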

@Rudd-O

Rudd-O commented Jun 19, 2016

  1. Encryption. The ciphers used by GnuPG suck by default. You have to tell Duplicity about that. Use --gpg-options '--cipher-algo=AES256 --digest-algo=SHA256' to fix that.
  2. Compression is configurable. You can choose the compressor. Command line switch, in fact. I believe it's a GPG command line switch, but the man page of duplicity does talk about it.
  3. There's no need to be attaching and detaching disks to a backup VM or similar monkey business. With this setup, Duplicity can back up your image files to a backup VM painlessly and incrementally.
  4. The philosophy of Duplicity is simplicity -- point it to a source dir and a dest dir (for qubesvm:// see below) and Duplicity will back up the source dir to the dest dir. Presumably we would add value by creating a system of "profiles" (each defined as "what to back up, and where to") where the user can use a GUI or a config file to establish a routine backup plan (perhaps automatic) of one or more of these profiles.
  5. Adapter for Duplicity in Dom0. Here is the latest version of the plugin that enables Duplicity to back up to a VM. Deposit at /usr/lib64/python2.7/site-packages/duplicity/backends/qubesvmbackend.py. Then your destination URL in your duplicity command line can be qubesvm://yourbackupvm/path/to/directory/to/store/backup. I backed up /var/lib/qubes (technically, a mounted snapshot of it) with no problem to a qubesvm://mybackupvm/mnt/disk/qubesbackup directory.
# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
#
# This file is NOT part of duplicity.  It is an extension to Duplicity
# that allows the Qubes OS dom0 to back up to a Qubes OS VM.
#
# Duplicity is free software, and so is this file.  This file is
# under the same license as the one distributed by Duplicity.

import os
import pipes
import subprocess

import duplicity.backend

from duplicity.errors import *

BLOCKSIZE = 1048576  # for doing file transfers by chunk
MAX_LIST_SIZE = 10485760  # limited to 10 MB directory listings to avoid problems


class QubesVMBackend(duplicity.backend.Backend):
    """This backend accesses files stored on a Qubes OS VM.  It is intended to
    work as a backend within a Qubes OS dom0 (TCB) for the purposes of using
    Duplicity to back up to a VM.  No special tools are required other than
    this backend file itself installed to your Duplicity backends directory.

    Missing directories on the remote (VM) side will NOT be created.  It is
    an error to try and back up to a VM when the target directory does
    not already exist.

    module URL: qubesvm://vmname/path/to/backup/directory
    """
    def __init__(self, parsed_url):
        duplicity.backend._ensure_urlparser_initialized()
        duplicity.backend.urlparser.uses_netloc.append("qubesvm")
        duplicity.backend.urlparser.clear_cache()
        properly_parsed_url = duplicity.backend.ParsedUrl(parsed_url.geturl())
        duplicity.backend.Backend.__init__(self, properly_parsed_url)
        if properly_parsed_url.path:
            self.remote_dir = properly_parsed_url.path
        else:
            self.remote_dir = '.'
        self.hostname = properly_parsed_url.hostname

    def _validate_remote_filename(self, op, remote_filename):
        if os.path.sep in remote_filename or "\0" in remote_filename:
            raise BackendException(
                ("Qubes VM %s failed: path separators "
                 "or nulls in destination file name %s") % (
                     op, remote_filename))

    def _dd(self, iff=None, off=None):
        cmd = ["dd", "status=none", "bs=%s" % BLOCKSIZE]
        if iff:
            cmd.append("if=%s" % iff)
        if off:
            cmd.append("of=%s" % off)
        return cmd

    def _execute_qvmrun(self, cmd, stdin, stdout):
        subcmd = " ".join(pipes.quote(s) for s in cmd)
        cmd = ["qvm-run", "--pass-io", "--", self.hostname, subcmd]
        return subprocess.Popen(
            cmd,
            stdin=stdin,
            stdout=stdout,
            bufsize=MAX_LIST_SIZE,
            close_fds=True
        )

    def put(self, source_path, remote_filename=None):
        """Transfers a single file to the remote side."""
        if not remote_filename:
            remote_filename = source_path.get_filename()
        self._validate_remote_filename("put", remote_filename)
        rempath = os.path.join(self.remote_dir, remote_filename)
        cmd = self._dd(off=rempath)
        fobject = open(source_path.name, "rb")
        try:
            p = self._execute_qvmrun(cmd,
                                     stdin=fobject,
                                     stdout=open(os.devnull, "wb"))
        except Exception as e:
            raise BackendException(
                "Qubes VM put of %s (as %s) failed: (%s) %s" % (
                    source_path.name, remote_filename, type(e), e))
        finally:
            fobject.close()
        err = p.wait()
        if err != 0:
            raise BackendException(
                ("Qubes VM put of %s (as %s) failed: writing the "
                 "destination path exited with nonzero status %s") % (
                     source_path.name, remote_filename, err))

    def get(self, remote_filename, local_path):
        """Retrieves a single file from the remote side."""
        self._validate_remote_filename("get", remote_filename)
        rempath = os.path.join(self.remote_dir, remote_filename)
        cmd = self._dd(iff=rempath)
        fobject = open(local_path.name, "wb")
        try:
            p = self._execute_qvmrun(cmd,
                                     stdin=open(os.devnull),
                                     stdout=fobject)
        except Exception as e:
            raise BackendException(
                "Qubes VM get of %s (as %s) failed: (%s) %s" % (
                    remote_filename, local_path.name, type(e), e))
        finally:
            fobject.close()
        err = p.wait()
        if err != 0:
            # remote_filename is a plain string; local_path is a Path object.
            raise BackendException(
                ("Qubes VM get of %s (as %s) failed: reading the "
                 "source path exited with nonzero status %s") % (
                     remote_filename, local_path.name, err))

    def _list(self):
        """Lists the contents of the one duplicity dir on the remote side."""
        cmd = ["find", self.remote_dir, "-maxdepth", "1", "-print0"]
        try:
            p = self._execute_qvmrun(cmd,
                                     stdin=open(os.devnull, "rb"),
                                     stdout=subprocess.PIPE)
        except Exception as e:
            raise BackendException(
                "Qubes VM list of %s failed: %s" % (self.remote_dir, e))
        data = p.stdout.read(MAX_LIST_SIZE)
        p.stdout.close()
        err = p.wait()
        if err != 0:
            raise BackendException(
                ("Qubes VM list of %s failed: list command finished "
                "with nonzero status %s" % (self.remote_dir, err)))
        if not data:
            raise BackendException(
                ("Qubes VM list of %s failed: list command returned "
                "empty" % (self.remote_dir,)))
        filename_list = data.split("\0")
        if filename_list[0] != self.remote_dir:
            raise BackendException(
                ("Qubes VM list of %s failed: list command returned a "
                "filename_list for a path different from the remote folder") % (
                    self.remote_dir,))
        filename_list.pop(0)
        if filename_list[-1]:
            raise BackendException(
                ("Qubes VM list of %s failed: list command returned "
                "wrongly-terminated data or listing was too long") % (
                    self.remote_dir,))
        filename_list.pop()
        filename_list = [ p[len(self.remote_dir) + 1:] for p in filename_list ]
        if any(os.path.sep in p for p in filename_list):
            raise BackendException(
                ("Qubes VM list of %s failed: list command returned "
                "a path separator in the listing") % (
                    self.remote_dir,))
        return filename_list

    def delete(self, filename_list):
        """Deletes all files in the list on the remote side."""
        if any(os.path.sep in p or "\0" in p for p in filename_list):
            raise BackendException(
                ("Qubes VM delete of files in %s failed: delete "
                 "command asked to delete a file with a path separator "
                 "or a null character in the listing") % (
                     self.remote_dir,))
        pathlist = [os.path.join(self.remote_dir, p) for p in filename_list]
        cmd = ["rm", "-f", "--"] + pathlist
        try:
            p = self._execute_qvmrun(cmd,
                                     stdin=open(os.devnull, "rb"),
                                     stdout=open(os.devnull, "wb"))
        except Exception as e:
            raise BackendException(
                "Qubes VM delete of files in %s failed: %s" % (
                    self.remote_dir, e))
        err = p.wait()
        if err != 0:
            raise BackendException(
                ("Qubes VM delete of files in %s failed: delete "
                 "command finished with nonzero status %s") % (
                     self.remote_dir, err))


duplicity.backend.register_backend("qubesvm", QubesVMBackend)
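
A note for anyone porting this to Python 3: `pipes.quote` became `shlex.quote`, and the `pipes` module was removed in Python 3.13. The command construction from `_execute_qvmrun` can be isolated and checked on its own, as in the sketch below. `qvmrun_argv` is a hypothetical helper name; the `qvm-run` arguments are the ones the backend above already uses.

```python
import shlex


def qvmrun_argv(vm_name, remote_cmd):
    """Build the dom0 argv that runs remote_cmd (a list of strings) in vm_name.

    Each element is shell-quoted because the command is passed to the VM
    as a single string and interpreted by a shell there.
    """
    quoted = " ".join(shlex.quote(part) for part in remote_cmd)
    return ["qvm-run", "--pass-io", "--", vm_name, quoted]


if __name__ == "__main__":
    print(qvmrun_argv("backupvm", ["dd", "status=none", "of=/mnt/my backup/file"]))
```

Round-tripping the quoted string through `shlex.split` is a cheap way to confirm that spaces and shell metacharacters in paths survive intact.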

Finally: I disagree with the direction of implementation which suggests we need to play Towers of Hanoi with a backup VM and attaching disk images to it. That's entirely unnecessary complication, and it also demands VMs be off for backup purposes. Entirely unnecessary and extremely discouraging of backup procedures.

The three step process is all that is necessary:

  1. Snapshot the container volume of /var/lib/qubes, then mount somewhere.
  2. Invoke duplicity with a wrapper W that sets the appropriate options to back up /mnt/snapshot/var/lib/qubes (I believe there's even an option to trim the mountpoint out of the sent paths, much like tar --strip-components).
  3. Unmount and destroy snapshot.

That is all that is needed.
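
A rough sketch of those three steps, with the commands built as lists and run through an injectable runner so the ordering and cleanup can be checked without touching a real system. Everything specific here is an assumption: a classic (non-thin) LVM layout, the volume group and snapshot names, and the snapshot size. On a thin pool the `lvcreate` invocation would differ.

```python
def snapshot_backup_plan(vg, lv, mountpoint, source_subdir, dest_url,
                         snap_name="qubes-backup-snap", snap_size="5G"):
    """Return (setup, backup, teardown) command lists; nothing is executed."""
    snap_dev = "/dev/%s/%s" % (vg, snap_name)
    source = mountpoint.rstrip("/") + "/" + source_subdir.strip("/")
    setup = [
        ["lvcreate", "--snapshot", "--name", snap_name,
         "--size", snap_size, "%s/%s" % (vg, lv)],
        ["mount", "-o", "ro", snap_dev, mountpoint],
    ]
    backup = [["duplicity", source, dest_url]]
    teardown = [
        ["umount", mountpoint],
        ["lvremove", "--force", "%s/%s" % (vg, snap_name)],
    ]
    return setup, backup, teardown


def run_plan(plan, runner):
    """Run setup and backup; always undo the setup steps that completed."""
    setup, backup, teardown = plan
    done = 0
    try:
        for cmd in setup:
            runner(cmd)
            done += 1
        for cmd in backup:
            runner(cmd)
    finally:
        # teardown mirrors setup in reverse, so skip steps never set up.
        for cmd in teardown[len(teardown) - done:]:
            runner(cmd)
```

In real use the runner would be something like `subprocess.check_call`; passing `print` instead gives a dry run.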

Of course, the Qubes setting of which VMs to back up, plus options like -x that are supported in qvm-backup, would also be sensible things to support in the wrapper W.
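
For the wrapper W itself, a profile could be little more than a record that knows how to turn itself into a duplicity command line. The sketch below is hypothetical throughout (`BackupProfile`, its fields, and the assumption that excluded VMs are AppVMs living under `appvms/` in the source tree); the only duplicity options used are `--gpg-options` and `--exclude`.

```python
from dataclasses import dataclass, field


@dataclass
class BackupProfile:
    """Hypothetical profile: what to back up, where to, and which VMs to skip."""
    source: str
    destination: str
    excluded_vms: list = field(default_factory=list)
    gpg_options: str = "--cipher-algo=AES256 --digest-algo=SHA256"

    def duplicity_argv(self):
        """Build the duplicity command line for this profile."""
        argv = ["duplicity", "--gpg-options", self.gpg_options]
        for vm in self.excluded_vms:
            # Assumes the VM directory sits under <source>/appvms/<name>.
            argv += ["--exclude", "%s/appvms/%s" % (self.source.rstrip("/"), vm)]
        return argv + [self.source, self.destination]


if __name__ == "__main__":
    profile = BackupProfile("/mnt/snapshot/var/lib/qubes",
                            "qubesvm://mybackupvm/mnt/disk/qubesbackup",
                            excluded_vms=["untrusted"])
    print(profile.duplicity_argv())
```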

@Rudd-O

Rudd-O commented Jul 19, 2016

Update: Duplicity is no good for the process, because differential backups slow to a crawl.

We need to research other stuff, like Attic.

@v6ak

v6ak commented Jul 19, 2016

I have not experienced such an issue with Duplicity. I have managed to perform an incremental backup of roughly 15GiB (after exclusions) of various data (many small files and a few large files) in ~2 minutes, even on HDD + dm-crypt. (Of course, this depends on the size of the changes.) Encryption and compression were enabled.

Maybe it skips many files just because of timestamps and diffs just a few files.

So, I feel you must be doing something wrong. (No offense.)

What was your scenario? What files did you try to backup? (E.g., dom0's ~ with 1GiB of small files.) What was your drive setup? (E.g., 7200 RPM HDD with dm-crypt with ext4.) How much time did it take? Where do you store the backup? (Other VM using your storage backend? Well, theoretically, this should not matter so much for performance of scanning, as Duplicity caches metadata locally, AFAIK somewhere in ~/.cache. But I am curious.) Did you have the metadata cached? How much data did you add to the backup? (Well, if you backup ~ and you don't exclude ~/.cache, you might add Duplicity metadata to the backup, which could explain both some time and space penalty. I am not sure if Duplicity is smart enough to exclude this automagically.)

@Rudd-O

Rudd-O commented Jul 20, 2016

On 07/19/2016 05:45 PM, Vít Šesták wrote:

I have not experienced such issue with Duplicity. I have managed to
perform incremental backup on roughly 15GiB (after exclusions) of
various data (many small files and few large files) in ~2 minutes even
on HDD + dm-crypt. (Of course, this depends on size of changes.)
Encryption and compression were enabled.

My backup of a 60 GB disk image progressed essentially nothing over the
course of five hours. I think there is an exponential slowdown after a
certain size.

Rudd-O
http://rudd-o.com/

@andrewdavidwong andrewdavidwong modified the milestones: Release 4.2, Release TBD Jun 26, 2023
@andrewdavidwong andrewdavidwong removed this from the Release TBD milestone Aug 13, 2023
@mooreye

mooreye commented Mar 26, 2024

Once Wyng reaches a stable release, can we expect officially-supported incremental backups? This is a serious usability issue if you have lots of data.

@marmarek marmarek removed their assignment Mar 27, 2024
@tlaurion
Contributor

Would love to see support for incremental backups based on the ZFS volume driver too.

@Rudd-O tasket/wyng-backup#110 (comment)

@tlaurion
Contributor

Update:

Wyng has entered its final alpha for v0.4. The big changes have been completed, which include:

  • Btrfs and XFS source volumes

  • Authenticated encryption with auth caching

  • Simpler authentication of non-encrypted archives

  • Overall faster detection of changed/unchanged volumes

  • Fast differential receive when using available snapshots

  • Simple switching between multiple archives: Choose any (dest) archive location each time you run Wyng

  • Multiple volumes can now be specified for most Wyng commands

There is also a Qubes integration script which allows backup/restore by VM name. Currently this only works with LVM but reflink support is planned.

@tasket maybe a quick update is needed here?

Or @marmarek, I see you removed assignment to yourself. Maybe you want to say what is missing to go forward under tasket/wyng-backup#102 instead?

@andrewdavidwong
Member

I see you removed assignment to yourself.

That's not specific to this issue. It's just a general change in how issue assignments are used. Before, issues were assigned to people who work in certain areas, even if there was no current work being done (including just "someday, maybe"). Now, by contrast, issues are only assigned to devs while those devs are actively working on the issues. You can read more about the new policy here.

@tasket

tasket commented Apr 25, 2024

@tlaurion @marmarek Wyng is now in final beta for v0.8 and all features are frozen. It's exhibiting good stability overall, but I have added a caveat to the Readme about lvmthin's need for metadata space, since adding Wyng's snapshots on top of Qubes' snapshots will naturally consume more, and LVM does not have good defaults or out-of-space handling.

The wyng-util-qubes wrapper for Qubes integration has just gone to v0.9 beta and been pushed to the main branch, as I recommend using this version now. The new version includes support for both reflink and lvmthin pools; the wrapper can now alias volume names as necessary between the two pool types during restore. This is now generally usable for Qubes users who are comfortable with the command line, and it's quite feasible to adapt for a GUI.

This is what a typical backup session from a Btrfs Qubes system looks like:

[me@dom0 ~]$ sudo wyng-util-qubes backup -i --dest qubes-ssh://sshvm:[email protected]/home/user/wyng.backup
wyng-util-qubes v0.9beta rel 20240424

Wyng 0.8beta release 20240423
Encrypted archive 'qubes-ssh://sshvm:[email protected]/home/user/wyng.backup' 
Last updated 2024-04-25 13:22:30.245619 (-04:00)

Preparing snapshots in '/mnt/btrpool/libqubes/'...
  Queuing full scan of import 'wyng-qubes-metadata'
Acquiring deltas.

Sending backup session 20240425-144142:
———————————————————————————————————————————————————
no change |  -  | appvms/banking/private.img
no change |  -  | appvms/dev/private.img
no change |  -  | appvms/mail/private.img
    5.0MB |  2s | appvms/personal/private.img
    0.5MB |  1s | appvms/root-backup/private.img
no change |  -  | appvms/sshvm/private.img
no change |  -  | appvms/sys-vpn2/private.img
   45.8MB | 10s | appvms/untrusted/private.img
no change |  -  | vm-templates/debian-12/private.img
no change |  -  | vm-templates/debian-12/root.img
    0.0MB |  1s | wyng-qubes-metadata
———————————————————————————————————————————————————
 11 volumes, 79428——>51 MB in 15.6 seconds.
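
For readers who want to see how little was actually sent, the per-volume rows above can be tallied with a few lines of Python. This is only an illustration: the row format is taken from this one session and is not a stable interface, `changed_volumes` is a made-up helper, and the summary line is read here as 79428 MB of volume data reduced to 51 MB sent, which is an interpretation.

```python
import re

# One row: "<size or 'no change'> | <time> | <volume name>"
ROW = re.compile(r"^\s*(no change|[\d.]+MB)\s*\|\s*\S+\s*\|\s*(\S+)\s*$")

SAMPLE = """\
no change |  -  | appvms/banking/private.img
    5.0MB |  2s | appvms/personal/private.img
    0.5MB |  1s | appvms/root-backup/private.img
   45.8MB | 10s | appvms/untrusted/private.img
    0.0MB |  1s | wyng-qubes-metadata
"""


def changed_volumes(report: str) -> dict:
    """Map volume name -> MB sent, skipping 'no change' rows."""
    sent = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(1) != "no change":
            sent[match.group(2)] = float(match.group(1)[:-2])  # strip "MB"
    return sent


if __name__ == "__main__":
    sent = changed_volumes(SAMPLE)
    print("volumes with changes:", len(sent))
    print("MB sent: %.1f" % sum(sent.values()))
    # Figures from the summary line: 79428 MB in, 51 MB out.
    print("share sent: %.3f%%" % (100 * 51 / 79428))
```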

This is a restore from a backup session containing two VMs:

[me@dom0 ~]$ sudo python3 wuq restore --session=20240424-123248 --dest qubes-ssh://sshvm:[email protected]/home/user/wyng.backup
wyng-util-qubes v0.9beta rel 20240424

VMs selected: temp1, temp2
Warning:  Restoring to existing VMs will overwrite them.
Continue [y/N]? y
Wyng 0.8beta release 20240423
Encrypted archive 'qubes-ssh://sshvm:[email protected]/home/user/wyng.backup' 
Last updated 2024-04-25 17:32:53.942176 (-04:00)

Receiving volume 'appvms/sys-vpn2/private.img' 20240424-123248
Saving to file '/mnt/btrpool/libqubes/appvms/sys-vpn2/private.img'
OK

Receiving volume 'appvms/mail/private.img' 20240424-123248
Saving to file '/mnt/btrpool/libqubes/appvms/mail/private.img'
OK

Currently, the Qubes default pool for newly created VMs, or whichever pool an existing VM resides in, is used for restore, but you can also specify --pool <name> to override the Qubes default when creating VMs.

@tlaurion
Contributor

tlaurion commented Apr 25, 2024

Also, if wyng is to be considered in newer versions of QubesOS, I would recommend thinking about separating dom0 data from the Qubes hypervisor. That is a whole separate topic, but on my side I use wyng against a lot of cloned templates and cloned qubes, which results in the following:

[user@dom0 ~]
(130)$ du -chs /var/lib/wyng/
719M	/var/lib/wyng/
719M	total
[user@dom0 ~]
$ df -h /
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/qubes_dom0-root   20G   11G  7.8G  58% /
[user@dom0 ~]

Talking about moving to btrfs + bees dedup is off topic here, of course, and would belong in #6476.

But since the caveat of TLVM metadata has come up, I would love to take the opportunity to remind devs that separating dom0 from vm-pool was one step toward limiting the impact of using TLVM under Qubes in the first place. QubesOS uses a lot of snapshots (sometimes stalling) for back+x and volatile volumes, which could have locked the user out and which affects pool metadata a lot. That is also to be discussed under #6476, not here.

All of this could of course be dodged altogether if TLVM were reconsidered as the first-choice QubesOS storage, in favor of something more practical that fits specialized cloned templates, salting qubes, massive private disk data preventing quick shutdown, and all that jazz. But that would be #6476, not here.

So I just want to remind everyone that using wyng consumes dom0 space for the meta-dir (chunk mappings downloaded from the archive qube), that the dom0 LVM is 20GB because of past decisions (it could be within the same pool if not for TLVM), and that extending TLVM pool metadata requires stealing some of the swap space (manually). Having 20GB today is quite limiting (you cannot install multiple templates, and must be cautious and check dom0 usage), because dom0 should actually be quite static; it would be if dom0 were just dom0, without keeping external state that should not exist in dom0 in the first place.
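To keep an eye on this, a small check along the lines of the `du` and `df` output above can be scripted. This is a minimal sketch: the function names are hypothetical, `/var/lib/wyng` is the path from the output above, and the size is the apparent file size, so it may differ slightly from what `du` reports.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def metadata_footprint(meta_dir="/var/lib/wyng", mount="/"):
    """Return (metadata bytes, free bytes on mount, metadata share of the filesystem)."""
    usage = shutil.disk_usage(mount)
    meta = dir_size_bytes(meta_dir)
    return meta, usage.free, meta / usage.total


if __name__ == "__main__":
    meta, free, share = metadata_footprint()
    print(f"wyng metadata: {meta / 2**20:.0f} MiB "
          f"({share:.1%} of /), free on /: {free / 2**30:.1f} GiB")
```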


@tasket and @marmarek @DemiMarie : amazing work outside of those critical, but constructive points, as usual :)

Currently ongoing: users hacking qubes-backup encryption/authentication to use incremental backup tools:
https://forum.qubes-os.org/t/guide-incremental-backup-using-the-official-backup-system/25792
Let's not leave end users (even if they are devs) hacking qubes-backup to get what they actually want (differential backups) for much longer, OK?

... What about writing a grant application to achieve that pressing goal? Yes?

@tasket

tasket commented Apr 26, 2024

@tlaurion Just FYI, Wyng has an issue open (109) for reducing metadata footprint. Currently it keeps both a compressed and uncompressed copy of manifests in /var. It does set the compression bit on the uncompressed files, so if /var is on Btrfs or other compressing fs then used space will be reduced somewhat. To be completely honest, the main issue with that issue is the low level of interest in it, otherwise I think it would have been done already.

BTW, you can keep deleting all of the 'manifest' files, which should reduce the footprint by up to 2/3.

What about writing a grant application to achieve that pressing goal? Yes?

It's not something I'm familiar with, and the goal needs to be better-defined. AFAIC, Wyng is fully functional now and the only things a typical user would really miss would be a GUI and perhaps a way to mount archived volumes.

@tlaurion
Contributor

tlaurion commented Apr 26, 2024

@tasket Can you contact me over Matrix /mastodon/QubesOS forum?

@tlaurion
Contributor

tlaurion commented May 2, 2024

@marmarek @tasket @rapenne-s should we team up for a grant application? Interest?

@rapenne-s

@marmarek @tasket @rapenne-s should we team up for a grant application? Interest?

I'd be happy to work on this

@tlaurion
Contributor

tlaurion commented May 4, 2024

Random thoughts for grant application.

What I envision to be done as of now :

@rapenne-s I think you are an amazing candidate for documentation and insights on infra optimization.

@marmarek @marmarta @DemiMarie UX and GUI integration is needed. Fill us in with high level requirements?

@tasket of course to tackle the wyng-backup and wyng-util-qubes work, and to list other features still missing for mass adoption, with clear use cases.

@tlaurion : Heads and general plumbing, facing NLnet, grant application writeup, validation of proof of work (PoW, normally a PR) prior to request for payment (RfP), and project management. Heads integration, possibly using the SecureDrop workstation use case to drive this.

@deeplow interest from FPF's SD, and interfacing with them on requirements? The goal here would be to have restorable states from a network-booted environment that pulls states from the network and prepares drive content to be ready to use within minutes, hosted on FPF infra.

Maybe we should create a discussion on the QubesOS forum if there is interest, or a separate issue? Whichever you see fit.


Notes on grant application process.

Grant work is paid upon PoW for scoped tasks.
The grant application needs a high-level view of the whys and deliverables, not so much the hows. When the grant application passes the first round, more details need to be given on scoped deliverables and costs, where PoW makes the work paid upon validation of reaching the scoped task.

So for teaming up here, we first need acceptance (consent) of engagement to do the work within a year after the grant application is accepted. Then comes scoping of general tasks, who will do the work, and the approximate funds to budget for such high-level tasks, to be broken down into smaller tasks upon project approval in terms of deliverables paid upon PoW.

@rapenne-s @marmarek @marmarta @deeplow @DemiMarie @tasket : would you be willing to team up to expand on the needs of such integration and documentation, and agree on the first step, which is to consent to accomplishing such integration as a goal if such a grant application were accepted to fund the work needed?

@tasket

tasket commented May 7, 2024

Aside from organizational plans, I've decided that there will have to be a beta5 now. The metadata caching issue @tlaurion pointed out needs to be addressed before v0.8 rc, along with a unicode-handling issue I identified. Fixes are now in the 08wip branch and I expect beta5 to be available within a week.

Session metadata will now be aged out of /var/wyng before 3 days, by default. This can be controlled with an option; for example, --meta-reduce=on:0 will remove uncompressed metadata immediately, in which case the user should consistently see a ~2/3 reduction of /var usage.

@marmarek
Member Author

marmarek commented Jun 8, 2024

@tlaurion here is what I'd like from a backup solution, to be considered a replacement for the current (non-incremental) one:

  1. Backup should be integrity-protected. Attacker with write access to the backup archive should not be able to compromise dom0 on restore (resistance against malicious metadata modification), nor should be able to silently modify data (resistance against malicious data modification). Some attacks in this threat model are not easily avoidable - attacker can break the backup (or simply remove it) making it impossible to restore - that's acceptable risk. Protection against rollback is also likely non-trivial - rollback of a full backup archive is acceptable risk (but, nice to have if it could be detected), but rollback on individual VMs or even blocks should still be detected. See https://www.qubes-os.org/news/2017/04/26/qubes-compromise-recovery/#recovering-from-a-full-qubes-system-compromise for more explanation. The approach with using DispVM to extract data/metadata in such a model is okay (and even desirable).
  2. Backup should be encrypted by default. Any data leaving dom0 should be already encrypted, it shouldn't be necessary to require some external entity to do encryption to ensure backup confidentiality. At the very least the VMs data should be encrypted and their names. But ideally (nice to have), other metadata (like how big each VM is) should be encrypted too. Similar to above, some information leak is unavoidable - for example you can't hide amount of data in total, or amount of changes in each increment - that's okay. There can be an option to disable encryption if one wishes to.
  3. It should be possible to restore all VMs at once on a freshly installed system, with just access to the backup archive and its passphrase. I mean, restoring backup shouldn't require having any extra metadata, keys etc that were created when making the backup.
  4. It should be possible to restore an individual VM without touching others.
  5. Restoring a VM should restore all its metadata (properties, tags, what netvm is used etc)
  6. It would be nice to be able to restore to an older version of a VM, maybe even under a different name (but one can use qvm-clone, so it's easy to do). Nice to have.
  7. It should be possible to restore an archive made in older qubes version into newer qubes version. In other words: if archive format would change in the future, the tool should still support reading the old format
  8. It should be possible to restore into a different storage pool than the one the backup was initially created on.
  9. It should be possible to access the data (of individual VMs) without qubes (emergency restore). It can be an instruction how to do that manually (like we have right now), or a tool that works outside of qubes too (doesn't require LVM/btrfs/qubes-specific packages etc).
  10. It should be possible to back up to some popular cloud service (S3, Dropbox, Nextcloud, ...), ideally ("nice to have") without storing the whole backup archive in some intermediate wyng-aware place. So, ideally, a backup target accessible with only simple object operations (get, put, list) should work ("nice to have"), but the minimal requirement is that a backup archive stored in a wyng-aware place (USB disk? local NAS?) can then be synced to some cloud without losing incremental properties. And similarly for restore: nice to have if possible directly from the cloud service, but it must work when the backup archive is retrieved from the cloud first (for example onto a fresh USB disk). Wyng doesn't need to support every possible cloud service itself, but it should be possible to achieve with a relatively simple external tool (like s3cmd).
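To make the "simple object operations" idea in item 10 concrete, here is a minimal sketch of a target that offers only get, put and list, plus an incremental sync on top of it. Everything here is hypothetical and for illustration only: `ObjectStore`, `MemoryStore` and `sync_incremental` are not part of Wyng, and the sketch assumes archive objects are immutable once written (a file that changes in place, such as an archive root, would need to be re-uploaded every time).

```python
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Minimal target interface: only get, put and list are assumed."""

    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...
    def list_keys(self, prefix: str = "") -> List[str]: ...


class MemoryStore:
    """In-memory stand-in for a cloud bucket (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def list_keys(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def sync_incremental(local: Dict[str, bytes], store: ObjectStore) -> List[str]:
    """Upload only the objects the store does not have yet; return their keys."""
    remote = set(store.list_keys())
    uploaded = []
    for key in sorted(local):
        if key not in remote:
            store.put(key, local[key])
            uploaded.append(key)
    return uploaded
```

With an interface this small, the cost of a sync is proportional to the number of new objects, which is the property "without losing incremental properties" asks for.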

Those are the main ones on the functional side. Some are optional (marked "nice to have"). Some (if not most) are already satisfied by wyng, but I've written them down anyway.

Some of those apply to Wyng itself, some to its integration with Qubes OS. Let's discuss how we can make this happen :)

@tlaurion
Contributor

tlaurion commented Jun 8, 2024

@tasket this needs a thorough project status update from you here!

@tasket

tasket commented Jun 8, 2024

@marmarek Thanks for taking time to post your queries/requirements. My answers follow:

  1. Authentication & integrity checks should already be thoroughly addressed by the Wyng and wyng-util-qubes code and archive format: The Wyng format spec shows a hierarchical structure where everything is hash-checked from the root file (archive.ini) downward, which is how the Wyng code operates. (However, this is not very complex as there are only 3 levels of metadata, 1 of data.) This also means every bit of an archive, such as various volumes, must validate in lock-step fashion. The root file is updated with new hashes as archive elements change, and it is AEAD authenticated so any subsequent access after initialization is also authenticated.

    Rollback protection: Apart from the internal hashing described above, which prevents piecemeal replacement with older authenticated messages, on startup Wyng persistently compares the locally-cached archive root with the remote. It first checks for an exact match, and if not exact then the internal timestamp in the cache cannot be newer. The root must pass the AEAD decryption phase before the timestamps are compared. This protects against whole-archive rollbacks. There are also comparisons made to protect encryption counters. Otherwise, if there is no current cache of the root present (such as when restoring to a new system or moving back and forth between systems) then the user must be careful to check the 'Updated at' time that is displayed when the archive is accessed, or at least take heed of the session date-times that are present in the archive.

    The wyng-util-qubes code doesn't mix-and-match volumes from different backup sessions in a single invocation; the user either has to specify a session date-time, or else any VMs you request for restore must all be present in the same session (users who want to restore from more than one session may run the util more than once).

  2. The Wyng default is to encrypt the archive, or to fail during archive creation if encryption dependency is not present. A user must specify --encrypt=off to create an unencrypted archive. Further, Wyng displays the (un)encrypted status when accessing an archive.

    Volume details like name and actual size are encrypted, however the amount of data in compressed/deduplicated form is visible without decryption. Also, the backup session date-times are visible (these are typically close to the Unix timestamps on component files, so was not identified by me as a must-have); the in-volume data chunk addresses are also visible, for similar reasons and that the resolution is fairly low. Also, which chunks are hard-linked to each other are visible in the case where deduplication is used. The volume names are encrypted and a volume ID number is used for the on-disk representation instead.

  3. Only the passphrase is required for archive decryption; the key is derived from only the passphrase and the salt stored in the archive. There is one package dependency for decryption (python3-pycryptodomex, which is already in dom0 by default) and another one if the archive was created with zstandard compression.

  4. Individual VMs may be specified, although in that case there will be no restriction on which backup session can be selected. Currently, the util's default will not restore dispVMs unless they are specified by name; the --include-disposable option must be specified to restore dispVMs implicitly. To restore all VMs in a backup session, a user can run sudo wyng-util-qubes restore --session=<date-time> --include-disposable=on --dest=<archive URL>. The util will use either the Qubes default storage pool or an existing VM's pool (the default can be overridden with the --pool option).

  5. VM settings are restored on a best-effort basis, similar to qvm-backup-restore, and the util will return a latent error at completion if a setting wasn't possible. The restored metadata are: prefs (properties), features, tags, devices. VM names are always preserved, so that existing VMs will be overwritten; failure during restore leaves the VM tagged with 'restore_incomplete'.

  6. Older versions can be restored by specifying --session, but not as different VM names. Wyng itself has a --save-to option that only works for individual volumes.

  7. There has been one format transition of note, from V2 to V3, that requires the user to run wyng arch-check --upgrade-format in the current Wyng beta release (however, the upgrade doesn't add some of the V3 features, like encryption). It's my intention that future versions of Wyng would be able to at least list + extract volume data from V3 format archives onward.

  8. Restoring into pools is determined by the receiving Qubes system pool configuration; the name/path info in the archive should be considered as relative. As described earlier, the Qubes default pool can be overridden with --pool for non-pre-existing VMs, otherwise the existing VM's pool will be used. User removal of an existing VM before restore would be a precondition for controlling where it's restored to.

  9. Non-Qubes access of data volumes is already possible just by using Wyng directly on a typical Linux distro; how a user finds which volumes belong to which VMs could be documented with just a few paragraphs. I've started testing Wyng on FreeBSD to see where its retrieval functions can be made more portable, but for Linux distros Wyng itself is intended to run full-featured on them.

  10. Cloud protocol backups are planned for the next release of Wyng after v0.8 (v0.9 release timeline: tasket/wyng-backup#197). Currently Wyng needs either filesystem access (local mount or FUSE), or access via a helper script over ssh which uses basic shell commands + CPython.
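The whole-archive rollback check described under item 1 can be sketched roughly as follows. This is not Wyng's actual code: the names `ArchiveRoot`, `RollbackError` and `check_root` are hypothetical, and the sketch assumes both roots have already passed AEAD decryption before they are compared.

```python
from dataclasses import dataclass
from typing import Optional


class RollbackError(Exception):
    """Raised when the remote archive root looks older than the cached one."""


@dataclass(frozen=True)
class ArchiveRoot:
    raw: bytes        # root file contents, assumed already AEAD-authenticated
    timestamp: float  # internal "last updated" time stored in the root


def check_root(cached: Optional[ArchiveRoot], remote: ArchiveRoot) -> str:
    """Compare the locally cached root with the remote one."""
    if cached is None:
        # No local cache (new system): only the user can judge the displayed
        # update time and session date-times.
        return "no-cache"
    if cached.raw == remote.raw:
        return "match"
    if cached.timestamp > remote.timestamp:
        # The cache must never be newer than what the remote presents.
        raise RollbackError("cached root is newer than remote root")
    return "remote-newer"
```

The "no-cache" branch is the case where the user has to check the displayed update time manually, as noted above.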

For backing up the backup, simple rsync -aH --delete is recommended (with a caveat—the copy should be renamed '.updating' or similar during updates so that an aborted update isn't mistaken for whole). Incremental updates with this method should yield rsync run times that are proportional to the incremental delta. There is an issue to eventually provide an internal archive duplication function for convenience and increased efficiency.
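A minimal sketch of that rsync approach, including the rename caveat, might look like this. The function name `mirror_archive` and the `.updating` suffix handling are assumptions for illustration; only `rsync -aH --delete` comes from the recommendation above. The `run` parameter exists so the command can be inspected or replaced without calling rsync.

```python
import os
import subprocess


def mirror_archive(src, dst, run=subprocess.run):
    """Mirror src to dst with rsync, keeping an '.updating' name while in progress."""
    dst = dst.rstrip("/")
    updating = dst + ".updating"
    if os.path.isdir(dst):
        if os.path.exists(updating):
            raise RuntimeError("both the copy and an '.updating' copy exist")
        # Mark the existing copy as in-progress before touching it.
        os.rename(dst, updating)
    # Trailing slashes make rsync copy the contents of src into the target dir.
    cmd = ["rsync", "-aH", "--delete", src.rstrip("/") + "/", updating + "/"]
    run(cmd, check=True)
    # Only a completed update gets the final name back.
    os.rename(updating, dst)
    return cmd
```

If rsync fails or is interrupted, the copy keeps its `.updating` name, so an aborted update is not mistaken for a whole archive; running the function again resumes into the same directory.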

CC @tlaurion

@mooreye

mooreye commented Jun 9, 2024

One more nice-to-have thing I would like to see: possibility to exclude a directory/file (not sure if possible since wyng works on LVMs) or at least backup just the VM metadata (name, color, netvm, template name, etc.) without backing up its contents.

Example use case: You have a "media" qube where you watch videos offline. You download a playlist from youtube with yt-dlp which you want locally only temporarily, and you will delete it when done watching it. Excluding the directory it lives in from backups would mean you don't have to remember to move it out of the VM before backup. Or, if excluding a directory is not possible, dedicate a media-tmp qube to it and back up only its metadata (not ideal, since things like the media player's local config will not be backed up). Currently the only workaround seems to be not to back up the media VM at all until you delete the temp videos (not ideal), or to create a partition outside of Qubes and mount it to the media qube (cumbersome...), so the backup utility won't work with it.

@tlaurion
Contributor

tlaurion commented Aug 27, 2024

@marmarek @tasket #858 (comment): ping

@tlaurion
Contributor

tlaurion commented Oct 3, 2024

@tasket @marmarek @marmarta @rapenne-s QubesOS mini-summit talk published https://youtu.be/It13u9UASs4?list=PLuISieMwVBpL5S7kPUHKenoFj_YJ8Y0_d

@marmarek said wyng-backups was in the pipelines in opening talks.

Do we try to organize for funding?


Safe disks state as a firmware service had some push-back in the Q&A of the talk. I understand there is reluctance in having the firmware be network-ready (some reluctance stating that machines using QubesOS separate this with sys-net, sys-firewall etc), but again I have to remind people that on-demand network access from firmware (IPXE->DTS under Dasharo to produce HCL, upgrade firmware etc) is common, where Heads relies on kernel modules loaded when asked by the user and where measurements are extended and wouldn't permit unsealing any secret from that boot session. I also want to remind that the promoted way of doing this under Heads is through CDC tethering over a phone, so the phone itself is what is exposed to network risks, where the phone can use a VPN or whatever is needed per threat model to mitigate network risks. Heads only gets an IP unless Ethernet drivers are used to connect directly to the network, and SSH would be used to connect to the wyng-backup server defined in config. So the provisioned SSH private key (with/without passphrase) would be fused in CBFS alongside the provisioned wyng-backup location. Those states maintained by the IT dept would not require encryption; authentication and integrity of backups would be maintained by the IT dept, and storage would be accessible read-only for the pre-authorized SSH public key for that server.

If the above is accepted (on-demand network access from ethernet/CDC tethering/IPXE pointing to kernel+initrd), where Heads would need to include a new curl dependency to pull those initrd+kernel images, verify them and then boot into them (just like DTS today), then Heads could pass credentials and the wyng backup location and be able to pull trusted states from the network and apply them to disk. This is my dream, but it is disconnected from the first steps needed.


The first step is to integrate wyng-backup into QubesOS, define proper UX and GUI, and review @marmarek's requirements against @tasket's answers above. Then we can look at how to merge efforts into a grant application or other funding sources to work toward the goal of having differential backups in QubesOS today, and then see how we could integrate this better so that those states could be restored outside of QubesOS, from external disks or the network.

Please ping me back when you are ready to organize toward that goal and what would be the next steps.

@DemiMarie

Safe disks state as a firmware service had some push-back in the Q&A of the talk. I understand there is reluctance in having the firmware be network-ready (some reluctance stating that machines using QubesOS separate this with sys-net, sys-firewall etc), but again I have to remind people that on-demand network access from firmware (IPXE->DTS under Dasharo to produce HCL, upgrade firmware etc) is common, where Heads relies on kernel modules loaded when asked by the user and where measurements are extended and wouldn't permit unsealing any secret from that boot session.

That does somewhat (though not entirely) mitigate the risks. Thank you.

A better solution is for the firmware itself to include an isolated network and USB stack that runs in a micro-VM. However, this is significantly more work, not least because of flash size constraints. Requiring it might be a case of “the perfect is the enemy of the good.”

I also want to remind that the promoted way of doing this under Heads is through CDC tethering over a phone, so the phone itself is what is exposed to network risks, where the phone can use a VPN or whatever is needed per threat model to mitigate network risks. Heads only gets an IP unless Ethernet drivers are used to connect directly to the network, and SSH would be used to connect to the wyng-backup server defined in config. So the provisioned SSH private key (with/without passphrase) would be fused in CBFS alongside the provisioned wyng-backup location. Those states maintained by the IT dept would not require encryption; authentication and integrity of backups would be maintained by the IT dept, and storage would be accessible read-only for the pre-authorized SSH public key for that server.

That is an interesting approach. I’m not sure if the Linux kernel CDC drivers are secure, but compromising a phone with strong verified boot is not easy. It’s all about tradeoffs here.

@HW42

HW42 commented Oct 17, 2024

My comment is about the "networked backup tool in the firmware" part and not wyng backup specific. It might make sense to move that part of the discussion elsewhere.


Safe disks state as a firmware service had some push-back in the Q&A of the talk. I understand there is reluctance in having the firmware be network-ready (some reluctance stating that machines using QubesOS separate this with sys-net, sys-firewall etc),

Yeah I was one of those commenters.

There are two main aspects here:

  • With the proposed implementation it's significantly less isolated than under Qubes OS. For this reason I don't think that is a good fit for Qubes.

  • Given that firmware is harder to update (very critical for stability, space constrained, constrained environment, ...), moving a complex thing like this into it seems not ideal.

And there's the minor aspect that I haven't quite seen yet why this would be much better than having a little backup system that you load and update normally, for example on a USB drive, a separate partition, etc. (depending on the exact use case).

but again I have to remind people that on-demand network access from firmware (IPXE->DTS under Dasharo to produce HCL, upgrade firmware etc) is common,

Yes, that's Dasharo's decision. Dasharo is not limited to being a Qubes OS loader. It has its own goals and I see how it's useful for them. That doesn't change my opinion that that exposure isn't a good fit for Qubes.

where Heads relies on kernel modules loaded when asked by the user and where measurements are extended and wouldn't permit unsealing any secret from that boot session.

But that system is intended to restore a backup, no? So unless you have a very fancy setup with pre-sealed secrets (or very locked down verified boot) for the state you are restoring, that networked system has full control over the OS install it's restoring and can get access to the secrets after a reboot into the restored system.

I also want to remind that the promoted way of doing this under Heads is through CDC tethering over a phone, so the phone itself is what is exposed to network risks, where the phone can use a VPN or whatever is needed per threat model to mitigate network risks. [...]

If the user considers the phone in use to be highly secure, then this works as a strange sys-net, and so addresses the isolation concern. But I'm not sure that's a common case. If you don't consider the phone highly secure, I consider this setup worse than a wired ethernet connection, since now you are also exposing the USB stack of the Linux kernel.

If the above is accepted [...]

I'm not sure what you mean by accepted here. Heads is its own project and doesn't need approval from the Qubes side. As stated, I'm not convinced that the proposed implementation (whether backup tool directly in firmware or via network boot) is a great fit for Qubes' security model. But this doesn't stop Heads from implementing something. Since you already compared to Dasharo: They have some great features from Qubes' perspective, like the ability to disable the network and USB stack of the firmware, but also features like the IPXE boot that I wouldn't recommend to be used together with Qubes. But since they are an independent open source project that aims to be a general purpose boot firmware, that makes a lot of sense.

How big is Heads allowed to get? Maybe it's practical to squeeze some isolated network setup in there (but that surely isn't as simple as packaging curl/wyng/... into it).

@tlaurion
Contributor

tlaurion commented Oct 26, 2024

Edited for typos, reviewed 2025-01-02.

@marmarek @HW42 @DemiMarie @tasket @marmarta: Please don't let my personal plan for Heads providing trustworthy disk states as a firmware service get in the way of drafting the joint grant application. I feel it's another case where people will agree on a desirable outcome only once it's released. Please neglect it for now while letting it be part of the joint grant application.

To clarify the firmware/OS made available disk state as a service:

  • Minimal Heads Addition: Could require a simple curl addition to pull initrd + kernel (IPXE-ready images) to network boot (kexec into those) for externally maintained wyng-backup ready environment, just like Dasharo DTS does. This approach is easier, with less addition of unnecessary components in firmware, reuses the already available CDC/Ethernet under Heads for network access through tethering, and reuses detached signature validation already available (with added distro signature). This separates the rest of the needed work off the firmware project and focuses on reusability under discussed QubesOS ISO added tools for QubesOS restoration from installation media.
  • Complete Heads Disk as a Firmware Service Support: Could add CPython + python cryptodome + python zstd instead of curl under Heads to perform SSH operations directly from the controlled Heads OS environment (directly from GUI interacting with provisioned SSH server). This allows Heads to generate a private key (and fuse under CBFS) and work towards streamlined SSH server requirements. The client pushes its SSH public key to the SSH server, where the SSH server moves the key out of the user-accessible directory for off-channel review and adds it to OEM infrastructure's authorized_keys for approval before end-user usage. This facilitates what otherwise becomes complicated for DTS user input prior to the end-user accessing subscribed content. After this setup, access is transparent and without UX friction, with the OEM server side watching for public key reuse with some daily access limit/blacklisting of abused public keys.
  • Reusability of Disk State as an OS Installer Service: Could be added under QubesOS ISO if QubesOS intends to deploy such recoverable states for ISO-accessible recovery environments. Adding network support there is somewhat entangled with the LiveOS setup, which went unmaintained for a while and never went through the Summit of Code last time I checked. This aims to offer better network isolation than the alternatives proposed in the previous points. I don't see how QubesOS ISO would permit something better or different than what Heads could provide as an on-demand, in-memory-only backup restore session, or a session launched with kexec after using curl to download kernel + initrd + detached signatures.
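
To make the first bullet more concrete, here is a rough sketch of the fetch, verify, then kexec flow. It is only an illustration under assumptions, not how Heads or Dasharo DTS actually implement it: the file names (`vmlinuz`, `initrd.img`, `*.sig`), the server layout, and the keyring path are hypothetical, and Python is used only for readability (Heads would need the equivalent in its own scripts). It shells out to `curl`, `gpgv`, and `kexec`, and refuses to load anything unless every detached signature verifies.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def boot_recovery_image(base_url: str, workdir: Path, keyring: Path,
                        runner: Runner = run, cmdline: str = "") -> bool:
    """Download kernel + initrd with detached signatures, verify, then kexec.

    Returns False (without touching kexec) if any download or signature
    check fails. File names below are hypothetical placeholders.
    """
    artifacts = ["vmlinuz", "initrd.img"]

    # 1. Fetch each artifact and its detached signature.
    for name in artifacts:
        for fname in (name, name + ".sig"):
            dest = workdir / fname
            fetch = ["curl", "--fail", "--location",
                     "--output", str(dest), f"{base_url}/{fname}"]
            if runner(fetch) != 0:
                return False

    # 2. Verify every detached signature against the trusted keyring.
    for name in artifacts:
        verify = ["gpgv", "--keyring", str(keyring),
                  str(workdir / (name + ".sig")), str(workdir / name)]
        if runner(verify) != 0:
            return False

    # 3. Only now load the verified kernel and jump into it.
    kernel = workdir / "vmlinuz"
    initrd = workdir / "initrd.img"
    load = ["kexec", "--load", str(kernel),
            f"--initrd={initrd}", f"--append={cmdline}"]
    if runner(load) != 0:
        return False
    return runner(["kexec", "--exec"]) == 0
```

The `runner` parameter exists so the flow can be dry-run with a stub instead of really calling `kexec`; the key property is that a failed `gpgv` check stops everything before step 3.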

Anyway, the above is irrelevant here outside of drafting the joint grant application. Maybe @mfc would want to jump in?

Let's focus first on the current blocking points

First, make sure that:


Reminder from #858 (comment):

Notes on the grant application process.

Grant work is paid upon PoW for scoped tasks.
The grant application needs a high-level view of the whys and deliverables, not so much on the hows. When the grant application passes the first round, more details need to be given on scoped deliverables and costs, where PoW makes the work paid upon validation of reaching the scoped task.

So to team up here, we first need acceptance (consent) to engage in doing the work within a year after the grant application is accepted. Then we need to scope the general tasks, who will do the work, and the approximate funds to budget for such high-level tasks, which will be broken down into smaller tasks upon project approval, as deliverables paid upon PoW.

@rapenne-s @marmarek @marmarta @deeplow @DemiMarie @tasket: Would you be willing to engage in teaming up to expand on the needs of such integration and documentation, and agree on the first step, which is to consent to accomplishing such integration as a goal if the grant application is accepted to fund the work needed?

@marmarta
Member

marmarta commented Jan 2, 2025

I was talking a bit recently with people about backups and I have a better idea of what a good UX for backups would be, so: I could definitely participate in this project with respect to UI/UX; I can also help a bit with writing applications, I'm not terrible with lots of words.

@tasket

tasket commented Jan 13, 2025

I'm happy to participate in a funded effort to make this possible. Although wyng-util-qubes may not end up as the proper place for it, I've created an issue there for relating any UX, design and implementation ideas. This will help us decide what will go into the grant proposal.
