Enable more and less error prone caching #1756

Open
DaanDeMeyer opened this issue Aug 8, 2023 · 10 comments

@DaanDeMeyer
Contributor

DaanDeMeyer commented Aug 8, 2023

Currently, --incremental has some very basic cache invalidation based on whether the list of configured packages changed or not. This is not sufficient for multiple reasons:

  • There's a lot more options that affect the cached images outside of which packages we install
  • For BaseTrees=, SkeletonTrees= and PackageManagerTrees=, it's not sufficient to just check whether the option changed; we need to check whether the directory tree or file pointed to by the option changed at all. For directory trees, we have to check every file in the tree for changes.
  • We currently check whether we can use the cached image or not. We should also be able to check whether we need to rebuild the image at all, which means checking all config options, all configured source files and all input trees for changes and making a cache manifest out of those.
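
As a rough illustration of the cache manifest idea in the list above, here is a minimal sketch in Python. The helper names (hash_file, tree_manifest, cache_manifest) and the manifest layout are assumptions made for this example, not mkosi code. It only tracks file contents and symlink targets; a real implementation would probably also need modes, ownership and xattrs.

```python
import hashlib
import json
import os
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_manifest(root: Path) -> dict[str, str]:
    """Map relative paths under root to a fingerprint of their content."""
    if root.is_file():
        # The option may point at a single file (e.g. a tarball).
        return {".": "sha256:" + hash_file(root)}
    manifest: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(dirnames + filenames):
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if path.is_symlink():
                manifest[rel] = "symlink:" + os.readlink(path)
            elif path.is_file():
                manifest[rel] = "sha256:" + hash_file(path)
    return manifest


def cache_manifest(config: dict[str, str], trees: list[Path]) -> str:
    """Serialize config options and all input trees into one stable string."""
    data = {
        "config": config,
        "trees": {str(tree): tree_manifest(tree) for tree in trees},
    }
    return json.dumps(data, sort_keys=True, indent=2)
```

If the string returned by cache_manifest() is stored next to the cached image, a later run can compare it against a freshly computed one to decide whether the cache is still valid.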

To check for tree changes, we can probably use systemd-dissect --mtree and diff the mtree output to see what changed. We should also make sure the diffs between caches are displayed in --debug mode, so it's possible to debug why we don't reuse the cache and rebuild the image instead.
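
A minimal sketch of the diffing part, assuming both the cached and the current manifest are available as plain text (for example mtree output saved from a previous run). The function names and the debug parameter are hypothetical; only difflib from the Python standard library is used.

```python
import difflib


def manifest_diff(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two textual manifests (empty if equal)."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="cached",
            tofile="current",
            lineterm="",
        )
    )


def needs_rebuild(old: str, new: str, debug: bool = False) -> bool:
    """Report whether the cache is stale; in debug mode, print what changed."""
    diff = manifest_diff(old, new)
    if diff and debug:
        print("\n".join(diff))
    return bool(diff)
```

Printing the diff only in debug mode keeps normal output quiet while still answering the question of why the cache was not reused.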

We should also support appending to existing cache images, where we install more packages into the existing cached image instead of rebuilding it from scratch.

@DaanDeMeyer DaanDeMeyer added the RFE label Aug 8, 2023
@septatrix
Contributor

  • There's a lot more options that affect the cached images outside of which packages we install

Do you already have a list of those options or should we collect them here? One I am currently stumbling over is changes in RootSize= (yes, I know, not present in v15 anymore)

‣  Refreshing partition table…
Partition #1 contains a vfat signature.
The last usable GPT sector is 6815790, but 8912935 is requested.
Failed to add #2 partition: Invalid argument

@malt3
Contributor

malt3 commented Aug 23, 2023

I noticed that presets in mkosi are roughly analogous to layers in a container image.
This makes me think that mkosi could have incremental builds on a preset level.
Let me explain the idea (in vague terms) before talking about how this could be implemented:

Idea

Let's say I have a multi-preset build like the one used by systemd:

  • base holds packages, builds software and outputs a sysroot as a directory
  • initrd generates a cpio used as initrd
  • system depends on base, initrd and some extra trees (let's say a few tar files with software built elsewhere that can be used as extra trees) and generates a bootable uefi image with a uki

Now I perform one build.
If only one of the tar files for the system preset has changed, I want to be able to rebuild incrementally (keep base and initrd from the last build and only rebuild system).

Possible implementations

Let the user choose what needs to be rebuilt via command line parameters

This requires only a tiny code change but probably works well if used with existing build systems. The build system invoking mkosi would know what external dependencies a preset depends on and can tell mkosi what needs to be rebuilt.
All that is needed would be flags to either keep existing presets or to explicitly rebuild a list of presets. Mkosi should then rebuild all chosen presets and all of the presets that depend on the chosen presets.
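
To make the "rebuild the chosen presets and everything that depends on them" rule concrete, here is a small sketch. The depends_on mapping and the function name are assumptions for illustration; mkosi does not expose such an API.

```python
from collections import deque


def presets_to_rebuild(
    depends_on: dict[str, list[str]], requested: set[str]
) -> set[str]:
    """Return the requested presets plus all presets that transitively depend on them."""
    # Invert the edges: for each preset, which presets consume its output?
    dependents: dict[str, list[str]] = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    result = set(requested)
    queue = deque(requested)
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer not in result:
                result.add(consumer)
                queue.append(consumer)
    return result
```

With the example above (system depends on base and initrd), asking to rebuild base would also select system, while asking to rebuild only system leaves base and initrd untouched.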

Actually check all inputs for a preset and understand what has changed

This could be a lot of work in practice. Mkosi would essentially need to record all inputs for a preset (Trees, resolved packages, the exact configuration, probably more) and decide itself if a preset needs to be rebuilt. It would then need to rebuild the changed presets and all of the presets that depend on changed presets transitively.
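
One possible way to get the transitive part for free is to fold the fingerprints of a preset's dependencies into its own fingerprint, so a change in base automatically changes the fingerprint of system. The sketch below assumes each preset's own inputs have already been reduced to a single string (own_inputs); all names are hypothetical.

```python
import hashlib


def preset_fingerprints(
    own_inputs: dict[str, str], depends_on: dict[str, list[str]]
) -> dict[str, str]:
    """Fingerprint each preset from its own inputs and its dependencies' fingerprints."""
    cache: dict[str, str] = {}

    def fingerprint(name: str, stack: tuple[str, ...] = ()) -> str:
        if name in stack:
            raise ValueError(f"dependency cycle involving {name}")
        if name not in cache:
            digest = hashlib.sha256(own_inputs[name].encode())
            for dep in sorted(depends_on.get(name, [])):
                digest.update(fingerprint(dep, stack + (name,)).encode())
            cache[name] = digest.hexdigest()
        return cache[name]

    return {name: fingerprint(name) for name in own_inputs}


def changed_presets(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Presets whose fingerprint differs from, or is missing in, the previous build."""
    return {name for name, fp in new.items() if old.get(name) != fp}
```

Comparing the fingerprints saved from the last build against the current ones then yields exactly the set of presets to rebuild.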

Possible alternatives

You can define layers by having separate folders with their own mkosi config and let them depend on each other (by referencing the outputs from one directory in the basetree of another directory).
In this scenario, mkosi cannot by itself orchestrate the whole build and another build system needs to know that layers depend on each other and build layers in the correct order.


I think this would make mkosi an optimal build tool for systemd sysext and full os images.
It would provide a similar level of convenience as Dockerfiles do for building containers. I also think implementing only the first option for now would already provide a lot of value.

@DaanDeMeyer
Contributor Author

I completely agree, but we should do this properly by checking all the inputs. We should also make this easy to debug by providing, on request, an informative diff of what's causing a rebuild to happen. We can either use systemd-dissect --mtree or diffoscope to figure out differences. I would love to review PRs for this, but we shouldn't half-bake it; we should do it properly.

@clarfonthey

clarfonthey commented Feb 8, 2025

Poking around, this seems like the best place to comment:

While I do appreciate mkosi in general a lot more than my previous homemade approach (using docker to build the filesystem, then a bunch of hacky scripts to create a disk image), it is a substantial downgrade to go from caching on every single command run to caching only once for the entire build, effectively. While I did think that the every-step caching was excessive (changed an environment variable? entirely new image!) I do think that only one cache step is not enough, especially when debugging.

It would be nice if the final version of this proposal allowed for more caching steps, even if we do something simple like caching between each build script and restarting back to a previous image if one of the build scripts changes.

@DaanDeMeyer
Contributor Author

DaanDeMeyer commented Feb 8, 2025

@clarfonthey What exactly do you want to cache? Build scripts already have access to a build directory in which incremental build results can be stored. But this relies on whatever tool you invoke in the build script to support incremental builds. If you give a bit more detail on your use case, I can give more guidance.

@clarfonthey

Oh, the fact that build scripts can cache their artifacts helps a lot, just, there are definitely steps in the process that are slow and caching can be made very fast with snapshotting.

For example, if you have an issue in a prepare script you're debugging, even if all the packages have been already downloaded, on an Arch build I have to wait for the keyring to populate and then wait for the packages to extract, which takes time. If the base and build layers were cached before the prepare scripts ran and then again after, it would speed up that debugging step a lot.

This was quite brutal when an earlier version of the script I'm working on had a DKMS package being installed in a prepare script, because I needed to add a separate repository that could not be added as a mirror: the DKMS modules being built is an atomic, uncacheable step as it's designed, and while I could try and work around that and turn it into a cacheable step, it would have been nicer if I just could isolate it and run it by itself.

Right now, I have build scripts build packages and then postinst scripts install them, and that actually can be quite slow: again, post-installation steps often build things from the package sources, like local cache databases, and it's frustrating to have to wait several minutes between builds to debug things.

So, while a lot of these issues could be solved by me just getting things right the first time, that's not very realistic, and caching more between steps could help a lot. Caching is already an opt-in feature, so, perhaps this level of caching could be an extra setting.

To clarify how Docker/Podman does it, they effectively take a snapshot between each line in the Dockerfile/Containerfile, then go back to the step whose dependencies haven't been modified if it needs to. (For example, a script being run being changed, or the steps themselves being changed.) It feels like the checking you're describing in the issue description, like doing better checking on whether the extra trees are modified, could be done with an extra caching step, so that if those are modified, the prepare scripts aren't even run and you just perform the extra tree copying step.
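
A rough sketch of that layer-style scheme, under the assumption that every pipeline step can be described by a name plus a string summarizing its inputs (script contents, tree hashes, options). The function names are made up for this example and nothing here reflects how Docker, Podman or mkosi actually implement it. Each key chains in the previous key, so changing one step invalidates that step and everything after it.

```python
import hashlib


def step_keys(steps: list[tuple[str, str]]) -> list[str]:
    """Chain step fingerprints so each key covers the step and everything before it.

    Each step is (name, inputs), where inputs describes what the step consumes.
    """
    keys: list[str] = []
    previous = ""
    for name, inputs in steps:
        previous = hashlib.sha256(
            f"{previous}\0{name}\0{inputs}".encode()
        ).hexdigest()
        keys.append(previous)
    return keys


def first_step_to_run(cached_keys: list[str], steps: list[tuple[str, str]]) -> int:
    """Index of the first step whose snapshot cannot be reused (len(steps) if none)."""
    for index, key in enumerate(step_keys(steps)):
        if index >= len(cached_keys) or cached_keys[index] != key:
            return index
    return len(steps)
```

With steps such as packages, prepare and postinst, editing only the postinst script would return the index of postinst, so the snapshot taken after prepare could be restored and only the last step rerun.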

@DaanDeMeyer
Contributor Author

@clarfonthey I sympathize but getting this right is hard. Doing this completely properly means you eventually end up with a build system, which mkosi definitely isn't. For an image builder that tries to go all the way in this aspect, check out https://facebookincubator.github.io/antlir/.

I'm happy to review patches that improve the situation in mkosi, but we have to keep the complexity manageable somehow.

@clarfonthey

So, I strongly sympathise with the idea of caching being difficult, and I wasn't expecting this kind of feature to be anything more than a long-term dream, but I'm kind of confused by the assertion that mkosi isn't a build system. I get wanting to call it a simple build system, or to say that this feature is out of scope, but… it's literally building images, and it even has several components labelled "build."

It feels weird to say it isn't a build system, rather than just wanting to assert that it's a simple build system and that complicated caching is out of scope. It feels like several features that have already been included might be out of scope from the perspective of "not being a build system", and so it's likely to just be confusing for people who aren't sure what the project direction is. (Me being one of those people.)

@DaanDeMeyer
Contributor Author

By build system I mean that inputs and outputs of each individual step are declaratively declared and the tool figures out which targets should be rebuilt.

@clarfonthey

clarfonthey commented Feb 12, 2025

I mean, I would say that the inputs and outputs of most of the steps in mkosi are explicitly declared, for the most part, even if they aren't configurable. You have declared what the steps are in the pipeline and what they have access to.

Sure, I don't think that every output should be declared: it would be far out of scope to track everything output to BUILDDIR, ARTIFACTDIR, etc., but there are plenty of clear steps that could be cached by the simple checks of whether these scripts have changed or not.

And to be clear, I am not arguing in favour of the method I proposed where effectively every command run is cached, but a potential middle ground where there are more caching steps possible, like being able to only run postinst scripts if none of the scripts, skeleton trees, or extra trees meant to be used before those steps has been modified.

@DaanDeMeyer DaanDeMeyer removed the RFE label Feb 18, 2025