Enable more and less error prone caching #1756
Do you already have a list of those options, or should we collect them here? One I am currently stumbling over is changes to the partition configuration, which fail with:

```
‣ Refreshing partition table…
Partition #1 contains a vfat signature.
The last usable GPT sector is 6815790, but 8912935 is requested.
Failed to add #2 partition: Invalid argument
```
I noticed that presets in mkosi are roughly analogous to layers in a container image.

**Idea**

Let's say I have a multi-preset build like the one used by systemd. Now I perform one build.

**Possible implementations**

*Let the user choose what needs to be rebuilt via command line parameters.* This requires only a tiny code change and probably works well when used with existing build systems. The build system invoking mkosi would know what external dependencies a preset depends on and could tell mkosi what needs to be rebuilt.

*Actually check all inputs for a preset and detect what has changed.* This could be a lot of work in practice. mkosi would essentially need to record all inputs for a preset (trees, resolved packages, the exact configuration, probably more) and decide itself whether a preset needs to be rebuilt. It would then need to rebuild the changed presets and, transitively, all presets that depend on changed presets.

**Possible alternatives**

You can define layers by having separate folders with their own mkosi config and let them depend on each other (by referencing the outputs of one directory in the base tree of another directory). I think this would make mkosi an optimal build tool for systemd sysexts and full OS images.
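The transitive-rebuild part of the second implementation idea can be sketched as a walk over a preset dependency graph. This is only an illustration; the preset names and the dependency map below are hypothetical, not mkosi's actual data model:

```python
# Sketch: given which presets have changed inputs, compute the full set
# of presets that must be rebuilt (changed presets plus everything that
# transitively depends on them). Hypothetical data, not mkosi internals.
from collections import defaultdict


def presets_to_rebuild(changed: set, deps: dict) -> set:
    """Return `changed` plus every preset that transitively depends on one."""
    # Invert the map: preset -> presets that directly depend on it.
    rdeps = defaultdict(list)
    for preset, needs in deps.items():
        for n in needs:
            rdeps[n].append(preset)
    stale = set(changed)
    queue = list(changed)
    while queue:
        for dependent in rdeps[queue.pop()]:
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)
    return stale


# Example layout loosely modeled on a multi-preset build:
deps = {"base": [], "build": ["base"], "initrd": ["base"], "final": ["build", "initrd"]}
print(sorted(presets_to_rebuild({"base"}, deps)))    # everything depends on base
print(sorted(presets_to_rebuild({"initrd"}, deps)))  # only initrd and its dependents
```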
I completely agree, but we should do this properly by checking all the inputs. We should also make this easy to debug by providing, on request, an informative diff of what's causing a rebuild to happen. We can either use …
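One way to get such an informative diff is to record all inputs for a preset in a manifest and, when the cached and current manifests disagree, report exactly which fields changed. A minimal sketch, assuming a JSON-serializable manifest; the field names here are illustrative, not mkosi's actual cache format:

```python
# Sketch: compare a cached input manifest against the current one and
# produce human-readable lines explaining why a rebuild is triggered.
# Manifest structure is hypothetical.
import json


def manifest_diff(cached: dict, current: dict) -> list:
    """Return one readable line per changed input field."""
    lines = []
    for key in sorted(cached.keys() | current.keys()):
        old, new = cached.get(key), current.get(key)
        if old != new:
            lines.append(f"{key}: {json.dumps(old)} -> {json.dumps(new)}")
    return lines


cached = {"packages": ["bash", "systemd"], "release": "39", "env": {"FOO": "1"}}
current = {"packages": ["bash", "systemd", "vim"], "release": "39", "env": {"FOO": "2"}}
for line in manifest_diff(cached, current):
    print(line)
```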
Poking around, this seems like the best place to comment. While I appreciate mkosi in general a lot more than my previous homemade approach (using Docker to build the filesystem, then a bunch of hacky scripts to create a disk image), it is a substantial downgrade to go from caching after every single command to, effectively, caching only once for the entire build. While I did think that the every-step caching was excessive (changed an environment variable? entirely new image!), I do think that only one cache step is not enough, especially when debugging. It would be nice if the final version of this proposal allowed for more caching steps, even something simple like caching between each build script and restarting from a previous image when one of the build scripts changes.
@clarfonthey What exactly do you want to cache? Build scripts already have access to a build directory in which incremental build results can be stored, but this relies on whatever tool you invoke in the build script supporting incremental builds. If you give a bit more detail on your use case, I can give more guidance.
Oh, the fact that build scripts can cache their artifacts helps a lot; it's just that there are definitely steps in the process that are slow and that caching via snapshotting could make very fast. For example, if you're debugging an issue in a prepare script, even if all the packages have already been downloaded, on an Arch build I have to wait for the keyring to populate and then wait for the packages to extract, which takes time. If the base and build layers were cached before the prepare scripts ran and then again after, it would speed up that debugging loop a lot.

This was quite brutal when an earlier version of the script I'm working on installed a DKMS package in a prepare script, because I needed to add a separate repository that could not be added as a mirror: building the DKMS modules is an atomic, uncacheable step by design, and while I could try to work around that and turn it into a cacheable step, it would have been nicer if I could just isolate it and run it by itself.

Right now, I have build scripts build packages and postinst scripts install them, and that can actually be quite slow: again, post-installation steps often build things from the package sources, like local cache databases, and it's frustrating to have to wait several minutes between builds to debug things.

So, while a lot of these issues could be solved by me just getting things right the first time, that's not very realistic, and caching more between steps could help a lot. Caching is already an opt-in feature, so perhaps this level of caching could be an extra setting. To clarify how Docker/Podman does it: they effectively take a snapshot after each line in the Dockerfile/Containerfile, then resume from the last step whose dependencies haven't been modified (a dependency being, for example, a script that gets run, or the step itself).
It feels like the checking you're describing in the issue description, such as better detection of whether the extra trees are modified, could be done with an extra caching step, so that if only those are modified, the prepare scripts aren't even run and just the extra-tree copying step is performed.
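The Docker/Podman behaviour described above can be sketched as chained cache keys: each step's key hashes the previous key together with that step's inputs, so changing one step invalidates that step and everything after it while earlier snapshots stay reusable. The step list below is a hypothetical pipeline, not mkosi's actual build steps:

```python
# Sketch: Docker-style layer caching in miniature. Each layer's cache key
# is derived from the previous layer's key plus this layer's inputs.
# The pipeline contents are made up for illustration.
import hashlib


def layer_keys(steps: list) -> list:
    """Return one chained cache key (short hex digest) per step."""
    key = b""
    keys = []
    for step in steps:
        key = hashlib.sha256(key + step).digest()
        keys.append(key.hex()[:12])
    return keys


a = layer_keys([b"install packages", b"prepare script v1", b"build script"])
b = layer_keys([b"install packages", b"prepare script v2", b"build script"])
# The first layer is shared; everything after the changed prepare script
# gets a new key, even though the build script itself is unchanged.
print(a[0] == b[0], a[1] == b[1], a[2] == b[2])
```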
@clarfonthey I sympathize, but getting this right is hard. Doing this completely properly means you eventually end up with a build system. I'm happy to review patches that improve the situation in mkosi, but we have to keep the complexity manageable somehow.
So, I strongly sympathise with the idea of caching being difficult, and I wasn't expecting this kind of feature to be anything more than a long-term dream, but I'm kind of confused by the assertion that mkosi isn't a build system. It feels weird to say it isn't a build system, rather than just asserting that it's a simple build system and that complicated caching is out of scope. It feels like several features that have already been included might be out of scope from the perspective of "not being a build system", so this is likely to be confusing for people who aren't sure what the project direction is. (Me being one of those people.)
By build system I mean a tool where the inputs and outputs of each individual step are declared declaratively, and the tool figures out which targets need to be rebuilt.
I mean, I would say that the inputs and outputs of most of the steps in mkosi are explicitly declared, even if they aren't configurable. You have declared what the steps in the pipeline are and what they have access to. Sure, I don't think that every output should be declared: it would be far out of scope to track every output. And to be clear, I am not arguing in favour of the method I proposed where effectively every command run is cached, but for a potential middle ground where more caching steps are possible, like being able to only run postinst scripts if none of the scripts, skeleton trees, or extra trees meant to be used before those steps have been modified.
Currently, `--incremental` has some very basic cache invalidation based on whether the list of configured packages changed or not. This is not sufficient, for multiple reasons.

For checking for tree changes, we can probably use `systemd-dissect --mtree` and diff the mtree output to see what changed. We should also make sure the diffs between caches are displayed in `--debug` mode, to allow debugging why we don't reuse the cache and rebuild the image instead.

We should also support appending to existing cache images, where we install more packages into the existing cached image instead of rebuilding it from scratch.
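As a rough illustration of the mtree-diff idea, two mtree-style listings can be compared with Python's `difflib`. The sample listings are fabricated and abbreviated; real `systemd-dissect --mtree` output carries more fields (uid, gid, sha256 digests, and so on):

```python
# Sketch: detect tree changes by diffing two mtree-style listings, as a
# stand-in for comparing `systemd-dissect --mtree` output across builds.
# The listings below are fabricated examples.
import difflib

old = """\
./usr/bin/bash type=file mode=0755 size=1396520
./usr/lib/os-release type=file mode=0644 size=312
"""
new = """\
./usr/bin/bash type=file mode=0755 size=1396520
./usr/lib/os-release type=file mode=0644 size=340
./usr/bin/vim type=file mode=0755 size=3812072
"""

# A unified diff shows exactly which entries changed, were added, or removed,
# which is the kind of output that could be printed in --debug mode.
diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                 fromfile="cached", tofile="current", lineterm=""))
print("\n".join(diff))
```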