Backport overlayfs creation approach from v19 #13
Conversation
We are going to redesign the use of overlay images during the upgrade to resolve a number of issues we have with the old solution. However, we need to keep the old solution as a fallback (see below). This is a small preparation to keep the new and old code safely separated.

Reasoning for the fallback:
* There is a chance the new solution could also raise some problems, mainly for systems with many partitions/volumes in fstab, or for systems already using many loop devices, as the new solution requires creating a loop device for each partition/volume noted in fstab.
* RHEL 7 is going to switch to ELS in June 2024, after which the project will provide only critical bugfixes for in-place upgrades. This problem blocking the upgrade is not considered critical.

(cherry picked from commit e4fa867)
All LEAPP_* envars are supposed to be read through a library function, which ensures persistent behaviour during the whole upgrade process. (cherry picked from commit dfd1093)
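To illustrate the idea only, here is a minimal sketch of such a reader. The snapshot dict and the `get_env` helper below are hypothetical, not the actual leapp library API:

```python
import os

# Hypothetical snapshot: LEAPP_* variables captured once at the start of the
# upgrade, so every later phase reads the same values even if the process
# environment changes in between.
_LEAPP_ENV_SNAPSHOT = {k: v for k, v in os.environ.items() if k.startswith('LEAPP_')}


def get_env(name, default=None):
    """Return a LEAPP_* envar from the captured snapshot (illustrative helper)."""
    return _LEAPP_ENV_SNAPSHOT.get(name, default)


# Actors then ask the library instead of touching os.environ directly, e.g.:
use_legacy_ovl = get_env('LEAPP_OVL_LEGACY', '0') == '1'
```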
The in-place upgrade requires making some changes on the system to be able to perform the upgrade itself - or even to be able to evaluate whether the system can be upgraded at all. However, we do not want to (and must not) change the original system until we pass beyond the point of no return. For that purpose we have to create a layer above the real host file system, where we can safely perform all operations without affecting the system setup, rpm database, etc. The overlay (OVL) technology has shown that it is capable of handling our requirements well enough - with some limitations.

However, the original design we used to compose the overlay layer above the host system had a number of problems:
* buggy calculation of the required free space for the upgrade RPM transaction
* consumed too much space to handle partitions formatted with XFS without ftype attributes (even tens of GBs)
* bad UX, as people had to manually adjust the size of OVL disk images
* ... and a couple of additional issues derived from the problems listed above

The new solution prepares a disk image (represented by a sparse file) and an overlay image for each mountpoint configured in /etc/fstab, excluding those with FS types noted in the `OVERLAY_DO_NOT_MOUNT` set. The prepared OVL images are then composed together to reflect the real host filesystem. In the end everything is cleaned up. The composition could look like this:

orig mountpoint -> disk img     -> overlay img      -> new mountpoint
---------------------------------------------------------------------
/               -> root_        -> root_/ovl        -> root_/ovl/
/boot           -> root_boot    -> root_boot/ovl    -> root_/ovl/boot
/var            -> root_var     -> root_var/ovl     -> root_/ovl/var
/var/lib        -> root_var_lib -> root_var_lib/ovl -> root_/ovl/var/lib
...

The new solution can be problematic for systems with too many partitions and loop devices, as each disk image is loop mounted (the same as before, but the total number of disk images will be bigger). For such systems we keep, for now, the possibility of falling back to the old solution, which has the issues mentioned above, but it's a trade-off. To fall back to the old solution, set the envar:
    LEAPP_OVL_LEGACY=1

Disk images created for OVL are formatted with XFS by default. In case of problems, it's possible to switch to the Ext4 FS using:
    LEAPP_OVL_IMG_FS_EXT4=1

XFS is better optimized for our use cases (faster initialisation, consuming less space). However, several issues related to overlay images have been reported that so far happened only on XFS filesystems. We are not sure about the root causes, but having the possibility to switch to Ext4 seems wise. In case of issues, we can simply ask users to try the switch and see whether the problem is fixed or still present.

Some additional technical details about other changes:
* Added simple/naive checks whether the system has enough space on the partition hosting /var/lib/leapp (usually /var). Consuming all the space on the partition could lead to unwanted behaviour - in the worst case, if we speak about the /var partition, it could also mean problems for other applications running on the system.
* In case the container is larger than the expected minimal default, or the calculated required free space is lower than the minimal protected size, return the protected size constant (200 MiB).
* Work just with mountpoints (paths) in _prepare_required_mounts() instead of with a list of MountPoint named tuples. I am thinking about removing the named tuple, but let's keep it for now.
* Make the apparent size of created disk images 5% smaller to protect against upgrades failing during the transaction execution due to a really small amount of free space.
* Clean up the scratch directory at the end to free the consumed space. Disk images are kept after the leapp run when LEAPP_DEVEL_KEEP_DISK_IMGS=1.

(cherry picked from commit d074926)
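A rough sketch of the per-mountpoint composition described in the commit message above, assuming hypothetical helper and path names (the real actor code is more involved; this only illustrates the sparse file -> loop mount -> overlay chain, with the XFS/Ext4 choice driven by LEAPP_OVL_IMG_FS_EXT4):

```python
import os
import subprocess


def _run(cmd):
    # Thin wrapper; the real code uses leapp's own command execution with error handling.
    subprocess.check_call(cmd)


def create_source_overlay(scratch_dir, mountpoint, size_mb, use_ext4=False):
    """Create a disk image + overlay for one mountpoint (illustrative sketch)."""
    name = 'root_' + mountpoint.strip('/').replace('/', '_')
    image = os.path.join(scratch_dir, 'diskimages', name)
    base = os.path.join(scratch_dir, 'mounts', name)
    os.makedirs(os.path.dirname(image), exist_ok=True)
    os.makedirs(base, exist_ok=True)

    # 1. Sparse file backing the disk image (apparent size only, no real blocks yet).
    _run(['truncate', '-s', '{}M'.format(size_mb), image])
    # 2. Format it: XFS by default, Ext4 when LEAPP_OVL_IMG_FS_EXT4=1 is set.
    _run(['mkfs.ext4', '-F', image] if use_ext4 else ['mkfs.xfs', '-f', image])
    # 3. Loop-mount the image; it hosts the upper/work dirs of the overlay.
    _run(['mount', '-o', 'loop', image, base])
    upper, work, ovl = (os.path.join(base, d) for d in ('upper', 'work', 'ovl'))
    for directory in (upper, work, ovl):
        os.makedirs(directory, exist_ok=True)
    # 4. Compose the overlay: the host mountpoint is the read-only lower layer,
    #    so all writes land in the disk image instead of on the real system.
    _run(['mount', '-t', 'overlay', 'overlay', '-o',
          'lowerdir={},upperdir={},workdir={}'.format(mountpoint, upper, work), ovl])
    return ovl
```

The per-mountpoint `ovl` directories would then be composed into a single tree mirroring the host layout (root_/ovl/, root_/ovl/boot, ...), as shown in the table above; setting LEAPP_OVL_LEGACY=1 bypasses this path in favour of the old approach.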
mknod(/var/lib/leapp/scratch/mounts/root_/system_overlay/dev/null) failed: File exists

Ignoring such a vfstype fixes the issue.

rhbz#2215027
(cherry picked from commit 6e470f4)
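As an illustration of the kind of vfstype filtering implied here (and by the `OVERLAY_DO_NOT_MOUNT` set mentioned earlier), a hedged sketch - the set contents and the fstab parsing are assumptions, not the actual actor code:

```python
# Assumed contents for illustration only; the real set in the actor may differ.
OVERLAY_DO_NOT_MOUNT = frozenset((
    'autofs', 'proc', 'sysfs', 'tmpfs', 'devtmpfs', 'devpts', 'squashfs', 'iso9660',
))


def mountpoints_to_overlay(fstab_path='/etc/fstab'):
    """Yield fstab mountpoints that should get their own disk image + overlay."""
    with open(fstab_path) as fstab:
        for line in fstab:
            fields = line.split('#', 1)[0].split()
            if len(fields) < 3:
                continue
            mountpoint, vfstype = fields[1], fields[2]
            if vfstype in OVERLAY_DO_NOT_MOUNT:
                # Skip pseudo/special filesystems that cause issues like the one above.
                continue
            yield mountpoint
```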
In case the filesystem for which the disk image is going to be created has a very small amount of free space (under 130 MiB), it cannot be formatted with XFS using the current params. This could be hit in several cases:
* the system partition/volume - in this case, most likely an issue will be hit later anyway by DNF complaining about the small amount of free space when content is installed inside by RPMs, as such a small amount of free space is really not expected to be seen at all
* it's a data mount point (e.g. an ISO) or a filesystem type that should be part of the OVERLAY_DO_NOT_MOUNT set, so enlarging the value to 130 MiB should not affect anything negatively at all
* in case of /boot, the problem with free space is already covered in a different actor before we try to create any disk image, so we are safe here

Based on the arguments above, I consider 130 MiB a safe minimal value for in-place upgrades. It will also allow us to skip possible problems with specific file systems (like tmpfs, ...) in case we are still missing some in OVERLAY_DO_NOT_MOUNT - and kinds of read-only storage (such as iso9660, etc.).

Co-authored-by: Michal Hečko <[email protected]>
(cherry picked from commit a81ebb0)
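Putting the sizing rules from these commit messages together (the 200 MiB protected minimum for the required free space, the 5% shrink of the apparent image size, and the 130 MiB floor so XFS can still be created), a rough sketch; the constant and function names are illustrative and the exact formula in the actor may differ:

```python
# Constants taken from the commit messages above; names are illustrative.
MIN_PROTECTED_FREE_MB = 200   # never report less required free space than this
MIN_DISK_IMAGE_MB = 130       # smallest image our mkfs.xfs params can handle
APPARENT_SIZE_FACTOR = 0.95   # create images 5% smaller than the computed size


def required_free_space_mb(calculated_mb):
    """Clamp the calculated required free space to the protected minimum."""
    return max(calculated_mb, MIN_PROTECTED_FREE_MB)


def disk_image_size_mb(computed_mb):
    """Apparent size for one mountpoint's disk image (sketch of the clamping)."""
    return max(int(computed_mb * APPARENT_SIZE_FACTOR), MIN_DISK_IMAGE_MB)
```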
Thank you for contributing to the Leapp project! Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
Please open a ticket in case you experience a technical problem with the CI. (RH internal only) Note: In case tests are not triggered automatically on a new PR/commit, or are pending for a long time, please consider rerunning the CI by commenting leapp-ci build (might require several comments). If the problem persists, contact leapp-infra.
leapp-ci build
@Monstrofil I believe that automation is disabled in our fork.
(cherry picked from commit aebbf63)
This pull request backports the new approach to overlayfs creation introduced in upstream leapp v19.
Basically, before these changes the size of the virtual fs was determined by the LEAPP_OVL_SIZE environment variable and was not dynamically adjusted according to real usage, which led to disk space problems during the upgrade.
The new solution has better support for ftype=0 mounts and also better space estimation.
You can find details in the commit messages.