-
Notifications
You must be signed in to change notification settings - Fork 0
feat: rechunk with rpm-ostree #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The legacy rechunker is basically unmaintained, this is a much safer option to reduce our (delta) image size. https://coreos.github.io/rpm-ostree/build-chunked-oci/ https://gitlab.com/fedora/bootc/base-images/-/blob/main/bootc-base-imagectl
enable when coreos/rpm-ostree#5545 fixed
We need a newer podman version than ubuntu 24.04 provides, homebrew is the most elegant solution. We need the layer annotations for the rpm-ostree rechunker so that updates are more efficient. workaround for: containers/podman#27796
The metadata from a prior built image is used to construct the new one so that the difference between them is minimized if possible.
Theoretically way more layers possible (I tested with 2000+) but you can't podman run the image anymore with 500+ I think the limit used to be 255, no longer the case with Kernel 6.15+. We want as much layers as we can get away with because it makes rechunking much more efficient. See: https://github.com/containers/storage/blob/83cf57466529353aced8f1803f2302698e0b5cb7/drivers/overlay/overlay.go#L56-L81
This reverts commit e9dd3f8. Seemingly random failures I saw on nvidia iamges, let's see if this is the root cause
Summary of ChangesHello @renner0e, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the image rechunking mechanism by transitioning from a custom, multi-stage container-based process to using the integrated rpm-ostree bootc-base-imagectl rechunk command. This change streamlines the build system and aligns it with upstream rpm-ostree practices. Additionally, it introduces a crucial system-level fix to prevent boot issues for users migrating from older images that relied on a legacy rechunker and nss-altfiles for group management. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the image rechunking process by transitioning from a custom legacy rechunker to the official bootc-base-imagectl tool. Key changes include removing the custom rechunker image, simplifying the rechunk Justfile recipe to use bootc-base-imagectl, and deleting the load-rechunk recipe. Additionally, a new 20-layer-definitions.sh script was added (though mostly commented out) to prepare filesystem properties for rechunking, and a rechunker-group-fix script with an accompanying systemd service was introduced to resolve issues with group files when rebasing images that previously utilized nss-altfiles with the legacy rechunker. Review feedback points out that the podman image scp command in the Justfile is missing a destination and SUDOIF prefix, criticizes the rechunker-group-fix script for its complex and potentially flawed logic regarding group file manipulation, and notes a superfluous echo "::endgroup::" in the 20-layer-definitions.sh script that should be removed.
| ID=$(${PODMAN} inspect --format={{ '{{.Digest}}' }} localhost/"${image_name}":"${tag}") | ||
| ID_ROOT=$(sudo ${PODMAN} inspect --format={{ '{{.Digest}}' }} localhost/"${image_name}":"${tag}") | ||
| if [[ ! "${PODMAN}" =~ "docker" ]] && [[ -n "$ID" ]] && [[ "$ID" != "$ID_ROOT" ]]; then | ||
| ${PODMAN} image scp $(whoami)@localhost::localhost/"${image_name}":"${tag}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This podman image scp command has two issues:
- It's missing the destination to copy the image to. It should target the rootful store, for example
root@localhost::. - It's missing
${SUDOIF}. Copying to the rootful store requires root privileges.
Without these changes, the image won't be available in the rootful context for the next rechunking step.
${SUDOIF} ${PODMAN} image scp localhost/"${image_name}":"${tag}" root@localhost::
| for manually_created_user in $(grep -E -e ".*:[1-3][[:digit:]]{3}:.*" "$1") ; do | ||
| # `grep` matched on a group with GID [1-3]000 or so | ||
| if [ ! -z "$(cut -f4 -d: <<< "${manually_created_user}")" ] ; then | ||
| continue | ||
| fi | ||
| IGNORE_MANUAL="$(cut -f1,3 -d: --output-delimiter="|" <<< "${manually_created_user}")|${IGNORE_MANUAL:-}" | ||
| for related_group in $(grep "$(cut -f1 -d: <<< ${manually_created_user})" "$1"); do | ||
| # Deduplicates matches for the same group/user | ||
| if [ "$(cut -f1 <<< "${related_group}")" == "$(cut -f1 <<< "${manually_created_user}")" ] ; then | ||
| continue | ||
| fi | ||
| IGNORE_MANUAL="$(cut -f1 -d: <<< "${related_group}")|${IGNORE_MANUAL:-}" | ||
| done | ||
| done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code is critical for system stability but is very difficult to understand, which increases the risk of bugs. I have a few concerns:
- Misleading Variable Name: The variable
manually_created_useractually holds a line from the group file. Renaming it togroup_linewould improve clarity. - Questionable Logic: The condition on line 10 (
if [ ! -z "$(cut -f4...)" ]) causes the loop to skip any group that has members. This seems overly restrictive and might miss groups that should be preserved. - Flawed Inner Loop: The inner loop starting on line 14 appears to have flawed logic. It greps for the group name as a substring, which is unreliable, and then explicitly skips exact matches. This means it will only act on partial/substring matches, which is probably not the intended behavior.
Given the high impact of this script, I strongly recommend refactoring this section to be more straightforward and robust. Adding comments to explain each step of the logic would also be very helpful for future maintenance.
| #echo "listing $COUNT biggest ones" | ||
| #jq -r --arg n "$COUNT" ' sort_by(.size) | reverse | .[0:($n | tonumber)] | .[] | "\(.size)\t\(.path)" ' "$OUTPUT" | numfmt --to=iec --field=1 | ||
|
|
||
| echo "::endgroup::" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.