archive - add reproducible_tar option #8691
Thanks for your contribution!
gid=0,
uname="",
gname="",
)
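For readers without the full diff, here is a minimal sketch of what normalizing ownership metadata can look like with Python's standard `tarfile` module. The `normalize_tarinfo` and `build_archive` helpers are hypothetical names, not the PR's actual code, and `uid=0` is an assumption added alongside the `gid`, `uname` and `gname` values visible in the fragment above.

```python
import io
import tarfile


def normalize_tarinfo(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo:
    """Hypothetical filter: blank out ownership so it no longer depends on the source host."""
    tarinfo.uid = 0  # assumption: not visible in the quoted fragment
    tarinfo.gid = 0
    tarinfo.uname = ""
    tarinfo.gname = ""
    return tarinfo


def build_archive(name: str, payload: bytes) -> tarfile.TarInfo:
    """Write one in-memory file into a tar stream and return the header as stored."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        # Simulate a file owned by an ordinary, host-specific user.
        info.uid, info.gid = 1000, 1000
        info.uname, info.gname = "alice", "staff"
        archive.addfile(normalize_tarinfo(info), io.BytesIO(payload))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        return archive.getmembers()[0]


stored = build_archive("hello.txt", b"hello\n")
```

When archiving real paths, the same function can be passed as `filter=normalize_tarinfo` to `TarFile.add`. Note that this sketch leaves the mode bits untouched, so it does not by itself address the permission concern raised in the review.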
Hmm, this is a very special interpretation of 'reproducible tarfile' IMO. Also this is potentially dangerous since files that are protected (only readable by specific users/groups) before archiving are suddenly publicly readable after extraction.
Maybe it would be better to make the level of reproducibility configurable? On the other hand, that would make the interface also pretty complicated.
I guess this needs to be discussed first. Maybe create a thread in https://forum.ansible.com/c/project/collection-development/27 for that?
It's not intended to be particularly special. I compare it to tar in the summary.
https://reproducible-builds.org/docs/archives/
https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
https://github.com/drivendataorg/repro-tarfile
I'll make a thread when I have more time; for now I'll just leave some initial notes here.
- The danger is on extraction, of course.
- Unless you are extracting into a path that's already readable, it won't become readable:
  root@dev-master-1:~# tar -xzf coreutils-0.0.27-x86_64-unknown-linux-musl.tar.gz
  root@dev-master-1:~# sudo -u nobody ls /root/coreutils-0.0.27-x86_64-unknown-linux-musl
  ls: cannot access '/root/coreutils-0.0.27-x86_64-unknown-linux-musl': Permission denied
  root@dev-master-1:~# sudo -u nobody ls /root/coreutils-0.0.27-x86_64-unknown-linux-musl/LICENSE
  ls: cannot access '/root/coreutils-0.0.27-x86_64-unknown-linux-musl/LICENSE': Permission denied
- Extracting files as root (common with Ansible) when the archive carries odd source attributes is also potentially dangerous: a UID collision on the target can give a non-root user write access, which is a privilege escalation. That is what was happening for me without this change, because Ansible on localhost is not running as root.
@glennpratt This PR contains
a09d6da to 3445dee
description:
  - Set tar metadata and gzip headers to vary less given the same input file content.
  - Useful for minimizing unneeded archive changes and avoiding handlers that may trigger on such changes.
type: bool
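As an illustration of the gzip half of that description, the sketch below shows one way to keep gzip headers stable using only the standard library. It is not taken from the PR; `gzip_deterministic` is a hypothetical helper, and it assumes that pinning the header timestamp and omitting the embedded file name is what "vary less" means here.

```python
import gzip
import io


def gzip_deterministic(data: bytes) -> bytes:
    """Compress data with a fixed header timestamp and no embedded file name."""
    buffer = io.BytesIO()
    # mtime=0 pins the 4-byte MTIME header field; filename="" leaves the
    # optional FNAME field out of the header entirely.
    with gzip.GzipFile(filename="", fileobj=buffer, mode="wb", mtime=0) as stream:
        stream.write(data)
    return buffer.getvalue()


first = gzip_deterministic(b"same input\n")
second = gzip_deterministic(b"same input\n")
```

With the default arguments, `GzipFile` writes the current time into the header, so two runs over identical content would typically differ in those bytes and could trigger change handlers.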
I think as the bare minimum this shouldn't be of type bool, but of type str with choices, so that it's possible to add other ways to make it reproducible later.
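To make the suggestion concrete, a string option with choices could be declared roughly as follows. This is only a sketch: the choice names `none` and `strip_ownership` and the `validate` helper are hypothetical and do not come from the PR or the review. In a real module the dictionary would be handed to `AnsibleModule(argument_spec=...)`, which performs the choice validation itself.

```python
# Hypothetical choice names, for illustration only; more could be added later
# without changing the option's type.
REPRODUCIBLE_CHOICES = ["none", "strip_ownership"]

argument_spec = dict(
    reproducible_tar=dict(type="str", choices=REPRODUCIBLE_CHOICES, default="none"),
)


def validate(value: str) -> str:
    """Stand-in for the choice check that the module framework would normally do."""
    allowed = argument_spec["reproducible_tar"]["choices"]
    if value not in allowed:
        raise ValueError(f"reproducible_tar must be one of {allowed}, got {value!r}")
    return value
```

A boolean leaves no room for a second strategy, whereas a new entry in the choices list is a backwards-compatible change.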
@glennpratt ping! needs_info
Hi @felixfontein, I'm about to be away for a week at least, so I won't be getting back to this soon.
SUMMARY
archive - add reproducible_tar option to make tar archives vary less given the same input file content
ISSUE TYPE
COMPONENT NAME
community.general.archive
new parameter reproducible_archive
ADDITIONAL INFORMATION
Used diffoscope to compare with GNU tar on an example directory.
The most obvious remaining improvement would be to add the root directory and sort all files. I removed such a change here because it caused test failures I haven't debugged.
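The "add the root directory and sort all files" idea can be sketched as below. This is not the change that was removed from the PR; `sorted_members` is a hypothetical helper that only shows how a stable member order, including an entry for the root itself, might be computed before anything is written to the archive.

```python
import os
import tempfile
from typing import List


def sorted_members(root: str) -> List[str]:
    """Return the root ('.') and every path below it in a stable, sorted order."""
    members = ["."]
    for current, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            relative = os.path.relpath(os.path.join(current, name), root)
            # Use forward slashes so the order does not depend on the platform.
            members.append(relative.replace(os.sep, "/"))
    return sorted(members)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "a"))
    for rel in ("b.txt", os.path.join("a", "z.txt")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("x")
    listing = sorted_members(root)
```

Sorting matters because `os.walk` returns entries in whatever order the filesystem reports them, which can differ between hosts holding identical content.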
Before diffoscope
After diffoscope