Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive - add reproducible_tar option #8691

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

glennpratt
Copy link

@glennpratt glennpratt commented Jul 30, 2024

SUMMARY

archive - add reproducible_tar option to make tar archives vary less given the same input file content

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

community.general.archive new parameter reproducible_archive

ADDITIONAL INFORMATION

Used diffoscope to compare with GNU tar on an example directory

tar --create --file ../roles/k8s_node/files/lelastic2.tgz --gzip \
	--mtime='@0' \
	--sort=name \
	--numeric-owner --owner=0 --group=0 \
	--mode=go+u,go-w \
	--directory ../charts lelastic)

The most obvious remaining improvement would be to add the root directory and sort all files.

I removed such a change here because it caused test failures I haven't debugged.

Before diffoscope
diffoscope ./roles/k8s_node/files/lelastic2.tgz ./roles/k8s_node/files/lelastic.tgz
--- ./roles/k8s_node/files/lelastic2.tgz
+++ ./roles/k8s_node/files/lelastic.tgz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, from Unix
│ +gzip compressed data, was "lelastic.tgz", last modified: Tue Jul 30 16:09:12 2024, max compression
│   --- lelastic2.tgz-content
├── +++ lelastic.tgz-content
│ ├── file list
│ │ @@ -1,6 +1,5 @@
│ │ -drwxr-xr-x   0        0        0        0 1970-01-01 00:00:00.000000 lelastic/
│ │ --rw-r--r--   0        0        0      349 1970-01-01 00:00:00.000000 lelastic/.helmignore
│ │ --rw-r--r--   0        0        0     1158 1970-01-01 00:00:00.000000 lelastic/Chart.yaml
│ │ -drwxr-xr-x   0        0        0        0 1970-01-01 00:00:00.000000 lelastic/templates/
│ │ --rw-r--r--   0        0        0     1855 1970-01-01 00:00:00.000000 lelastic/templates/lelastic-deployment.yaml
│ │ --rw-r--r--   0        0        0     1428 1970-01-01 00:00:00.000000 lelastic/values.yaml
│ │ +drwxr-xr-x   0 gpratt     (503) staff       (20)        0 2024-07-26 19:51:37.315485 lelastic/templates/
│ │ +-rw-r--r--   0 gpratt     (503) staff       (20)     1158 2023-09-14 23:52:21.421117 lelastic/Chart.yaml
│ │ +-rw-r--r--   0 gpratt     (503) staff       (20)      349 2023-09-14 23:52:21.421037 lelastic/.helmignore
│ │ +-rw-r--r--   0 gpratt     (503) staff       (20)     1428 2023-09-14 23:52:21.421620 lelastic/values.yaml
│ │ +-rw-r--r--   0 gpratt     (503) staff       (20)     1855 2024-07-26 19:51:37.315540 lelastic/templates/lelastic-deployment.yaml
│ ├── filetype from file(1)
│ │ @@ -1 +1 @@
│ │ -POSIX tar archive (GNU)
│ │ +POSIX tar archive
After diffoscope
❯ diffoscope ./roles/k8s_node/files/lelastic2.tgz ./roles/k8s_node/files/lelastic.tgz 
--- ./roles/k8s_node/files/lelastic2.tgz
+++ ./roles/k8s_node/files/lelastic.tgz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, from Unix
│ +gzip compressed data
│   --- lelastic2.tgz-content
├── +++ lelastic.tgz-content
│ ├── file list
│ │ @@ -1,6 +1,5 @@
│ │ -drwxr-xr-x   0        0        0        0 1970-01-01 00:00:00.000000 lelastic/
│ │ --rw-r--r--   0        0        0      349 1970-01-01 00:00:00.000000 lelastic/.helmignore
│ │ --rw-r--r--   0        0        0     1158 1970-01-01 00:00:00.000000 lelastic/Chart.yaml
│ │  drwxr-xr-x   0        0        0        0 1970-01-01 00:00:00.000000 lelastic/templates/
│ │ --rw-r--r--   0        0        0     1855 1970-01-01 00:00:00.000000 lelastic/templates/lelastic-deployment.yaml
│ │ +-rw-r--r--   0        0        0     1158 1970-01-01 00:00:00.000000 lelastic/Chart.yaml
│ │ +-rw-r--r--   0        0        0      349 1970-01-01 00:00:00.000000 lelastic/.helmignore
│ │  -rw-r--r--   0        0        0     1428 1970-01-01 00:00:00.000000 lelastic/values.yaml
│ │ +-rw-r--r--   0        0        0     1855 1970-01-01 00:00:00.000000 lelastic/templates/lelastic-deployment.yaml
│ ├── filetype from file(1)
│ │ @@ -1 +1 @@
│ │ -POSIX tar archive (GNU)
│ │ +POSIX tar archive

@ansibullbot
Copy link
Collaborator

cc @bendoh
click here for bot help

@ansibullbot ansibullbot added WIP Work in progress feature This issue/PR relates to a feature request module module new_contributor Help guide this first time contributor plugins plugin (any type) labels Jul 30, 2024
@ansibullbot

This comment was marked as outdated.

@ansibullbot ansibullbot added tests tests unit tests/unit labels Jul 30, 2024
@glennpratt glennpratt marked this pull request as ready for review July 30, 2024 05:17
@ansibullbot ansibullbot removed the WIP Work in progress label Jul 30, 2024
@glennpratt glennpratt changed the title Create reproducible tar archives archive - add reproducible_tar option Jul 30, 2024
@felixfontein felixfontein added check-before-release PR will be looked at again shortly before release and merged if possible. backport-9 Automatically create a backport for the stable-9 branch labels Jul 30, 2024
Copy link
Collaborator

@felixfontein felixfontein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

plugins/modules/archive.py Outdated Show resolved Hide resolved
plugins/modules/archive.py Outdated Show resolved Hide resolved
plugins/modules/archive.py Outdated Show resolved Hide resolved
gid=0,
uname="",
gname="",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is a very special interpretation of 'reproducible tarfile' IMO. Also this is potentially dangerous since files that are protected (only readable by specific users/groups) before archiving are suddenly publicly readable after extraction.

Maybe it would be better to make the level of reproducibility configurable? On the other hand, that would make the interface also pretty complicated.

I guess this needs to be discussed first. Maybe create a thread in https://forum.ansible.com/c/project/collection-development/27 for that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@glennpratt glennpratt Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make a thread when I have more time, I'll just leave some initial notes here.

  • The danger is on extract of course.
  • Unless you are extracting into a path that's already readable, it won't become readable:
     root@dev-master-1:~# tar -xzf coreutils-0.0.27-x86_64-unknown-linux-musl.tar.gz
     root@dev-master-1:~# sudo -u nobody ls /root/coreutils-0.0.27-x86_64-unknown-linux-musl
     ls: cannot access '/root/coreutils-0.0.27-x86_64-unknown-linux-musl': Permission denied
     root@dev-master-1:~# sudo -u nobody ls /root/coreutils-0.0.27-x86_64-unknown-linux-musl/LICENSE
     ls: cannot access '/root/coreutils-0.0.27-x86_64-unknown-linux-musl/LICENSE': Permission denied
    
  • Extracting files as root (common with ansible) with odd source attributes (localhost ansible is not running as root), which was what was happening without this for me, is also potentially dangerous - UID collision leads to write privilege escalation from non-root.

@ansibullbot
Copy link
Collaborator

@glennpratt This PR contains @ mentions in at least one commit message. Those mentions can cause cascading notifications through GitHub and need to be removed. Please squash or amend your commits to remove the mentions.

click here for bot help

@ansibullbot ansibullbot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label Aug 6, 2024
@ansibullbot ansibullbot removed the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR label Aug 6, 2024
@ansibullbot ansibullbot added the stale_ci CI is older than 7 days, rerun before merging label Aug 16, 2024
@felixfontein felixfontein removed the backport-9 Automatically create a backport for the stable-9 branch label Oct 7, 2024
description:
- Set tar metadata and gzip headers to vary less given the same input file content.
- Useful for minimizing unneeded archive changes and avoiding handlers that may trigger on such changes.
type: bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as the bare minimum this shouldn't be of type bool, but of type str with choices, so that it's possible to add other ways to make it reproducible later.

@felixfontein felixfontein added the backport-10 Automatically create a backport for the stable-10 branch label Nov 4, 2024
@felixfontein
Copy link
Collaborator

@glennpratt ping!

needs_info

@ansibullbot ansibullbot added the needs_info This issue requires further information. Please answer any outstanding questions label Dec 30, 2024
@glennpratt
Copy link
Author

glennpratt commented Dec 31, 2024

Hi @felixfontein, I'm about to be away for a week at least, so I won't be getting back to this soon.

@ansibullbot ansibullbot removed the needs_info This issue requires further information. Please answer any outstanding questions label Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-10 Automatically create a backport for the stable-10 branch check-before-release PR will be looked at again shortly before release and merged if possible. feature This issue/PR relates to a feature request module module new_contributor Help guide this first time contributor plugins plugin (any type) stale_ci CI is older than 7 days, rerun before merging tests tests unit tests/unit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants