containertool: Use same gzip headers on Linux and macOS #37
Motivation
----------

Packaging the same binary using the same version of `containertool` produces different application image layers on macOS and Linux:

```
linux% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:54a282d5cd082320d2d4976e7d9a952da46e3bc4bab3ce1e0b3931ccf945b849 (80394382 bytes)
image configuration: sha256:fdcb887ef6e27a09456419b03b1d8353b15d68d088b8ea023f38af892fca69be (462 bytes)
...

macos% swift run containertool --verbose --repository registry.test:5000/hello hello-world --from scratch
...
Uploading application layer
application layer: sha256:08a21093e79423c17b58325decc48d7196481ed55276c2d168de23a75d38727e (80394382 bytes)
image configuration: sha256:2648cd8cca1cad7ec5b386e8433e36ca77a40e31859e5994260b2ef1d07f0753 (462 bytes)
...
```

The `application layer` hashes are different, even though the layers contain the same binary. The `image configuration` metadata blob hashes also differ, but those blobs contain timestamps, so they will continue to differ even after this PR is merged. A future change could make these timestamps default to the epoch, allowing identical metadata blobs to be created on Linux and macOS as well.

The image layer is a gzipped TAR archive containing the executable. Saving the intermediate steps shows that the TAR archives are identical and the gzipped streams are different, but only by one byte:

```
% diff <(hexdump -X linux-image.tar.gz) <(hexdump -X darwin-image.tar.gz)
1c1
< 0000000 1f 8b 08 00 00 00 00 00 00 03 ed 57 eb 6e 1c b7
---
> 0000000 1f 8b 08 00 00 00 00 00 00 13 ed 57 eb 6e 1c b7
```

The difference is in the 10th byte of the gzip header: the [OS field](https://datatracker.ietf.org/doc/html/rfc1952#page-5). RFC 1952 defines a list of [known operating systems](https://datatracker.ietf.org/doc/html/rfc1952#page-8) in which `0x03` is the OS code for Unix. The RFC was written in 1996, so its `Macintosh` entry refers to the classic Mac OS. Zlib uses an updated operating system list (madler/zlib@ce12c5c) which defines `19` / `0x13` as the OS code for Darwin.

Interestingly, using `gzip` to compress a file directly produces identical results on macOS and Linux (`-n` is needed to prevent `gzip` from including the current timestamp on macOS):

```
linux% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
macos% cat hello-world | gzip -n | md5sum
ef64adbee9e89e78114000442a804e0e -
```

Modifications
-------------

By default, Zlib uses the value of `OS_CODE` [set at compile time](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L1054). This commit uses [deflateSetHeader()](https://github.com/madler/zlib/blob/ef24c4c7502169f016dcd2a26923dbaf3216748c/deflate.c#L705) to override the default gzip header, forcing the OS code to be `0x03` (Unix) on both Linux and macOS.

Result
------

After this change, image layers containing the same binary will use identical gzip headers and should have the same hash whether they are built on Linux or macOS. It is still possible that different versions of Zlib might produce different compressed data, causing the overall hashes to change.

Test Plan
---------

Tested manually on macOS and Linux, verifying that image layers containing identical binaries have identical hashes.
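The one-byte difference described in the Motivation section can be reproduced without `containertool`. The following is a minimal Python sketch, not the Swift/Zlib code changed by this PR: `gzip_with_os_code` is a hypothetical helper that assembles a gzip member by hand so the OS byte can be chosen, whereas the PR itself relies on `deflateSetHeader()`. It assumes a header with no optional fields and a zero timestamp, matching the hexdump shown earlier.

```python
import gzip
import hashlib
import struct
import zlib

OS_UNIX = 0x03    # RFC 1952 OS code for Unix
OS_DARWIN = 0x13  # Zlib's OS code for Darwin (19), as described above


def gzip_with_os_code(payload: bytes, os_code: int) -> bytes:
    """Hypothetical helper: build a gzip member with a chosen OS header byte."""
    # 10-byte header: magic (1f 8b), CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS.
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, os_code)
    # Negative wbits requests a raw deflate stream with no wrapper.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


payload = b"stand-in for the hello-world binary " * 100
linux_like = gzip_with_os_code(payload, OS_UNIX)
macos_like = gzip_with_os_code(payload, OS_DARWIN)

# Offsets at which the two streams differ: only the OS field (10th byte).
differing = [i for i, (a, b) in enumerate(zip(linux_like, macos_like)) if a != b]
same_digest = hashlib.sha256(linux_like).digest() == hashlib.sha256(macos_like).digest()
same_contents = gzip.decompress(linux_like) == gzip.decompress(macos_like) == payload

print(differing, same_digest, same_contents)
```

Both streams decompress to the same payload, yet their SHA256 digests differ because of the single header byte. Forcing the OS code to `0x03` on every platform removes that difference.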
Discovered while investigating #34. However, I don't think this difference is the cause of the reported problem, because in both cases the checksum which is sent is a valid SHA256 checksum, just of a slightly different stream of data. When the registry recomputes the checksum over the data it has received, it should produce the same result.

I think the problem in #34 is more likely to come from the serialisation of the JSON blobs, where reordering or reformatting of the JSON can cause checksum differences. Even there, though, the plugin should calculate the checksum after serialising the JSON data and the registry should check the checksum against the serialised data without parsing it, so there are few opportunities for inconsistencies.
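To illustrate the JSON point, here is a small Python sketch. It assumes a made-up two-field configuration object (the field values are placeholders, not taken from a real image) and shows that logically equal JSON documents hash differently once serialised with different key order or whitespace, while a digest computed over the exact bytes that are sent always verifies.

```python
import hashlib
import json


def digest(blob: bytes) -> str:
    """Return a registry-style digest string for the given bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Hypothetical configuration object, used only for illustration.
config = {"architecture": "amd64", "os": "linux"}

# Three serialisations of the same logical document.
compact = json.dumps(config, separators=(",", ":")).encode()
pretty = json.dumps(config, indent=2).encode()
reordered = json.dumps({"os": "linux", "architecture": "amd64"},
                       separators=(",", ":")).encode()

# All three parse back to the same object...
same_meaning = json.loads(compact) == json.loads(pretty) == json.loads(reordered)
# ...but each byte stream has its own digest.
distinct_digests = len({digest(compact), digest(pretty), digest(reordered)})

# Computing the digest after serialising, and verifying against the same
# bytes without re-parsing, is always consistent.
claimed = digest(compact)
verified = digest(compact) == claimed

print(same_meaning, distinct_digests, verified)
```

An inconsistency can only arise if one side hashes a different serialisation from the one that is actually transmitted.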
Added a test for `containertool`'s `gzip` function.