s390x: Add accelerated inflate&deflate #1060

fneddy · 2025-04-14T12:46:24Z

This adds hardware-accelerated inflate and deflate for IBM's s390x platform.

This is part of the ongoing effort to merge IBM's patches for zlib and get them upstream.
More information about this is in #1050.

If this gets merged, #410 may be discarded.

fneddy · 2025-04-14T13:08:30Z

I split up the intrusive code changes into two commits, aaafad2 and 36a95f2. I hope this makes reviewing easier.

Neustradamus · 2025-05-23T00:37:14Z

@fneddy: Good job!

Use vector extensions when compiling for s390x and binutils knows about them. At runtime, check whether kernel supports vector extensions (it has to be not just the CPU, but also the kernel) and choose between the regular and the vectorized implementations.

Co-authored-by: Eduard Stefes <[email protected]>
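The runtime selection described in this commit message can be sketched roughly as follows. This is only an illustration, not the code in the commit: every `sketch_` name is hypothetical, the "vectorized" routine is a stand-in that reuses the scalar loop, and the capability bit is an assumption (bit 11 of `AT_HWCAP`, which Linux on s390x is understood to expose as `HWCAP_S390_VX` / `HWCAP_S390_VXRS`).

```c
#include <stddef.h>
#if defined(__linux__) && defined(__s390x__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Assumption: bit 11 of AT_HWCAP means the kernel has enabled the
 * vector facility (HWCAP_S390_VX / HWCAP_S390_VXRS on s390x Linux). */
#define SKETCH_HWCAP_S390_VX (1UL << 11)

typedef unsigned long (*sketch_sum_fn)(const unsigned char *buf, size_t len);

/* Regular implementation: plain byte sum. */
unsigned long sketch_sum_scalar(const unsigned char *buf, size_t len) {
    unsigned long sum = 0;
    while (len--)
        sum += *buf++;
    return sum;
}

/* Stand-in for the vectorized implementation. A real one would use
 * vector instructions; here it only has to return the same result. */
unsigned long sketch_sum_vector(const unsigned char *buf, size_t len) {
    return sketch_sum_scalar(buf, len);
}

/* Pick an implementation from the capability bits reported at runtime. */
sketch_sum_fn sketch_choose_sum(unsigned long hwcap) {
    return (hwcap & SKETCH_HWCAP_S390_VX) ? sketch_sum_vector
                                          : sketch_sum_scalar;
}

/* Read the capability bits; on anything but s390x Linux report none,
 * so the regular implementation is chosen. */
unsigned long sketch_read_hwcap(void) {
#if defined(__linux__) && defined(__s390x__)
    return getauxval(AT_HWCAP);
#else
    return 0UL;
#endif
}
```

A caller would resolve the pointer once, for example `sketch_sum_fn f = sketch_choose_sum(sketch_read_hwcap());`, and then call `f(buf, len)`. Checking the auxiliary vector rather than the CPU model matches the point made above: the kernel must support the vector extensions too.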

Architecture-specific code needs hooks into the current zlib at various points. This commit adds a handful of macros that platform-specific code can override.

Architecture-specific code may need to call these functions, so we make them non-static and add the symbols to the header.
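The general shape of such an overridable hook might look like the sketch below. The names `SKETCH_DEFLATE_HOOK`, `sketch_deflate` and `enum sketch_path` are hypothetical and are not taken from the PR; the real hooks take zlib-internal arguments.

```c
enum sketch_path { SKETCH_PATH_SOFTWARE = 0, SKETCH_PATH_PLATFORM = 1 };

/* A platform header included before this point could define
 * SKETCH_DEFLATE_HOOK to intercept the call. If nobody did, the
 * fallback expands to 0, meaning "not handled, use generic code". */
#ifndef SKETCH_DEFLATE_HOOK
#define SKETCH_DEFLATE_HOOK(level, path_out) 0
#endif

/* Generic entry point with a single hook site. */
enum sketch_path sketch_deflate(int level) {
    enum sketch_path path = SKETCH_PATH_SOFTWARE;

    (void)level; /* unused when the fallback hook is active */
    if (SKETCH_DEFLATE_HOOK(level, &path))
        return path;             /* platform code handled the call */

    return SKETCH_PATH_SOFTWARE; /* generic implementation would run here */
}
```

With the fallback definition the compiler sees `if (0)` and can drop the hook site entirely, so builds without platform code pay nothing for the extra macro.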

IBM Z mainframes starting from version z15 provide DFLTCC instruction, which implements deflate algorithm in hardware with estimated compression and decompression performance orders of magnitude faster than the current zlib and ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. It can be enabled using the following build commands:

    # via configure
    $ ./configure --dfltcc
    $ make

    # via cmake
    $ cmake -DZLIB_DFLTCC=on ..
    $ make

When built like this, zlib would compress in hardware on level 1, and in software on all other levels. Decompression will always happen in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e., to make it used by default) one could either configure with --dfltcc-level-mask=0x7e or export DFLTCC_LEVEL_MASK=0x7e at run time.

Two DFLTCC compression calls produce the same results only when they both are made on machines of the same generation, and when the respective buffers have the same offset relative to the start of the page. Therefore care should be taken when using hardware compression when reproducible results are desired. One such use case - reproducible software builds - is handled explicitly: when the SOURCE_DATE_EPOCH environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way. All SystemZ-specific code is placed into separate files, but unfortunately there is still a noticeable amount of changes in the main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output buffer and a window.
Since DFLTCC requires parameter block to be doubleword-aligned, and it's reasonable to allocate it alongside deflate and inflate states, the ZALLOC_STATE(), ZFREE_STATE() and ZCOPY_STATE() macros are introduced in order to encapsulate the allocation details. The same is true for window, for which the ZALLOC_WINDOW() and TRY_FREE_WINDOW() macros are introduced.

Software and hardware window formats do not match, therefore, deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary() and inflateGetDictionary() need special handling, which is triggered using the new DEFLATE_SET_DICTIONARY_HOOK(), DEFLATE_GET_DICTIONARY_HOOK(), INFLATE_SET_DICTIONARY_HOOK() and INFLATE_GET_DICTIONARY_HOOK() macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC parameter block, which is allocated alongside zlib state, using the new DEFLATE_RESET_KEEP_HOOK() and INFLATE_RESET_KEEP_HOOK() macros.

The new DEFLATE_PARAMS_HOOK() macro switches between the hardware and the software deflate implementations when the deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK(), INFLATE_MARK_HOOK() and INFLATE_SYNC_POINT_HOOK() macros make the respective unsupported calls gracefully fail.

The algorithm implemented in the hardware has different compression ratio than the one implemented in software. In order for deflateBound() to return the correct results for the hardware implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN() and DEFLATE_NEED_CONSERVATIVE_BOUND() macros are introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK() and INFLATE_TYPEDO_HOOK() macros. Since inflation with DFLTCC manages the window on its own, calling updatewindow() is suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.

In addition to the compression, DFLTCC computes the CRC-32 and Adler-32 checksums, therefore, whenever it's used, the software checksumming is suppressed using the new DEFLATE_NEED_CHECKSUM() and INFLATE_NEED_CHECKSUM() macros.
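The doubleword-alignment requirement for the parameter block, and the idea of allocating it alongside the zlib state, can be illustrated with a small sketch. It is not the PR's ZALLOC_STATE() implementation: all `sketch_` names and the 64-byte block size are hypothetical, and it assumes the allocator returns memory that is at least 8-byte aligned (true for `calloc` on common platforms, but not guaranteed for a user-supplied zalloc).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGN 8u /* a doubleword is 8 bytes on IBM Z */

/* Stand-ins for zlib's state and the DFLTCC parameter block. */
struct sketch_state { int level; char pending; };
struct sketch_param_block { unsigned char bytes[64]; /* hypothetical size */ };

/* Round n up to the next multiple of a (a must be a power of two). */
size_t sketch_align_up(size_t n, size_t a) {
    return (n + a - 1) & ~(a - 1);
}

/* One allocation holding the state followed by the aligned block. */
void *sketch_alloc_state(void) {
    size_t offset = sketch_align_up(sizeof(struct sketch_state), SKETCH_ALIGN);
    return calloc(1, offset + sizeof(struct sketch_param_block));
}

/* Locate the parameter block that trails the state. */
struct sketch_param_block *sketch_param_block_of(void *state) {
    size_t offset = sketch_align_up(sizeof(struct sketch_state), SKETCH_ALIGN);
    return (struct sketch_param_block *)((unsigned char *)state + offset);
}
```

Keeping both objects in one allocation means freeing and copying the state handles the parameter block as well, which is presumably why the allocation details are hidden behind macros.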
DFLTCC will refuse to write an End-of-block Symbol if there is no input data, thus in some cases it is necessary to do this manually. In order to achieve this, send_bits(), bi_reverse(), bi_windup() and flush_pending() are promoted from local to ZLIB_INTERNAL. Furthermore, since the block and the stream termination must be handled in software as well, enum block_state is moved to deflate.h.

Since the first call to dfltcc_inflate() already needs the window, and it might be not allocated yet, inflate_ensure_window() is factored out of updatewindow() and made ZLIB_INTERNAL.

Co-authored-by: Eduard Stefes <[email protected]>
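The level-mask and SOURCE_DATE_EPOCH behavior described in this commit message can be sketched as below. This is an illustration under stated assumptions, not the patch's code: the `sketch_` functions are hypothetical, and the default mask of 0x2 is inferred from "hardware on level 1 only" together with 0x7e covering levels 1-6 (bit N of the mask enabling level N).

```c
#include <stdlib.h>

/* Assumption: bit N of the mask enables hardware compression for level N,
 * so 0x2 means "level 1 only" and 0x7e means "levels 1 through 6". */
#define SKETCH_DEFAULT_LEVEL_MASK 0x2UL

/* Nonzero if the given compression level is enabled in the mask. */
int sketch_level_in_mask(unsigned long mask, int level) {
    return level >= 0 && level <= 9 && ((mask >> level) & 1UL) != 0;
}

/* Parse a mask string such as "0x7e"; fall back to the default when the
 * string is missing, empty, or not a complete number. */
unsigned long sketch_parse_level_mask(const char *text) {
    char *end;
    unsigned long value;

    if (text == NULL || *text == '\0')
        return SKETCH_DEFAULT_LEVEL_MASK;
    value = strtoul(text, &end, 0);
    return (*end == '\0') ? value : SKETCH_DEFAULT_LEVEL_MASK;
}

/* Decide whether hardware compression should be used for this level. */
int sketch_use_hardware(int level) {
    /* Reproducible builds: any SOURCE_DATE_EPOCH disables the hardware. */
    if (getenv("SOURCE_DATE_EPOCH") != NULL)
        return 0;
    return sketch_level_in_mask(
        sketch_parse_level_mask(getenv("DFLTCC_LEVEL_MASK")), level);
}
```

With `DFLTCC_LEVEL_MASK=0x7e` exported, `sketch_use_hardware(6)` would report hardware use in this sketch, while level 7 and above would stay in software.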

fneddy marked this pull request as ready for review April 14, 2025 12:46

fneddy mentioned this pull request Mar 24, 2025

IBM S390X contrib cleanup #1050

Open

10 tasks

iii-i and others added 4 commits August 20, 2025 15:38

Added hook macros at various points

4d4fa66

Architecture-specific code needs hooks into the current zlib at various points. This commit adds a handful of macros that platform-specific code can override.

Exported some static functions

9287def

Architecture-specific code may need to call these functions, so we make them non-static and add the symbols to the header.

fneddy force-pushed the s390x_dfltcc branch from d96a88b to 1823d08 Compare August 22, 2025 08:36
