Zstd compressor (#3) #1604

clusty · 2023-12-21T14:21:19Z

This PR adds a new compression algorithm optimized for Deep images.
During the development, deep images were shrunk by 20-30% compared to ZIPS with similar compression performance and 35-45% smaller when the compression level is turned to the max (about a 3-4x slowdown compared to ZIPS). The decompression speed slightly faster than ZIPS.

I did not spend a lot of time trying to optimize for non-Deep images, but
certain small non-deep example images with HALF pixels yielded a slightly worse compression rate compared to ZIPS.

The core algorithm is implemented in Blosc, with ZSTD codec with an appropriate data prefiltering.
The main advantage is that it offers a unified API access to multiple codecs and is under active development. Also the binary serialization stays compatible if ever we want to change the codecs/config parameters in the future, without introducing breaking changes.

Open questions:
I still have not figured out how to integrate Blosc properly into the OpenEXR build pipeline.

Co-authored-by: Philippe Leprince <[email protected]> Signed-off-by: Vlad Lazar <[email protected]> * better multy-type compression * Version the Stream --------- Signed-off-by: Vlad Lazar <[email protected]> Co-authored-by: Philippe Leprince <[email protected]>

linux-foundation-easycla · 2023-12-21T14:21:28Z

The committers listed above are authorized under a signed CLA.

✅ login: pleprince / name: Philippe Leprince (06f30a8, a06daec, 0e90ff1, 2ad98fe, 48fa372, a887716, ccac098, bb8754a, 387d2ca, 2d678ae, 6cfc3cc, f52266d, 688d0d6, 21c60db, 9799bf9, 05d9457, 40b1937, f2bdfde, 88c0348, 1d2cd0c, a9ebf07, 15844b5, 903c2c4, 5b0be1d, 5047f7f, cb6a1e0, 2341fb4, 55f37b0, 81e1701, 4c714a2, 17a3fdf)
✅ login: clusty / name: Vlad Lazar (41afe2e, 78ca6c6, d0a24d4, 1e6698e, 0689612, 057dd80, 574988e)

cary-ilm · 2024-01-03T01:35:52Z

A belated thanks for this, it mostly looks pretty straightforward, with some caveats.

Obviously, the blosc configuration needs to be sorted out. If you haven't already, you might look at OpenVDB for template for building against blosc: https://github.com/AcademySoftwareFoundation/openvdb/blob/master/CMakeLists.txt#L99 and https://github.com/AcademySoftwareFoundation/openvdb/blob/master/cmake/FindBlosc.cmake

We'll obviously need to carefully consider the consequences and timing of extensions to the file format, as well as library dependencies, and possibly want to bundle this with other changes, but we have already been anticipating adding new compression algorithms, so having a working example will be helpful.

A few pedantic requests:

We require a CLA, can you sign via the red "NOT COVERED" button above?
We also require copyright notices at the top of each file.
In addition to the example code, there should be tests in the test suite in src/test.
We'll need the parallel implementation in the OpenEXRCore library.
I'll leave a few comments inline about minimizing whitespace changes.
Also, the associated docs should be updated as well.

src/examples/deepExamples.cpp

src/lib/OpenEXR/ImfZstdCompressor.cpp

src/lib/OpenEXR/ImfZstdCompressor.h

cary-ilm

Our preference is to minimize whitespace changes wherever possible, and format code via clang-format.

pleprince · 2024-01-11T08:40:46Z

I will add the cmake changes shortly. Thanks

clusty · 2024-01-11T13:51:53Z

@cary-ilm Is there a way to share the implementation of my compressor between OpenEXR and OpenEXRCore ? The 2 are the the same minus some memory management bits.

Vertexwahn · 2024-01-11T13:56:48Z

src/lib/OpenEXR/CMakeLists.txt

@@ -220,3 +222,7 @@ openexr_define_library(OpenEXR
    OpenEXR::IlmThread
    OpenEXR::OpenEXRCore
  )
+
+target_include_directories(OpenEXR PUBLIC "/home/vladal/bin/include/")


do not forget to change this

yup, I noticed. Thanks

Clang-format deepExamples.cpp and fix a comment typo. Clang-format deepExamples.cpp and fix a comment typo.

cary-ilm · 2024-01-11T17:47:21Z

OpenEXR depends on OpenEXRCore (minimally at this point but that may well increase), so if you able to put an implementation in OpenEXRCore that can be used in OpenEXR, that's fine. But obviously you're aware that OpenEXRCore is a C-language library, so no C++ classes there.

peterhillman · 2024-01-11T23:20:51Z

I see you updated exrheader to dump out the compression type. exrmaketiled, exrenvmap and exrmultiview take a command line string and turn that to a compression type. Should they be extended to support ZSTD compression?

Perhaps it's worth making library functions that turn a string like 'piz' into PIZ_COMPRESSION, and vice versa, to simplify updating these utliities (and end-user code) as new compression types are added.

Vertexwahn · 2024-01-12T06:45:04Z

src/lib/OpenEXR/CMakeLists.txt

+
+target_include_directories(OpenEXR PUBLIC "/home/vladal/bin/include/")
+target_link_directories(OpenEXR PUBLIC "/home/vladal/bin/lib")
+target_link_libraries(OpenEXR PUBLIC "dl" "blosc2")


The library you use here is this one here -> https://github.com/Blosc/c-blosc2 right? Does the newest release of the library (2.12.0) work with this code?

Yes it should work.

pleprince · 2024-01-12T17:21:57Z

Hi @cary-ilm & @peterhillman

I have made a number of cmake modifications based on the Imath lib section. That seem to work but there are a few lines that I copied and pasted that I don't understand and any help would be much appreciated.

  # the install creates this but if we're using the library locally we
  # haven't installed the header files yet, so need to extract those
  # and make a variable for header only usage
  if(NOT TARGET Imath::ImathConfig)
    get_target_property(imathinc Imath INTERFACE_INCLUDE_DIRECTORIES)
    get_target_property(imathconfinc ImathConfig INTERFACE_INCLUDE_DIRECTORIES)
    list(APPEND imathinc ${imathconfinc})
    set(IMATH_HEADER_ONLY_INCLUDE_DIRS ${imathinc})
    message(STATUS "Imath interface dirs ${IMATH_HEADER_ONLY_INCLUDE_DIRS}")
  endif()

my problem here is IMATH_HEADER_ONLY_INCLUDE_DIRS isn't referenced anywhere else in the project. Is that being picked by magic / naming convention ? I didn't find anything in the cmake docs. I am assuming this is adding the Imath API headers to the install, in which case I don't need it.

The other bit I don't understand is:

    # Propagate OpenEXR's setting for pkg-config generation to Imath:
    # If OpenEXR is generating it, the internal Imath should, too.
    set(IMATH_INSTALL_PKG_CONFIG ${OPENEXR_INSTALL_PKG_CONFIG})

What's exactly the point here ? Another bit of magic, I'm sure ! :-)

Thanks in advance

peterhillman · 2024-01-16T03:15:14Z

I got this to build OK, so it looks like the Blosc build is working.
Did you research different settings for the number of scanlines in each chunk? 32 scanlines of deep image data could get very large. It's also a little more complicated to multi-thread efficiently with multi-scanline compression schemes. I wonder whether there should be a ZSTD1 as well as a ZSTD32? Or perhaps just a ZSTD1?

As it happens, it seems the library doesn't properly support writing multi-scanline compression formats to deep images.
I wrote a test tool to read and write deep images with a different compression. It only allocates a single deep scanline at a time, points a DeepFrameBuffer at that, then loops reading the scanline from the input and then calling writePixels(1). It appears LineBufferTask::execute in ImfDeepScanLineOutputFile waits until writePixels() has been called 32 times, then it copies the pixel counts from the FrameBuffer for all 32 scanlines before writing. In my case, I've overwritten the data from the previous 31 scanlines.
I had a brief go at fixing it so it saves the pixel counts as soon as writePixels is called, but haven't got that working properly yet. I hacked the ZSTD_COMPRESSION to be single scanline, and that works.

clusty · 2024-01-16T13:48:37Z

I got this to build OK, so it looks like the Blosc build is working. Did you research different settings for the number of scanlines in each chunk? 32 scanlines of deep image data could get very large. It's also a little more complicated to multi-thread efficiently with multi-scanline compression schemes. I wonder whether there should be a ZSTD1 as well as a ZSTD32? Or perhaps just a ZSTD1?

The 32 number came from some experiements with some real world data I had. You are probably right that 32 might be large, when the number of samples is large. Will test with 1 and see how much does the compression degrade.

As it happens, it seems the library doesn't properly support writing multi-scanline compression formats to deep images. I wrote a test tool to read and write deep images with a different compression. It only allocates a single deep scanline at a time, points a DeepFrameBuffer at that, then loops reading the scanline from the input and then calling writePixels(1). It appears LineBufferTask::execute in ImfDeepScanLineOutputFile waits until writePixels() has been called 32 times, then it copies the pixel counts from the FrameBuffer for all 32 scanlines before writing. In my case, I've overwritten the data from the previous 31 scanlines. I had a brief go at fixing it so it saves the pixel counts as soon as writePixels is called, but haven't got that working properly yet. I hacked the ZSTD_COMPRESSION to be single scanline, and that works.

Thanks very much for looking at this. We had some discussions about this topic. I was experience spurious deadlocks in LineBufferTask. At some point all the reliability issues had gone away, so I chucked it to my imagination :)

I think this all points to the idea to make ZSTD work only on 1 scanline for now. (might make a 32 scanline version later)

pleprince · 2024-01-16T19:25:37Z

Perhaps it's worth making library functions that turn a string like 'piz' into PIZ_COMPRESSION, and vice versa, to simplify updating these utliities (and end-user code) as new compression types are added.

I have worked on that today and will make a separate PR.

peterhillman · 2024-01-17T20:13:34Z

I had issues doing cmake ../openexr -DCMAKE_BUILD_TYPE=Debug - it seems that causes blosc to generate a libblosc2_d.a but OpenEXR still tries to link against libblosc2.a.
In other news I think I have a version of OpenEXR which fixes deep reading and writing to allow single deep scanlines to be written and read with codecs that compress multiple scanlines together. The current code is using the supplied SampleCount slice to store per-pixel sample counts, so that must be valid for all scanlines of the chunk (and the library will write the sample count to your buffer for all scanlines of the chunk, even if you only ask it to read one scanline). There are no issues with the current implementation, as long as all deep codecs are single scanline only.
My changes further enforce that deep images cannot have subsampling other than 1, which means every sample has the same set of channels. Reading is inefficient: I believe reading scanlines consecutively from the same chunk causes the chunk to be decoded every time.

I would suggest that we have a ZSTD single scanline codec, but if decide to add a multiscanline version too, then we will need to make these changes. I'm a little bit timid about pushing my changes unless we need them. If we have multiple versions of ZSTD, then I think they should be in the same release

lji-ilm · 2024-01-18T06:11:37Z

I did not spend a lot of time trying to optimize for non-Deep images, but
certain small non-deep example images with HALF pixels yielded a slightly worse compression rate compared to ZIPS.

I looked up blosc2 a bit and it seems it turns on byte swizzle by default. This is another empirical optimization chasing after the idea that "exponent might be smooth while mantissa might fluctuate rapidly"; in other words, the higher bytes and lower bytes might exhibit different behaviours inside a float and we should collect them into separate chunks. But the blosc2 implementation probably only works for 4 byte IEEE float. I have a hard time imaging how this would optimize half:

https://www.slideshare.net/PyData/blosc-py-data-2014#17

openEXR's internal zip chased a similar idea, however the existing code swizzled by 2 bytes for an entire block of numbers rather than doing a local 16 byte swizzle on a 4 byte pattern: https://github.com/AcademySoftwareFoundation/openexr/blob/86d0fb09859951d1e51a889e4ff2b7b3baecf021/src/lib/OpenEXR/ImfZip.cpp#L50C2-L72

This might be one reason why it works well for float seems not be so performant on half.

Vertexwahn · 2024-01-21T18:21:13Z

FYI: I just created a PR on the Bazel Central Registry to add c-blosc2 2.12.0 (bazelbuild/bazel-central-registry#1349). Once this is merged Bazel builds can be fixed this way:

Add to MODULE.bazel:

bazel_dep(name = "c-blosc2", version = "2.12.0")

Add to BUILD.bazel under the deps section of cc_library `OpenEXR´:

"@c-blosc2//c-blosc2",

* Whitespaces and licensing * WIP OpenEXRCore implementation * Brand new spanking blosc build. * Switch to single Scanline zstd compression and Single implementation * Fixed the tests * Undo whitespace changes * Last touches * Revert extra build changes

Vertexwahn · 2024-04-05T19:38:13Z

@pleprince It turned out that the c-blosc2 Bazel module still has some issues. I opened a PR on the Bazel Central Registry to fix it. Once bazelbuild/bazel-central-registry#1773 is merged you can bump the version to:
bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2" (last number changed from "1" to "2") and then it should work. Usually merging in the BCR works quite fast (1-2 work days). If you cannot wait for this feel free to set the Bazel GitHub workflow to manual execution only (e.g. ignoring it for PRs) - I will than fix the Bazel build once bazelbuild/bazel-central-registry#1773 is merged.

Vertexwahn · 2024-04-05T20:07:35Z

@pleprince I just added the updated version of c-blosc2 to my personal Bazel registry.
You can create a .bazelrc (at the root dir or the OpenEXR repo, next to MODULE.bazel file) and add the following content:

common --registry=https://raw.githubusercontent.com/Vertexwahn/bazel-registry/main/
common --registry=https://bcr.bazel.build

This way also my personal registry will be considered that has already bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2") available. Did not test it - but might be a working workaround until bazelbuild/bazel-central-registry#1773 is merged.

Vertexwahn · 2024-04-05T20:14:16Z

@pleprince Just tested on Ubunut 22.04:

diff --git a/.bazelversion b/.bazelversion
index a8907c02..21c8c7b4 100644
--- a/.bazelversion
+++ b/.bazelversion
@@ -1 +1 @@
-7.0.2
+7.1.1
diff --git a/MODULE.bazel b/MODULE.bazel
index 63d3f067..ba4e4e2d 100644
--- a/MODULE.bazel
+++ b/MODULE.bazel
@@ -10,4 +10,4 @@ bazel_dep(name = "bazel_skylib", version = "1.5.0")
 bazel_dep(name = "imath", version = "3.1.11")
 bazel_dep(name = "libdeflate", version = "1.19")
 bazel_dep(name = "platforms", version = "0.0.8")
-bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.1")
+bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2")

bazel build //... && bazel test //... works for me with the above applied patches

pleprince · 2024-04-07T12:51:14Z

Thank you so much @Vertexwahn !

pleprince · 2024-04-08T14:52:05Z

@cary-ilm @peterhillman
The fuzz test is failing to build. It is compiling with clang++ but fails to link because ld doesn't seem to support DWARF-5 on that system image. clang 14+ now defaults to DWARF-5 and I wonder what's the best option here:

modify the compile flags to generate DWARF-4
link the fuzz binaries with lld instead of ld (pass -fuse-ld=lld to clang++). Not sure the disk image has lld installed though.
Thanks

@phaedon

AcademySoftwareFoundation/openexr#1604 revealed that the c-blosc2 module still has some linker issues. This PR fixes those issues. This PR is an improvement of the c-blosc2 model at version 2.12.0. In detail, missing c files were added to the c-blosc2 target. Also, textual_hdrs were reworked. Besides this cc_tests targets were introduced to make sure the whole thing works across different systems. The defines `HAVE_PLUGINS` and `HAVE_ZSTD` were added. Without the define `HAVE_PLUGINS` zfp plugin is not initialized (which is required by OpenEXR). @phaedon Maybe this fixes also some issues on your side ;). You can create a PR and add yourself as a maintainer to this module if you like. This way you will get notified if "improvements" are made or at least it gives you a chance to veto changes ;). Furthermore, the PR policy was changed in a way that a positive review of a maintainer can lead under some circumstances to a successful merge without the need to get a review of one of the "admins/repo owners"

Vertexwahn · 2024-04-16T10:59:18Z

@pleprince bazelbuild/bazel-central-registry#1773 is merged now. Use bazel_dep(name = "c-blosc2", version = "2.12.0.bcr.2") now! Should fix the Bazel build issues.

peterhillman · 2024-04-19T22:33:29Z

@pleprince I'm a little out of my depth here. I would have thought it was OK to change the fuzz test compile flags, since the purpose of fuzz testing is to verify the code correctness, not that it builds on all architectures. Does this issue mean that adding zstd adds an additional constraint on compiling debug builds in general, or is it just a quirk of the way that the oss-fuzz builds are configured?

clusty · 2024-04-19T23:01:30Z

@pleprince I'm a little out of my depth here. I would have thought it was OK to change the fuzz test compile flags, since the purpose of fuzz testing is to verify the code correctness, not that it builds on all architectures. Does this issue mean that adding zstd adds an additional constraint on compiling debug builds in general, or is it just a quirk of the way that the oss-fuzz builds are configured?

That error seems to pop up because for some reason blosc is compiled with DWAF5 format symbols, but the linker used for the fuzzer does not understand this ( clang/llvm forums seem to mention this error ). I have not encountered this issue locally, but then again I have a fairly recent toolchain.

What I don't understand is what makes blosc so special: I presume the fuzzer uses the same linker as the regular shared object EXR build chain ?

Signed-off-by: Philippe Leprince <[email protected]>

@phaedon

AcademySoftwareFoundation/openexr#1604 revealed that the c-blosc2 module still has some linker issues. This PR fixes those issues. This PR is an improvement of the c-blosc2 model at version 2.12.0. In detail, missing c files were added to the c-blosc2 target. Also, textual_hdrs were reworked. Besides this cc_tests targets were introduced to make sure the whole thing works across different systems. The defines `HAVE_PLUGINS` and `HAVE_ZSTD` were added. Without the define `HAVE_PLUGINS` zfp plugin is not initialized (which is required by OpenEXR). @phaedon Maybe this fixes also some issues on your side ;). You can create a PR and add yourself as a maintainer to this module if you like. This way you will get notified if "improvements" are made or at least it gives you a chance to veto changes ;). Furthermore, the PR policy was changed in a way that a positive review of a maintainer can lead under some circumstances to a successful merge without the need to get a review of one of the "admins/repo owners"

…penexr into main_clusty Signed-off-by: Philippe Leprince <[email protected]>

pleprince · 2024-06-05T14:48:26Z

Hello @kdt3rd and @cary-ilm
I looked into why OSS Fuzz is failing on this PR and it looks like I need to add libblosc2 as a lib to the clang++ invocation.
This seems to hardcoded in https://github.com/google/oss-fuzz/blob/master/projects/openexr/build.sh but I don't know how to proceed. It haven't found a way to, for example, create PR on oss-fuzz and reference it in my openexr PR.
Thanks for your time.

peterhillman · 2024-06-10T04:28:31Z

Perhaps it would be simplest to ignore the oss-fuzz test failure for now. Once this is all merged then we can do a PR to oss-fuzz to update the build.sh
It must be common that the various oss-fuzz projects need to update their build configurations to support newly added features, and would like that to happen without a build failure being reported, but I haven't found out how that's done.

peterhillman · 2024-06-17T00:55:04Z

src/lib/OpenEXRCore/internal_zstd.c

+    // clevel 9 is about a 20% increase in compression compared to 5.
+    // Decompression speed is unchanged.
+    int zstd_level;
+    exr_get_default_zstd_compression_level (&zstd_level);


Should this be getting the current compression level, not the default one? Changing the compression level doesn't seem to make a difference to the output size

peterhillman · 2024-06-17T06:38:59Z

src/lib/OpenEXRCore/internal_zstd.c

+    }
+
+    blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
+    int typeSize     = inSize % 4 == 0 ? 4 : 2;


If I follow correctly: if the total data size is a multiple of four, then this will pair together two halfs into a 4 byte 'type', but otherwise it could split floats into two 2 byte types. Is that a performance overhead?

A more pedantic approach might be to do multiple compression blocks, each containing a single data type, so blosc knows the exact size of each type. Perhaps each consecutive group of channels of the same type are compressed into a single group. (For deep, you also need to compute the total number of samples in the data, by looping through the channel list to compute the total size of the samples and dividing the data size by that)

It's been a while since I looked at that code.
There is a hard requirement inside Blosc, if I remember right, that the number of data points needs to be a multiple of number of typesize. I think this code prevented a crash when you had only 1 half channel.
So if you have an even number of half channels, they are treated in 4 byte chunks, where as if you have an even number of halfs everything is a treated as half.

At some point I implemented that channel sorting to have the correct type, but for some reason, the compression rate dropped: I had 1 blosc call per channel. I suspect that 1 scanline was not enough data if it is sliced for every channel (I did not do 1 blosc call per channel size type though)

From my tests, If there are only half float channels, then setting typeSize to 2 gives files 52.7% of uncompressed size, setting it to 4 gives 56.8%. For comparison, ZIPS gives 53.3%, so ZIPS performs better than ZSTD compression unless inSize is not a multiple of four. If there are full float channels in the mix and typeSize is 4, ZSTD performs better than ZIPS all round. All that suggests that it might be worth compressing channels with the correct typeSize. Fortunately, with the common set of channel names (A,B,G,R,Z,ZBack,id) it just happens that the alphanumeric sorting of the channel names groups together the 16 bit channels and the 32 bit channels.

@peterhillman your suggestion is to sort the channel data by size and potentially do 2 blosc calls ? ( when you have mixed float / half channels ).

the complication is that the pixel and the sample count tables are not available to the compressor for deep files. I have a branch that does a refactor to that effect but it was fairly large and probably full of bugs.

I was thinking you do mulitple blosc calls, at worst one per channel. If A and Z are stored as 32 bit float, and B,G,R as 16, you would do A by itself, then B,G,R together, and finallty Z, so you would have three blosc data chunks. That way you can compress channels together and don't need the overhead of rearranging the data.

It would be a minor refactor, yes, but just of the zstd code. You shouldn't need the sample count table, but you do need the ChannelList, which you can get from the header in the same way that other codecs do. You sum up the total size of all the channels and divide inSize by that to give you the number of samples. Then the first (numberofSamples*firstChannelSize) bytes are all the data for the first channel alphanumerically, then the data for the second channel, and so on through the block.

Makes sense…. I’d also need to detect when I am getting called with less data than expected ( if I remember right, first call to the compressor for the scan line is for the sample counts )

Oh good point. You would have to look at the dataWindow and compute the expected size of the sample count table to detect whether the data is sample count or pixel data, and it would be possible the pixel data coincidentally happens to be exactly that size. If that happened, then the data would be compressed as single 4 byte channel, even if it wasn't. That wouldn't do much harm, though, because the data size would have to be relatively small.

I suppose a minor extension to the API would be to pass an extra parameter to the compressors to indicate whether they are getting normal pixel data, sample counts, or deep data.

Presumably zstd compressed data would have some kind of header in it to indicate the number of blocks and the uncompressed size of each, so the decompresser would do the right thing regardless of what the compressor did (and would mean the decompressor doesn't need to look at the header at all - it just goes off the header in the compressed data block). That would mean the API can be changed later without breaking the zstd codec.

Just to make sure I got it right:
If the number of bytes I get is datawindow.x * 4 ( or whatever is the type for counts ), it means its a sample count compress call.

Yeah, the sample count table will always be (datawindow.max.x + 1 - datawindow.min.x)*sizeof(int) bytes for scanline images, and it will be (tile_area)*sizeof(int) bytes for tiled deep images. The actual pixel data will always be divisible by the sum of the channel sizes.

This code will need to be removed before merging. Signed-off-by: Philippe Leprince <[email protected]>

lji-ilm · 2024-08-24T20:41:47Z

While we all agree that zstd is a respectable modern compressor to add, there seems to be more questions regarding blosc2 and its parameter tuning for different data types in OpenEXR. I also think the performance benefit of blosc2's memory marshalling is somewhat related to the hardware architecture, and more exotic CPU<->Memory bridges/caches might not benefit as much, although I am still learning the details. It is also designed to be CPU only, although none of OpenEXR's compression types current can support GPU (but some of the simpler ones might be quite easy to port to a GPU implementation).

I'm wondering if anyone here have experimented simply introducing zstd by itself to OpenEXR but not the blosc2 memory marshaling wrapping? It can be its own compression type, or even simply just replace current libdeflate invocation if we just want to see some results. This could introduce a smaller PR/add less dependency to OpenEXR in this one step; and we can better quantify the benefit of the two technologies, blosc2 and zstd, separately.

clusty · 2024-08-25T00:46:21Z

While we all agree that zstd is a respectable modern compressor to add, there seems to be more questions regarding blosc2 and its parameter tuning for different data types in OpenEXR. I also think the performance benefit of blosc2's memory marshalling is somewhat related to the hardware architecture, and more exotic CPU<->Memory bridges/caches might not benefit as much, although I am still learning the details. It is also designed to be CPU only, although none of OpenEXR's compression types current can support GPU (but some of the simpler ones might be quite easy to port to a GPU implementation).

I'm wondering if anyone here have experimented simply introducing zstd by itself to OpenEXR but not the blosc2 memory marshaling wrapping? It can be its own compression type, or even simply just replace current libdeflate invocation if we just want to see some results. This could introduce a smaller PR/add less dependency to OpenEXR in this one step; and we can better quantify the benefit of the two technologies, blosc2 and zstd, separately.

I had an initial prototype where I integrated zstd directly, so definitely doable. I wanted to mention that a lot of the compression gain can be attributed to the proper data preprocessing: currently using only a byte shuffle provided by blosc. We are in the process of trying to improve on this with some actual knowledge of the data being compressed which hopefully will increase the performance for half pixels and maybe even more.

The main performance of blosc comes for specialized SIMD implementations, so you are right that exotic architectures will not benefit out of the box, but can always be implemented inside blosc if need be.

Once we are done with iterating with the algo, we’ll deal with the dependency issues raised by this PR raises.

lji-ilm · 2024-08-25T00:56:01Z

I had an initial prototype where I integrated zstd directly, so definitely doable. I wanted to mention that a lot of the compression gain can be attributed to the proper data preprocessing: currently using only a byte shuffle provided by blosc. We are in the process of trying to improve on this with some actual knowledge of the data being compressed which hopefully will increase the performance for half pixels and maybe even more.

The main performance of blosc comes for specialized SIMD implementations, so you are right that exotic architectures will not benefit out of the box, but can always be implemented inside blosc if need be.

Once we are done with iterating with the algo, we’ll deal with the dependency issues raised by this PR raises.

I agree that a proper pre-filtering of byte patterns would benefit compression, as we're also investigate this aspect on our side. However, currently my work focus on only RGB-half data and does not look at deep data at all. I believe this PR is much further down the road in terms of deep data compression, and its backward compatibility with half.

For deep data that is mostly float32 streams, I would like to mention the zfp project https://github.com/LLNL/zfp and the scientific publication that outlines its principles https://ieeexplore.ieee.org/document/6876024, but maybe you & co are already familiar with this line of work.

clusty · 2024-08-25T01:00:14Z

Blosc already has a zfp implementation. I was not able to test it properly before since I did not implemented pixel unpacking for tiles ( this is the current bit that we are debugging ). Once that is done, this experiment is very easy to perform in a few lines of code.

pleprince · 2024-08-25T07:30:50Z

FWIW, Zfp was my first idea. I experimented with it a bit but the compression rate was inferior to zstd on my deep datasets. It performed better in lossy mode but this not a fair comparison to zstd anymore and I didn't assess which level of loss is actually acceptable.
Might be worth revisiting with a larger data set though.

aras-p · 2024-09-25T09:10:25Z

In my tests a few years ago, ZFP in lossless mode also was not great on EXR data. Both compression ratio and performance were not impressive. https://aras-p.info/blog/2021/08/27/EXR-Filtering-and-ZFP/

TodicaIonut · 2024-10-13T06:00:30Z

all the codec implementations have moved from OpenEXR/Imf*Compressor to OpenEXRCore/internal_*? @pleprince @clusty

pleprince · 2024-10-13T07:30:12Z

Hi @TodicaIonut. I merged the latest release and a new batch of changes in a private branch. We will update this PR when it's ready.

TodicaIonut · 2024-10-16T07:24:10Z

I merged the latest release

commit a6ee9f0 Zstd Compressor ImfZstdCompressor.cpp, ImfZstdCompressor.h.
@pleprince

Zstd compressor (#3)

0689612

Co-authored-by: Philippe Leprince <[email protected]> Signed-off-by: Vlad Lazar <[email protected]> * better multy-type compression * Version the Stream --------- Signed-off-by: Vlad Lazar <[email protected]> Co-authored-by: Philippe Leprince <[email protected]>