Current xopen compression levels do not scale gradually. #136

Closed
rhpvorderman opened this issue Dec 26, 2023 · 1 comment

@rhpvorderman
Collaborator

xopen uses a mixture of zlib, isal (ISA-L) and zlib-ng. The current default is to prefer isal, then zlib-ng, and then zlib.
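A minimal sketch of that fallback order, for illustration only (this is not xopen's actual code). It assumes the import names `isal` (python-isal) and `zlib_ng` (python-zlib-ng); the helper `first_available` is hypothetical.

```python
import importlib.util

# Preference order described above: isal first, then zlib-ng, then stdlib zlib.
PREFERENCE = ("isal", "zlib_ng", "zlib")


def first_available(preference=PREFERENCE):
    """Return the name of the first importable module in ``preference`` (hypothetical helper)."""
    for name in preference:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError("no compression backend available")
```

Note that this picks one backend for every level, which is exactly why the level-dependent quirks listed below show up.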

Below are a few benchmarks on an Illumina FASTQ file (which compresses quite well).

Compressed size (as a percentage of the original size):

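| Compression level | zlib | isa-l | zlib-ng |
|---|---|---|---|
| 0 | 100.01% | 25.28% | 100.01% |
| 1 | 23.88% | 22.49% | 34.99% |
| 2 | 22.88% | 22.47% | 22.92% |
| 3 | 21.66% | 22.60% | 21.32% |
| 4 | 21.47% | n/a | 20.44% |
| 5 | 20.76% | n/a | 19.93% |
| 6 | 19.72% | n/a | 19.27% |
| 7 | 19.28% | n/a | 19.19% |
| 8 | 19.03% | n/a | 19.01% |
| 9 | 18.93% | n/a | 18.83% |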

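Compression times (seconds):

| Compression level | zlib | isa-l | zlib-ng |
|---|---|---|---|
| 0 | 1.29 | 2.89 | 1.15 |
| 1 | 11.46 | 2.78 | 4.61 |
| 2 | 12.92 | 2.87 | 8.28 |
| 3 | 22.12 | 6.77 | 11.57 |
| 4 | 18.69 | n/a | 17.40 |
| 5 | 35.37 | n/a | 21.68 |
| 6 | 92.65 | n/a | 37.62 |
| 7 | 168.00 | n/a | 112.92 |
| 8 | 241.37 | n/a | 143.97 |
| 9 | 327.82 | n/a | 208.06 |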
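Numbers of this kind can be gathered with a small script. A minimal sketch, assuming only the standard-library `gzip` module (zlib backend) and synthetic reads as a hypothetical stand-in for the Illumina file used above; `make_fastq` and `benchmark` are illustrative names, and comparing isal or zlib-ng would mean swapping in the corresponding modules of python-isal or python-zlib-ng (not shown).

```python
import gzip
import random
import time


def make_fastq(n_records, read_len=100, seed=0):
    """Build synthetic FASTQ bytes (hypothetical stand-in for a real Illumina file)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_records):
        seq = "".join(rng.choices("ACGT", k=read_len))
        qual = "".join(rng.choices("FFFF:,", k=read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def benchmark(data, levels=range(10)):
    """Return {level: (compressed size as % of original, seconds)} using stdlib gzip."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        results[level] = (100 * len(compressed) / len(data), elapsed)
    return results


if __name__ == "__main__":
    data = make_fastq(20_000)
    for level, (pct, secs) in benchmark(data).items():
        print(f"{level}  {pct:7.2f}%  {secs:.3f}s")
```

The 100.01% for zlib at level 0 is what the stored (uncompressed) DEFLATE format predicts: each stored block holds at most 65,535 bytes and carries roughly 5 bytes of header, so the overhead is about 5 / 65,535 ≈ 0.008% plus a few bytes of gzip header and trailer.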

From this I take it that preferring python-isal is a good thing: at the lowest compression levels, it provides the best speed as well as the best compression. However, this results in the following weird behaviours:

  • If the compression level is 0 and isal is available, the file is still compressed (to about 25% of its size) rather than stored uncompressed.
  • Levels 1, 2 and 3 are virtually indistinguishable in file size.
  • Level 3 is significantly slower while barely differing in file size. This is because it uses AVX-512, which leads to slower results on processors without AVX-512. (Level 2 uses AVX2, by the way.)

I propose that only levels 1 and 2 are forwarded to isal and that levels 3-9 are handled by zlib-ng. Level 0 should produce the uncompressed (stored) block format that zlib gives by default. This way, using xopen across multiple systems with different available libraries gives a relatively consistent compression experience.
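The proposed mapping can be sketched as a single dispatch function. This is an illustration under stated assumptions, not the code merged for this issue: `backend_for_level` is a hypothetical name, `available` is assumed to be the set of importable backend names, and the fallback for levels 1-2 when isal is missing (zlib-ng, then zlib) is an assumption that follows the current preference order.

```python
def backend_for_level(level, available):
    """Pick a backend name for ``level`` following the mapping proposed above.

    ``available`` is a set such as {"isal", "zlib_ng", "zlib"} (hypothetical names).
    """
    if not 0 <= level <= 9:
        raise ValueError(f"compression level must be 0-9, got {level}")
    if level == 0:
        # Level 0: zlib's stored (uncompressed) blocks, never isal's compressing level 0.
        return "zlib"
    if level <= 2 and "isal" in available:
        # Levels 1-2: isal is fastest and compresses best here.
        return "isal"
    if "zlib_ng" in available:
        # Levels 3-9, and levels 1-2 when isal is missing (assumed fallback).
        return "zlib_ng"
    # Last resort: the standard-library zlib is always present.
    return "zlib"
```

For example, with all three libraries installed, level 2 maps to `"isal"` and level 6 to `"zlib_ng"`, while a system with only zlib gets `"zlib"` for every level.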

@rhpvorderman
Collaborator Author

Was fixed with #142
