Cleanup of ZopfliFindLongestMatch -- gives ~8% overall performance boost #110

jthlim · 2016-04-30T09:02:57Z

Check jthlim@b5a1ea2 for a detailed discussion of each change next to the associated code.

Please help verify that all of the comments/statements are valid from your perspective too!

Note: the 8% improvement is compounded on top of the optimizations discussed before (having asserts off, integer cost comparison, -ffast-math, link time optimization, etc.)

Note: Testing on hundreds of files has so far verified that these changes do not affect the compressed output.

jthlim · 2016-04-30T18:11:05Z

Thanks for the feedback! It would be really interesting to see GCC's generated code for that function on ARM, before and after. I can't think why there'd be a regression.

Do you see similar results on x86/x64?

I assume this testing is with your modified fork rather than the verbatim zopfli codebase?

jthlim · 2016-05-01T06:16:38Z

I've reverted the order of comparisons change due to the regression that you've noticed.

Re: the size/space trade-off: a lot of those variables end up in registers with clang, so there's no extra memory trade-off at all. I guess that will be platform-specific too, but I expect to see it in most cases.

And perhaps gcc is just better than clang at removing the unnecessary masking when appropriate.


lorents17 · 2016-05-01T18:03:38Z

enwik8 --i15

Version              File size          Time
Zopfli (original)    34 967 036 bytes   6m19.762s
Zopfli (jthlim)      34 967 390 bytes   6m1.886s

ECT

Compressor   File size          Time
ECT -2       35 357 940 bytes   0m24.922s
ECT -3       35 018 152 bytes   0m30.350s
ECT -4       34 967 607 bytes   0m37.274s
ECT -5       34 963 304 bytes   0m58.680s
ECT -6       34 942 395 bytes   1m19.929s
ECT -7       34 939 105 bytes   2m1.326s
ECT -8       34 938 139 bytes   5m59.643s

jthlim · 2016-05-02T02:46:58Z

Thanks @lorents17 for the info! What compiler/platform were you testing with?

I see that you've measured a ~5% speedup, but the difference in file size is interesting to me.
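As a quick cross-check of the ~5% figure, the small C helper below recomputes it from lorents17's enwik8 --i15 numbers. The function names are illustrative only. "Time saved" (relative to the old run) and "speedup" (relative to the new run) are two different ratios of the same timings, which is why they come out slightly differently.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a timing such as "6m19.762s" (given as minutes and seconds) to seconds. */
static double to_seconds(int minutes, double seconds) {
  assert(minutes >= 0 && seconds >= 0.0);
  return minutes * 60.0 + seconds;
}

/* Share of the old wall-clock time that the new build saves, in percent. */
static double time_saved_pct(double t_old, double t_new) {
  return 100.0 * (t_old - t_new) / t_old;
}

/* How much faster the new build runs (throughput gain), in percent. */
static double speedup_pct(double t_old, double t_new) {
  return 100.0 * (t_old / t_new - 1.0);
}

/* Print the comparison for the enwik8 --i15 figures in the table above. */
void print_enwik8_comparison(void) {
  double t_old = to_seconds(6, 19.762); /* Zopfli (original) */
  double t_new = to_seconds(6, 1.886);  /* Zopfli (jthlim)   */
  long size_old = 34967036L;
  long size_new = 34967390L;

  printf("time saved: %.2f%%\n", time_saved_pct(t_old, t_new));
  printf("speedup:    %.2f%%\n", speedup_pct(t_old, t_new));
  printf("size delta: %+ld bytes\n", size_new - size_old);
}
```

Called from a main(), this prints 4.71% time saved, a 4.94% speedup and a +354 byte size difference.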

Did your original zopfli results include the changes from 37f6da6 -- specifically katajainen.c?

Otherwise I need to re-examine the changes to see why there could be a difference.

Thanks!

lorents17 · 2016-05-02T08:30:19Z

OS - Windows 10
compiler - GCC 5.3

I used the original zopfli with the latest changes from Apr 25, 2016.

What interests me more is how the ECT project manages to compress so quickly.

jthlim · 2016-05-02T09:09:55Z

I've tried before and after my changes on OS X with Apple LLVM 7.3, and I consistently get 34967390 bytes in both cases. I'll try to set up a system with gcc later to test further.

As to the performance:
A quick look at https://github.com/fhanau/Efficient-Compression-Tool/blob/master/src/zopfli/zopfli_gzip.cpp#L259 suggests that there are modifications there to add multithreading support (this fork of zopfli is single-threaded). At a guess, it looks like he's basing it on https://github.com/MrKrzYch00/zopfli

fhanau · 2016-05-02T15:45:31Z

ECT (which I am the author of) is much faster than zopfli at similar compression, even without multithreading. Multithreading is off by default as it causes compression losses and needs more processing power per byte.
It achieves its speed improvements by rewriting large parts and replacing slow algorithms (like the current hash-chain-based ZopfliFindLongestMatch) with faster ones (like the new binary-tree-based one).
The only part of ECT that is based on MrKrzYch00/zopfli is the ZIP support.
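To make the hash-chain side of that comparison concrete, here is a minimal sketch of a hash-chain longest-match search, in the general style of ZopfliFindLongestMatch but not zopfli's code. Assumptions: the names (HashChain, hc_insert, hc_find), the 3-byte hash and the MAX_CHAIN cut-off are hypothetical, and the extra bookkeeping zopfli carries (such as its longest-match cache) is omitted. The cost to note is that every lookup walks a chain of earlier positions and compares bytes at each one.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_SIZE 32768 /* DEFLATE's maximum match distance */
#define WINDOW_MASK (WINDOW_SIZE - 1)
#define HASH_SIZE (1 << 15)
#define MIN_MATCH 3
#define MAX_MATCH 258
#define MAX_CHAIN 8192 /* hypothetical cut-off on chain walking */

typedef struct {
  int head[HASH_SIZE];   /* most recent position per hash value, -1 if none */
  int prev[WINDOW_SIZE]; /* previous position with the same hash, -1 if none */
} HashChain;

static unsigned hash3(const unsigned char* p) {
  return (((unsigned)p[0] << 10) ^ ((unsigned)p[1] << 5) ^ p[2]) & (HASH_SIZE - 1);
}

static void hc_init(HashChain* hc) {
  for (size_t i = 0; i < HASH_SIZE; i++) hc->head[i] = -1;
  for (size_t i = 0; i < WINDOW_SIZE; i++) hc->prev[i] = -1;
}

/* Prepend pos to the chain for its 3-byte hash. Insert positions in increasing order. */
static void hc_insert(HashChain* hc, const unsigned char* data, size_t size, size_t pos) {
  assert(pos + MIN_MATCH <= size);
  unsigned h = hash3(data + pos);
  hc->prev[pos & WINDOW_MASK] = hc->head[h];
  hc->head[h] = (int)pos;
}

/* Walk earlier positions sharing pos's hash; return the longest match length
   (0 if shorter than MIN_MATCH) and store its distance in *dist_out.
   pos itself must not have been inserted yet. */
static size_t hc_find(const HashChain* hc, const unsigned char* data, size_t size,
                      size_t pos, size_t* dist_out) {
  *dist_out = 0;
  if (pos + MIN_MATCH > size) return 0;
  size_t limit = size - pos < MAX_MATCH ? size - pos : MAX_MATCH;
  size_t best_len = 0, best_dist = 0;
  int chain = MAX_CHAIN;
  int cand = hc->head[hash3(data + pos)];
  while (cand >= 0 && chain-- > 0) {
    size_t dist = pos - (size_t)cand;
    if (dist > WINDOW_SIZE) break; /* out of window; older slots may be reused */
    /* Cheap early reject: a longer match must also agree at index best_len. */
    if (data[cand + best_len] == data[pos + best_len]) {
      size_t len = 0;
      while (len < limit && data[cand + len] == data[pos + len]) len++;
      if (len > best_len) {
        best_len = len;
        best_dist = dist;
        if (len == limit) break; /* cannot do better */
      }
    }
    cand = hc->prev[cand & WINDOW_MASK];
  }
  if (best_len < MIN_MATCH) return 0;
  *dist_out = best_dist;
  return best_len;
}
```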

jthlim · 2016-05-02T15:56:37Z

@fhanau Thanks for the info and clarifications! Are there any downsides to importing your binary tree based code into the mainline?

fhanau · 2016-05-02T17:13:38Z

Unfortunately, yes:

The binary tree match finder (BTMF) has a very low cost for finding matches, as doing so is almost the same as updating the hash. However, updating the hash is very expensive, which is why BTMF is only faster when optimal parsing is used. I use a match finder based on LZ4 when lazy parsing is used.
BTMF is not compatible with the current match cache.
Because BTMF needs a lot of time to update the hash, and this is done at the start of each block after the first one (so that positions in the previous blocks can be searched), BTMF spends a lot of time before compression even begins. This can be solved by copying the match finder struct of the previous block.
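The point that finding matches costs almost the same as updating can be seen in a binary-tree match finder in the style of LZMA's bt match finders: one walk down the tree both collects candidate matches and re-links the tree so that the new position becomes the root. The sketch below is a simplified illustration under stated assumptions, not ECT's implementation: the names are hypothetical, there is one tree per 3-byte hash bucket, the depth cut-off is arbitrary, and plain arrays indexed by absolute position replace a cyclic buffer.

```c
#include <assert.h>
#include <stdlib.h>

#define BT_HASH_SIZE 4096
#define BT_MIN_MATCH 3
#define BT_MAX_MATCH 258
#define BT_WINDOW 32768
#define BT_MAX_DEPTH 64 /* hypothetical cut-off on nodes visited per insert */

typedef struct {
  int head[BT_HASH_SIZE]; /* root of each hash bucket's tree, -1 if empty */
  int* left;              /* left[p]: suffixes sorting before data + p */
  int* right;             /* right[p]: suffixes sorting after data + p */
} BinTree;

static unsigned bt_hash3(const unsigned char* p) {
  return (((unsigned)p[0] << 8) ^ ((unsigned)p[1] << 4) ^ p[2]) & (BT_HASH_SIZE - 1);
}

static int bt_init(BinTree* bt, size_t size) {
  for (int i = 0; i < BT_HASH_SIZE; i++) bt->head[i] = -1;
  bt->left = malloc((size ? size : 1) * sizeof(int));
  bt->right = malloc((size ? size : 1) * sizeof(int));
  return bt->left != NULL && bt->right != NULL;
}

static void bt_free(BinTree* bt) {
  free(bt->left);
  free(bt->right);
}

/* Make pos the new root of its bucket's tree. The same walk that re-links the
   tree yields the longest match, so searching costs about as much as updating.
   Insert positions in increasing order; returns 0 if no match >= BT_MIN_MATCH. */
static size_t bt_insert_find(BinTree* bt, const unsigned char* data, size_t size,
                             size_t pos, size_t* dist_out) {
  assert(pos + BT_MIN_MATCH <= size);
  size_t limit = size - pos < BT_MAX_MATCH ? size - pos : BT_MAX_MATCH;
  unsigned h = bt_hash3(data + pos);
  int cur = bt->head[h];
  int* smaller = &bt->left[pos]; /* slot for the next node sorting before pos */
  int* larger = &bt->right[pos]; /* slot for the next node sorting after pos */
  size_t len_smaller = 0, len_larger = 0; /* common prefix with each bound */
  size_t best_len = 0, best_dist = 0;
  int depth = BT_MAX_DEPTH;

  bt->head[h] = (int)pos;
  while (cur >= 0 && depth-- > 0 && pos - (size_t)cur <= BT_WINDOW) {
    /* Every node in this subtree shares at least this prefix with pos. */
    size_t len = len_smaller < len_larger ? len_smaller : len_larger;
    while (len < limit && data[cur + len] == data[pos + len]) len++;
    if (len > best_len) {
      best_len = len;
      best_dist = pos - (size_t)cur;
      if (len == limit) {
        /* cur equals pos as far as we can compare: pos inherits its children. */
        *smaller = bt->left[cur];
        *larger = bt->right[cur];
        *dist_out = best_dist;
        return best_len;
      }
    }
    if (data[cur + len] < data[pos + len]) {
      *smaller = cur; /* cur sorts before pos */
      smaller = &bt->right[cur];
      cur = *smaller;
      len_smaller = len;
    } else {
      *larger = cur; /* cur sorts after pos */
      larger = &bt->left[cur];
      cur = *larger;
      len_larger = len;
    }
  }
  *smaller = -1; /* drop whatever was not visited (deeper or out of window) */
  *larger = -1;
  *dist_out = best_len >= BT_MIN_MATCH ? best_dist : 0;
  return best_len >= BT_MIN_MATCH ? best_len : 0;
}
```

The sketch also shows why such a finder front-loads cost, as described above: every position has to go through bt_insert_find to keep the tree valid, whether or not its matches are needed.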

But you're right, the changes from ECT should be backported.

MrKrzYch00 · 2016-05-13T15:02:15Z

Most of the changes don't make any difference on ARMv7 when GCC is used. There is no way for me to run x86 tests efficiently, as I run Fedora in VirtualBox (it now supports the AVX instruction set and so on, but the CPU is not as "free" as on the Odroid U3). Of course, ARM has a lot of registers, so that may be the reason. Only the loop change gave a very small speedup; changing variable types or ordering (here as well as in previous changes) either made no difference or was actually slower. However, the differences are so small that they could just as well fall within the margin of error. As far as I'm concerned, GCC 4.8 is faster than GCC 5.3, which is faster than Clang 3.6. At least that is how they scored on ARMv7; I would need to recompare against the original Zopfli to confirm that Clang is still slower.

Even with my MrKrzYch00@7aa51e1 I don't see any difference in speed. :)

ATM the only thing that concerns me about Zopfli is why I get:

*** Error in `zopfli/zopfli': double free or corruption (!prev): 0xb4550038 ***
Aborted

which only occurs on my Odroid U3 (both of them) when freeing ~~heap (randomly) on long runs and (not 100% sure) when lazy matching is used... [can't figure it out so far, with all the debugging included]~~ pointer structure members on structures that are not properly aligned. That was also the reason I invented restore points to resume compression, and with them I could confirm that the code ~~should be~~ is right and the ~~Odroid U3 (outdated) kernel or its libs are causing this (or some alignment or small out-of-bounds going on somewhere)...~~ GCC switches need to be altered (either 32-bit alignment or enabling unaligned access), which I'm currently testing out.
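The fix being tested here is at the compiler-switch level. At the code level, a common portable way in C to avoid depending on alignment when reading or writing multi-byte values in a byte buffer is to go through memcpy, as sketched below. The helper names are hypothetical, this is not code from zopfli or from MrKrzYch00's fork, and it does not diagnose the double-free itself; it only illustrates alignment-independent access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 32 bits (host byte order) from p, whatever its alignment.
   memcpy lets the compiler choose a safe instruction sequence for the target,
   whereas *(const uint32_t*)p is undefined behaviour when p is misaligned and
   can trap on ARM systems that do not fix up unaligned accesses. */
static uint32_t load_u32(const void* p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Store 32 bits (host byte order) to p, whatever its alignment. */
static void store_u32(void* p, uint32_t v) {
  memcpy(p, &v, sizeof v);
}

/* Non-zero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void* p, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t)p & (align - 1)) == 0;
}
```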

Cleanup of ZopfliFindLongestMatch -- gives ~8% overall performance boost

b5a1ea2

Revert order of comparisons change due to this causing regression with some data sets

8ffda7e

jthlim force-pushed the master branch from 80cacaa to 8ffda7e May 1, 2016 06:33

Simplification of initial dist setup too

84114af

fhanau mentioned this pull request Jul 20, 2016

Better Compression Using ECT ImageOptim/ImageOptim#151

Open

andrews05 mentioned this pull request Oct 24, 2022

Performance improvement ideas zopfli-rs/zopfli#12

Open
