Further optimizations #25

KillingSpark · 2022-05-27T15:32:41Z

I need a place to put down some ideas for further optimizing this crate:

We only need to call reserve() once per block of sequences. The number of bytes a block's sequences add to the decode buffer can be computed up front: it is the sum of all literal and match lengths, plus any literals left over after the last sequence. This might save some reallocations.
The zstd_streaming binary is not optimal: it reads into an intermediary buffer instead of using the drain_to_writer() functions, which exist for exactly this purpose.
Re-read https://fgiesen.wordpress.com/2018/02/19/reading-bits-in-far-too-many-ways-part-1/ and https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/ carefully and optimize the bitreaders further.
The ReversedBitreader can be made noticeably faster by being less useful in the generic case. If it simply returns wrong values for requests of more than 56 bits, calls to get_bits() and get_bits_triple() no longer need error handling. This was started in #58 ("don't return errors on too large requests on a reversed bitreader").
RingBuffer::extend_from_within does a lot of small memcpy calls. These can be sped up considerably if we don't insist on copying exactly the requested range and allow the bytes just past it to be overwritten: copying one or more u128 values at a time (where possible) is much faster.
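A rough sketch of a reversed bit reader in the spirit of the lookahead/refill variants from the fgiesen posts. It assumes the stream is read from its last byte towards the first, most significant bits first, and leaves out the padding/marker-bit handling zstd needs. `RevBitReader`, `refill()` and `bits_remaining()` are hypothetical names, not the crate's ReversedBitreader API. Requests of more than 56 bits are not rejected; they just produce wrong values, so `get_bits()` needs no `Result`:

```rust
/// Hypothetical reversed bit reader: bits are consumed starting from the most
/// significant bit of the last byte, moving towards the first byte.
struct RevBitReader<'a> {
    data: &'a [u8],
    /// Index of the lowest byte currently held in `container`.
    byte_idx: usize,
    /// data[byte_idx..byte_idx + 8] as a little-endian u64 (zero-padded if short).
    container: u64,
    /// How many of the container's top bits have already been consumed.
    bits_consumed: u32,
}

/// Loads up to 8 bytes as a little-endian u64, padding missing high bytes with zeros.
fn load_le_padded(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

impl<'a> RevBitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        let byte_idx = data.len().saturating_sub(8);
        let loaded = data.len() - byte_idx; // at most 8
        RevBitReader {
            data,
            byte_idx,
            container: load_le_padded(&data[byte_idx..]),
            // For inputs shorter than 8 bytes, the zero padding counts as consumed.
            bits_consumed: 64 - 8 * loaded as u32,
        }
    }

    /// Bits not yet read; negative once the caller has read past the start,
    /// so overreads can be checked once (e.g. per block) instead of per call.
    fn bits_remaining(&self) -> i64 {
        self.byte_idx as i64 * 8 + 64 - self.bits_consumed as i64
    }

    /// Moves the window back by whole consumed bytes, leaving at most 7 consumed
    /// bits in the container (unless the start of the data has been reached).
    fn refill(&mut self) {
        let step = ((self.bits_consumed / 8) as usize).min(self.byte_idx);
        if step == 0 {
            return;
        }
        self.byte_idx -= step;
        self.bits_consumed -= 8 * step as u32;
        self.container = load_le_padded(&self.data[self.byte_idx..self.byte_idx + 8]);
    }

    /// Returns the next `n` bits. Only n <= 56 is meaningful: larger requests
    /// return wrong values instead of an error (debug builds flag them via
    /// debug_assert). Reading past the start of the data yields zeros.
    fn get_bits(&mut self, n: u8) -> u64 {
        debug_assert!(n <= 56, "requests of more than 56 bits give wrong values");
        if n == 0 {
            return 0;
        }
        self.refill();
        // Drop the already consumed top bits, then keep the top `n` bits.
        let value = self
            .container
            .checked_shl(self.bits_consumed)
            .unwrap_or(0)
            .wrapping_shr(64u32.wrapping_sub(u32::from(n)));
        self.bits_consumed += u32::from(n);
        value
    }
}
```

After a refill at most 7 bits of the 64-bit container are consumed, which is where the 56-bit limit comes from: any request up to 56 bits can be served from a single container without branching on availability.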
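A sketch of the relaxed copy for extend_from_within, shown on a plain byte slice rather than the crate's RingBuffer (wrap-around is not handled, and `copy_overshooting` is a hypothetical name). It copies in 16-byte chunks, the width of a u128, and may write up to 15 garbage bytes past the end of the destination range, so the buffer needs that much slack:

```rust
/// Width of one chunk: the size of a u128.
const CHUNK: usize = 16;

/// Hypothetical relaxed copy: copies buf[src..src + len] to buf[dst..dst + len]
/// in whole 16-byte chunks. Up to CHUNK - 1 bytes after dst + len are
/// overwritten with garbage, so the caller must leave that much slack.
/// Requires the source to end before the destination starts (src + len <= dst),
/// so no source byte inside the range is overwritten before it is read.
fn copy_overshooting(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    assert!(src + len <= dst, "source must end before destination starts");
    let chunks = (len + CHUNK - 1) / CHUNK;
    assert!(dst + chunks * CHUNK <= buf.len(), "not enough slack after destination");
    for i in 0..chunks {
        let off = i * CHUNK;
        // Fixed-size copy through a [u8; 16] instead of a variable-length memcpy.
        let mut tmp = [0u8; CHUNK];
        tmp.copy_from_slice(&buf[src + off..src + off + CHUNK]);
        buf[dst + off..dst + off + CHUNK].copy_from_slice(&tmp);
    }
}
```

Fixed 16-byte copies like these are typically lowered to one 128-bit load and store each, instead of a call into memcpy with a variable length. The garbage written past the range is harmless as long as it is overwritten before anyone reads it, which is the case when the next decoded bytes are appended right after the copied range.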
