Blosc 1.7.0 compression uses more than uncompressed_size + BLOSC_MAX_OVERHEAD #159

Open
FlorianLuetticke opened this issue Feb 28, 2016 · 8 comments

Comments

@FlorianLuetticke

I have observed the following, but I am unsure whether it can be called an issue.

Using a comp_buffer_size much larger than uncompressed_size and executing

int compressedSize = blosc_compress(compress_level, shuffel, block_size,
                                    uncompressed_size, uncompressed_buffer,
                                    comp_buffer, comp_buffer_size);

the compressedSize can be larger than uncompressed_size + BLOSC_MAX_OVERHEAD, but will always be smaller than comp_buffer_size. This was not clear to me from the documentation; I had assumed that compressedSize <= uncompressed_size + BLOSC_MAX_OVERHEAD would always hold.

This can be avoided by calling

int compressedSize = blosc_compress(compress_level, shuffel, block_size,
                                    uncompressed_size, uncompressed_buffer,
                                    comp_buffer, uncompressed_size + BLOSC_MAX_OVERHEAD);

Is there a performance difference between the two? Is this expected behavior?

@FrancescAlted
Member

Well, I think what you describe is completely compatible with the docstrings for blosc_compress():

The dest buffer must have at least the size of destsize. Blosc guarantees that if you set destsize to, at least, (nbytes+BLOSC_MAX_OVERHEAD), the compression will always succeed.

but I agree that guaranteeing compressedSize <= uncompressed_size + BLOSC_MAX_OVERHEAD would be a good thing. A pull request on this is welcome.
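A minimal sketch of the workaround discussed above: cap the destsize passed to the compressor at nbytes plus the overhead, even when the destination buffer has more room. The helper name capped_destsize is hypothetical, and ASSUMED_MAX_OVERHEAD is a stand-in for BLOSC_MAX_OVERHEAD (assumed here to be 16, as in the c-blosc 1.x header); real code should include blosc.h and use the library's constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for BLOSC_MAX_OVERHEAD from blosc.h. */
#define ASSUMED_MAX_OVERHEAD 16

/* Hypothetical helper: the destsize to pass for a single compression call.
 * It never exceeds nbytes + overhead, the size at which the docstring says
 * compression always succeeds, and never exceeds the space available. */
static size_t capped_destsize(size_t nbytes, size_t available)
{
    assert(nbytes <= SIZE_MAX - ASSUMED_MAX_OVERHEAD); /* no overflow */
    size_t bound = nbytes + ASSUMED_MAX_OVERHEAD;
    return available < bound ? available : bound;
}
```

The result would be passed as the destsize argument in place of the full buffer size. If the available space is below the bound, the call may still succeed for compressible data, but success is no longer guaranteed.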

@FrancescAlted
Member

Reviewing this, I think I did not understand the question well. In fact, Blosc ensures that destsize <= nbytes + BLOSC_MAX_OVERHEAD. Do you have a use case that breaks this rule? If so, please attach it here.

@esc
Member

esc commented Dec 1, 2018

@FlorianLuetticke any updates?

@FlorianLuetticke
Author

I think the behaviour was due to an error in my code.
I selected a type_size that did not divide the uncompressed buffer size evenly (example: a buffer of 100 bytes with a type_size of 8).

In this case, destsize <= nbytes + BLOSC_MAX_OVERHEAD did not hold true.
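For anyone who wants to rule out this condition before calling the compressor, a small check along these lines may help. This is only a sketch: trailing_bytes is a hypothetical helper, not part of the Blosc API, and it simply reports how many bytes are left over after the last whole element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes left over after the last whole
 * element of size typesize. Zero means typesize divides nbytes evenly. */
static size_t trailing_bytes(size_t nbytes, size_t typesize)
{
    assert(typesize > 0); /* a zero typesize is meaningless here */
    return nbytes % typesize;
}
```

For the example above, a 100-byte buffer with a type_size of 8 leaves 4 trailing bytes, whereas a 96-byte buffer would leave none.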

@esc
Member

esc commented Dec 2, 2018

Interesting. This would indicate an API breakage. Blosc is supposed to guarantee that the data doesn't get bigger during compression, and the above inequality should always hold. Is there any chance you could put together a minimal example to illustrate your case?

@davidhcefx

davidhcefx commented Aug 2, 2024

I encountered the same issue, except that my type_size == 1.
When I pass a generous destsize, it uses more space than expected, so the buffer eventually runs out of space:

total_size = /* sum_of_each_block */
total_size += (nb_block * BLOSC_MAX_OVERHEAD);
...
for (i = 0; i < nb_block; i++) {
    int r = blosc_compress_ctx(5, BLOSC_SHUFFLE, 1, blocksize[i], blockbuf[i],
            &dst[off], total_size - off, BLOSC_LZ4_COMPNAME, 0, 1);
    /* eventually r == 0 */
    off += r;
}

But when I pass a tighter destsize, it does not run out of space:

int r = blosc_compress_ctx(5, BLOSC_SHUFFLE, 1, blocksize[i], blockbuf[i],
        &dst[off], blocksize[i] + BLOSC_MAX_OVERHEAD, BLOSC_LZ4_COMPNAME, 0, 1);
 /* always r != 0 */
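The two variants above can be folded into one loop that caps each call's destsize at the block size plus overhead while still respecting the space that remains. The following is a sketch under stated assumptions: pack_blocks and the compress_fn callback type are hypothetical, ASSUMED_MAX_OVERHEAD stands in for BLOSC_MAX_OVERHEAD (assumed to be 16), and store_stub is a stand-in "compressor" that only prepends a 16-byte header so the example runs without Blosc. In real code the callback would wrap blosc_compress_ctx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for BLOSC_MAX_OVERHEAD from blosc.h. */
#define ASSUMED_MAX_OVERHEAD 16

/* Hypothetical callback: returns bytes written, or 0 if destsize is too small. */
typedef size_t (*compress_fn)(const void *src, size_t nbytes,
                              void *dst, size_t destsize);

/* Stand-in compressor: stores the data after a zeroed 16-byte header. */
static size_t store_stub(const void *src, size_t nbytes,
                         void *dst, size_t destsize)
{
    size_t need = nbytes + ASSUMED_MAX_OVERHEAD;
    if (destsize < need)
        return 0;
    memset(dst, 0, ASSUMED_MAX_OVERHEAD);
    memcpy((unsigned char *)dst + ASSUMED_MAX_OVERHEAD, src, nbytes);
    return need;
}

/* Compress nb_block blocks back to back into dst. Each call is given at most
 * blocksize[i] + overhead bytes, never more than what remains in dst.
 * Returns the total bytes written, or 0 if any block fails. */
static size_t pack_blocks(compress_fn compress,
                          const unsigned char *const *blockbuf,
                          const size_t *blocksize, size_t nb_block,
                          unsigned char *dst, size_t total_size)
{
    size_t off = 0;
    for (size_t i = 0; i < nb_block; i++) {
        size_t remaining = total_size - off;
        size_t cap = blocksize[i] + ASSUMED_MAX_OVERHEAD;
        if (cap > remaining)
            cap = remaining;
        size_t r = compress(blockbuf[i], blocksize[i], dst + off, cap);
        if (r == 0)
            return 0;
        off += r;
    }
    return off;
}
```

With the cap in place, no single block can consume more than its share of a buffer sized as the sum of the block sizes plus nb_block times the overhead, which is the budget used in the snippet above.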

@FrancescAlted
Member

Could you provide a minimal, self-contained example of this so that we can check this out?

@davidhcefx

davidhcefx commented Aug 2, 2024

OK, I've managed to reduce it to a minimal test case:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <blosc.h>  /* originally "/usr/local/include/blosc.h"; adjust the include path if needed */

int main()
{
    char *src = "\xe0\x42\xcb\xfd\x23\x9b\xe1\x06\x50\x8d\x13\x10\x4c\xbf\xdd\xd2\x50\xdb\x87\x92\x42\x3e\xa2\xf1\x53\xd8\x43\x3c\x28\xb7\x78\x09\xfa"
                "\x43\x06\x1d\xde\xe8\x23\x2e\x75\x36\x3e\xc1\xf5\x1c\x93\x46\xf7\x1b\xd8\x39\x59\x7a\x2a\xad\x52\x6d\xe9\x7b\x25\x61\x84\x1f\xa4\x8a"
                "\x3c\x82\x72\x5f\xb0\xe8\x96\xef\xa9\x8b\x0b\x3d\xd1\x02\x58\xaa\x3b\xb1\x24\x65\x5e\x77\xd2\x47\xf2\xf7\xa8\x76\x16\x4c\x00\x52\xce"
                "\x73\xb2\x7f\x5b\x48\x6e\x04\xd3\x79\x41\xa5\x7b\x99\x4f\xb6\x4b\x73\x1b\xa9\xea\xed\xf1\xdc\xe5\x99\x52\xfb\xe6\x53\x4e\xb4";
    int src_size = 130;
    int total_len = src_size + BLOSC_MAX_OVERHEAD + 100;
    int i;
    uint8_t dst[1024];
    int r = blosc_compress_ctx(5,  // clevel
            BLOSC_SHUFFLE,         // doshuffle
            1,                     // typesize
            src_size,              // nbytes
            src,                   // src
            dst,                   // dest
            total_len,             // <-- try change to src_size + BLOSC_MAX_OVERHEAD
            BLOSC_LZ4_COMPNAME,    // compressor
            0,                     // blocksize, 0: automatic blocksize
            1                      // numinternalthreads
    );
    printf("%d => %d (%d)\n", src_size, r, r - src_size);

    return 0;
}

It compressed 130 bytes to 154 bytes, which is clearly greater than nbytes + BLOSC_MAX_OVERHEAD. If line 24 is changed to the tighter bound (src_size + BLOSC_MAX_OVERHEAD), the result does not exceed nbytes + BLOSC_MAX_OVERHEAD. Although this technically does not violate the docstring, the behavior surprised us.
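To make the arithmetic explicit, here is a small sketch that measures how far a compressed size overshoots the expected bound. It assumes BLOSC_MAX_OVERHEAD is 16 (represented by the stand-in ASSUMED_MAX_OVERHEAD), and bound_excess is a hypothetical helper, not a Blosc function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for BLOSC_MAX_OVERHEAD from blosc.h. */
#define ASSUMED_MAX_OVERHEAD 16

/* Hypothetical helper: bytes by which csize exceeds nbytes + overhead,
 * or 0 if csize is within the bound. */
static size_t bound_excess(size_t nbytes, size_t csize)
{
    assert(nbytes <= SIZE_MAX - ASSUMED_MAX_OVERHEAD); /* no overflow */
    size_t bound = nbytes + ASSUMED_MAX_OVERHEAD;
    return csize > bound ? csize - bound : 0;
}
```

Under that assumption the bound for the test case is 130 + 16 = 146, so the reported 154 bytes overshoots it by 8.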
