Inconsistent output when using ZlibEncoder vs. ZstdEncoder for PDF Compression #446

@unkcpz

Description

I'm encountering an issue where compressing a PDF file with ZlibEncoder yields a different digest than compressing it with ZstdEncoder. ZstdEncoder behaves as expected and produces the correct hash.
The goal of my code is to compress the file and, at the same time, compute the SHA-256 digest of the whole stream. I do this by wrapping the ZlibEncoder in a HashWriter, which uses ring::digest to compute the digest of the whole stream.
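For readers who want a concrete picture of the wrapper described above, here is a minimal sketch of what such a HashWriter could look like. It is not the HashWriter from this issue (that code is not shown); the names `Fnv1a`, `ShortWriter`, and `hash_all` are hypothetical, and a tiny FNV-1a checksum stands in for ring::digest so the sketch has no dependencies. One detail worth checking in any wrapper of this kind: `Write::write` may accept fewer bytes than it was given, so only the accepted prefix should be hashed.

```rust
use std::io::{self, Write};

/// Stand-in for a real digest context such as SHA-256 (assumption:
/// any streaming hash behaves the same way for this purpose).
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325)
    }
    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical HashWriter: forwards writes and hashes what was accepted.
struct HashWriter<W: Write> {
    inner: W,
    ctx: Fnv1a,
}

impl<W: Write> HashWriter<W> {
    fn new(inner: W) -> Self {
        HashWriter { inner, ctx: Fnv1a::new() }
    }
    /// Consumes the wrapper and returns the hash plus the inner writer.
    fn finish(self) -> (u64, W) {
        (self.ctx.finish(), self.inner)
    }
}

impl<W: Write> Write for HashWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The inner writer may accept only part of `buf`.
        let n = self.inner.write(buf)?;
        // Hash only the accepted bytes; the caller retries the rest.
        self.ctx.update(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Test helper: a writer that accepts at most `max` bytes per call.
struct ShortWriter {
    out: Vec<u8>,
    max: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.max);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hashes a whole buffer in one call, for comparison.
fn hash_all(data: &[u8]) -> u64 {
    let mut ctx = Fnv1a::new();
    ctx.update(data);
    ctx.finish()
}
```

With this shape, the digest of the uncompressed stream does not depend on how many bytes the wrapped writer accepts per call, so it should not depend on which encoder sits underneath.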

Code snippet

    let (bytes_read, hash_hex, compressed) =
        match (compression, rmaker.maybe_content_format()) {
            (Compression::Zlib(level), Ok(MaybeContentFormat::MaybeLargeText)) => {
                dbg!("zlib");
                let mut writer =
                    ZlibEncoder::new(&mut cwp, flate2::Compression::new(*level));
                let mut hwriter = HashWriter::new(&mut writer, &digest::SHA256);
                let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;

                let hash = hwriter.finish();
                let hash_hex = hex::encode(hash);

                dbg!(bytes_copied);
                (bytes_copied, hash_hex, true)
            }
            (Compression::Zstd(lv), Ok(MaybeContentFormat::MaybeLargeText)) => {
                dbg!("zstd");
                let mut writer = ZstdEncoder::new(&mut cwp, *lv)?;
                let mut hwriter = HashWriter::new(&mut writer, &digest::SHA256);
                let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;

                let hash = hwriter.finish();
                let hash_hex = hex::encode(hash);

                (bytes_copied, hash_hex, true)
            }
            _ => {
                let mut hwriter = HashWriter::new(&mut cwp, &digest::SHA256);
                let bytes_copied = copy_by_chunk(&mut stream, &mut hwriter, chunk_size)?;
                let hash = hwriter.ctx.finish();
                let hash_hex = hex::encode(hash);

                (bytes_copied, hash_hex, false)
            }
        };

Could you help identify why ZlibEncoder is producing a different hash_hex compared to ZstdEncoder when compressing PDF files?

Let me know if I need to provide more details, such as the HashWriter and copy_by_chunk implementations. I left them out to keep the issue short.

My guess is that the issue arises because I don't call .finish() on the compression writer. Since I am new to Rust, I don't know how to test this. I tried calling writer.finish() before computing the hash, but the borrow checker complains that writer is still borrowed.
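The borrow-checker complaint can be illustrated without flate2. The sketch below is an assumption-laden toy: `HoldBackEncoder` is a hypothetical encoder that holds its input back until `finish(self)` is called (flate2's write-side encoders expose a `finish` that likewise consumes the encoder), and `CountWriter` is a hypothetical stand-in for HashWriter that borrows the encoder mutably. The point it shows is ordering: consume the wrapper first, which ends its borrow, and only then call `finish` on the encoder.

```rust
use std::io::{self, Write};

/// Toy encoder (not a real codec): holds input back until `finish`.
struct HoldBackEncoder<W: Write> {
    sink: W,
    pending: Vec<u8>,
}

impl<W: Write> HoldBackEncoder<W> {
    fn new(sink: W) -> Self {
        HoldBackEncoder { sink, pending: Vec::new() }
    }
    /// Consumes the encoder, writes the held-back bytes, returns the sink.
    fn finish(mut self) -> io::Result<W> {
        self.sink.write_all(&self.pending)?;
        Ok(self.sink)
    }
}

impl<W: Write> Write for HoldBackEncoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Stand-in for HashWriter: borrows the inner writer and counts bytes.
struct CountWriter<'a, W: Write> {
    inner: &'a mut W,
    count: u64,
}

impl<'a, W: Write> CountWriter<'a, W> {
    fn new(inner: &'a mut W) -> Self {
        CountWriter { inner, count: 0 }
    }
    /// Consumes the wrapper, which releases the borrow of the inner writer.
    fn finish(self) -> u64 {
        self.count
    }
}

impl<'a, W: Write> Write for CountWriter<'a, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.count += n as u64;
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes `data` through both layers and returns the byte count.
fn encode(data: &[u8], out: &mut Vec<u8>) -> io::Result<u64> {
    let mut encoder = HoldBackEncoder::new(out);
    let mut counter = CountWriter::new(&mut encoder);
    counter.write_all(data)?;
    // Consuming the wrapper ends its mutable borrow of `encoder`.
    let count = counter.finish();
    // Only now can `encoder` be moved into `finish`.
    encoder.finish()?;
    Ok(count)
}
```

Swapping the last two calls, so that `encoder.finish()` runs while `counter` is still alive and used afterwards, reproduces the kind of "still borrowed" error described above.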

Environment

flate2 = "1.0.31"

EDIT: I tried GzEncoder and it gives the same unexpected hash_hex as ZlibEncoder.
