Support resumable decompression #41

Open
phord opened this issue Mar 31, 2023 · 3 comments

Comments

@phord
Contributor

phord commented Mar 31, 2023

I want to be able to decode all or part of a file, then rewind (Seek) to some earlier point in the file and re-decode from there. Once the initial read is done, this would let the file be read with random access.

Doing this requires being able to save the state of some earlier decode point. I think I can clone the FrameDecoder (once I make it and its children Cloneable) and use this and the source file position to restore my read to some earlier point.
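A minimal sketch of that checkpoint idea, assuming the decoder state can be cloned. `ToyDecoder`, `Checkpoint`, `save`, `restore`, and `demo` are hypothetical names invented for illustration; `ToyDecoder` is a stand-in running-sum decoder, not the crate's FrameDecoder. The point is only that a bookmark pairs a decoder clone with the matching source position.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Hypothetical stand-in for a stateful decoder; NOT the real FrameDecoder.
// Each output byte depends on all earlier input (a running sum), so decoding
// cannot restart mid-stream without the saved state.
#[derive(Clone, Debug, PartialEq)]
struct ToyDecoder {
    acc: u8,
}

impl ToyDecoder {
    // Read `n` input bytes and emit the running sum after each one.
    fn decode<R: Read>(&mut self, src: &mut R, n: usize) -> std::io::Result<Vec<u8>> {
        let mut buf = vec![0u8; n];
        src.read_exact(&mut buf)?;
        Ok(buf
            .into_iter()
            .map(|delta| {
                self.acc = self.acc.wrapping_add(delta);
                self.acc
            })
            .collect())
    }
}

// A bookmark: the decoder state plus the matching offset in the source.
#[derive(Clone)]
struct Checkpoint {
    decoder: ToyDecoder,
    source_pos: u64,
}

fn save<S: Seek>(decoder: &ToyDecoder, src: &mut S) -> std::io::Result<Checkpoint> {
    Ok(Checkpoint {
        decoder: decoder.clone(),
        source_pos: src.stream_position()?,
    })
}

// Seek the source back to the bookmark and hand out a fresh copy of the state.
fn restore<S: Seek>(cp: &Checkpoint, src: &mut S) -> std::io::Result<ToyDecoder> {
    src.seek(SeekFrom::Start(cp.source_pos))?;
    Ok(cp.decoder.clone())
}

fn demo() -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut src = Cursor::new(vec![1u8, 2, 3, 4, 5, 6]);
    let mut dec = ToyDecoder { acc: 0 };
    dec.decode(&mut src, 2)?; // decode up to the bookmark
    let cp = save(&dec, &mut src)?;
    let first = dec.decode(&mut src, 4)?; // decode the rest
    let mut dec2 = restore(&cp, &mut src)?; // rewind to the bookmark
    let second = dec2.decode(&mut src, 4)?; // re-decode the same range
    Ok((first, second))
}
```

Both passes over the last four bytes should yield the same output, because the restored decoder starts from the same accumulated state and the same source offset.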

Supporting this mode will make it possible to support something almost like zstd-seekable without first having to compress the file in a multi-frame format.

libz has support for this; it's seldom used but it comes in handy when needed.

I can work on this some myself but I could use some guidance. For example, I suspect I don't need to clone the entire FrameDecoder struct to make this work; I don't think I need all the buffers it holds. I may only need the relevant context information (dictionaries, bit positions, etc.) and some feeder data.

I wouldn't mind only saving state at the end of a block, but I'm not sure if that's advantageous.

@phord
Contributor Author

phord commented Mar 31, 2023

I'm implementing a reader that will read multi-frame zstd files with similar bookmarking, but that one is rather straightforward.

@KillingSpark
Owner

KillingSpark commented Mar 31, 2023

> I suspect I don't need to clone the entire FrameDecoder struct to make this work; I don't think I need all the buffers it holds.

I think you do need to save pretty much everything. For example, a block can specify that it reuses the FSE tables of the previous block. So to restart at an arbitrary block, you need to know the state the decoder was in just before it started decoding that block.
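A toy model of that dependency, as a sketch only. `TableMode`, `Block`, `TableState`, `decode_block`, and `demo_tables` are hypothetical names, and the "table" here is a plain byte lookup rather than a real FSE table. It shows that a block in repeat mode decodes only when the state left by the previous block is available.

```rust
// Toy model only: a "table" is a 4-entry byte lookup, not a real FSE table.
#[derive(Clone, Debug, PartialEq)]
enum TableMode {
    New([u8; 4]), // the block carries its own table
    Repeat,       // the block reuses the table left by the previous block
}

struct Block {
    mode: TableMode,
    symbols: Vec<u8>,
}

// The part of the decoder state that survives from one block to the next.
#[derive(Clone, Default)]
struct TableState {
    last_table: Option<[u8; 4]>,
}

fn decode_block(state: &mut TableState, block: &Block) -> Result<Vec<u8>, String> {
    if let TableMode::New(table) = &block.mode {
        state.last_table = Some(*table);
    }
    // A Repeat block with no earlier table cannot be decoded.
    let table = state
        .last_table
        .ok_or("repeat block with no previous table")?;
    block
        .symbols
        .iter()
        .map(|&s| {
            table
                .get(s as usize)
                .copied()
                .ok_or_else(|| format!("symbol {} out of range", s))
        })
        .collect()
}

fn demo_tables() -> (Result<Vec<u8>, String>, Result<Vec<u8>, String>) {
    let b0 = Block {
        mode: TableMode::New([b'a', b'b', b'c', b'd']),
        symbols: vec![0, 3],
    };
    let b1 = Block {
        mode: TableMode::Repeat,
        symbols: vec![2, 1],
    };

    let mut state = TableState::default();
    decode_block(&mut state, &b0).unwrap();
    let snapshot = state.clone(); // state just before block 1

    // Resuming at block 1 works with the snapshot...
    let mut resumed = snapshot;
    let with_state = decode_block(&mut resumed, &b1);

    // ...but fails from a fresh state, because block 1 carries no table.
    let mut fresh = TableState::default();
    let without_state = decode_block(&mut fresh, &b1);

    (with_state, without_state)
}
```

In this sketch the snapshot taken after block 0 is enough to resume at block 1, while a fresh state returns an error.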

Also the DecodeBuffer needs to be cloned because decoding uses previously decoded data. This will require a manual implementation of clone for the RingBuffer struct which uses raw allocations.
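As a rough illustration of what a manual clone involves, here is a sketch of a byte ring buffer over a raw allocation. `RawRing` and `demo_ring` are hypothetical and do not mirror the crate's actual RingBuffer layout. A derived `Clone` on a struct like this would copy only the pointer, leaving two owners of one allocation and a double free on drop, so the clone has to allocate and copy.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::{self, NonNull};

// Hypothetical ring buffer over a raw allocation; not the real RingBuffer.
struct RawRing {
    buf: NonNull<u8>,
    cap: usize,  // always > 0 in this sketch
    head: usize, // index of the oldest byte
    len: usize,  // number of live bytes
}

impl RawRing {
    fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0, "zero-sized allocations are not handled in this sketch");
        let layout = Layout::array::<u8>(cap).expect("capacity overflow");
        // SAFETY: the layout has non-zero size because cap > 0.
        let raw = unsafe { alloc_zeroed(layout) };
        let buf = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        RawRing { buf, cap, head: 0, len: 0 }
    }

    // Append one byte, overwriting the oldest byte when the buffer is full.
    fn push(&mut self, byte: u8) {
        let tail = (self.head + self.len) % self.cap;
        // SAFETY: tail < cap, so the write stays inside the allocation.
        unsafe { self.buf.as_ptr().add(tail).write(byte) };
        if self.len == self.cap {
            self.head = (self.head + 1) % self.cap;
        } else {
            self.len += 1;
        }
    }

    // Copy the live bytes out in logical (oldest-first) order.
    fn to_vec(&self) -> Vec<u8> {
        (0..self.len)
            .map(|i| {
                let idx = (self.head + i) % self.cap;
                // SAFETY: idx < cap and the allocation is fully initialized.
                unsafe { self.buf.as_ptr().add(idx).read() }
            })
            .collect()
    }
}

impl Clone for RawRing {
    fn clone(&self) -> Self {
        // Deep copy: a new allocation with the same bytes, head, and len.
        let mut copy = RawRing::with_capacity(self.cap);
        // SAFETY: both allocations are `cap` bytes long and do not overlap.
        unsafe { ptr::copy_nonoverlapping(self.buf.as_ptr(), copy.buf.as_ptr(), self.cap) };
        copy.head = self.head;
        copy.len = self.len;
        copy
    }
}

impl Drop for RawRing {
    fn drop(&mut self) {
        let layout = Layout::array::<u8>(self.cap).expect("capacity overflow");
        // SAFETY: buf was allocated with this same layout in with_capacity.
        unsafe { dealloc(self.buf.as_ptr(), layout) };
    }
}

fn demo_ring() -> (Vec<u8>, Vec<u8>) {
    let mut ring = RawRing::with_capacity(4);
    for b in 1..=5u8 {
        ring.push(b); // the fifth push wraps and overwrites the oldest byte
    }
    let snapshot = ring.clone();
    ring.push(6); // must not affect the snapshot
    (ring.to_vec(), snapshot.to_vec())
}
```

Because the clone owns its own allocation, later writes to the original leave the snapshot untouched, which is the property a saved decode point relies on.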

@KillingSpark KillingSpark mentioned this issue Mar 31, 2023
1 task
@phord
Contributor Author

phord commented Oct 24, 2024

> This will require a manual implementation of clone for the RingBuffer struct which uses raw allocations.

That's surprising, given my unit tests. But I definitely haven't tried this on a larger scale yet.
