
SEP Framework

This is the official implementation of the paper: SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines. SEP is a deep-learning-based lossless compression framework designed to improve both compression speed and compression ratio. It transforms single-byte symbols into sub-sequence-level patches, which captures correlations in the compressed sequence more effectively, and combines this with GPU memory optimization and multi-stream pipelines to accelerate compression.

Key Designs

🌟 Semantics enhancement block: We propose a series of novel approaches to capture the complex semantic information of adjacent byte sequences, achieving higher compression ratios across diverse data types.

🌟 Multi-stream pipelines: We propose a multi-stream pipeline mechanism for parallel compression. By hiding disk I/O and CPU-GPU data transfers, i.e., Host-to-Device (H2D) and Device-to-Host (D2H) copies, behind computation, we maximize hardware utilization and reduce compression time (see the sketch after this list).

🌟 GPU Memory Optimization: We propose an innovative approach that reuses GPU memory across streams, even though PyTorch normally isolates allocations per stream. This approach reduces GPU memory usage by an average of 36% in multi-stream environments.

🌟 State-of-the-art (SOTA) compression performance: Our experiments show that the framework increases the compression ratio of the backbone networks by an average of 5% and the compression speed by 30%. With our pre-trained models, the average compression-ratio improvement rises to 7.6%.
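
Below is a minimal sketch of the multi-stream idea using PyTorch CUDA streams. It is illustrative only: the model, chunk sizes, and number of streams are assumptions, not the repository's actual pipeline code.

# Overlap H2D copies, GPU compute, and D2H copies across two CUDA streams.
# "model" is a stand-in for the probability-estimation network.
import torch

model = torch.nn.Linear(256, 256).cuda().eval()
chunks = [torch.randn(4096, 256).pin_memory() for _ in range(8)]  # pinned host buffers enable async H2D
streams = [torch.cuda.Stream() for _ in range(2)]
outputs = [None] * len(chunks)

with torch.no_grad():
    for i, chunk in enumerate(chunks):
        s = streams[i % len(streams)]
        with torch.cuda.stream(s):
            x = chunk.to("cuda", non_blocking=True)          # H2D, hidden behind the other stream's compute
            probs = torch.softmax(model(x), dim=-1)          # next-byte probability estimate
            # A fully asynchronous D2H copy would also need a pinned destination buffer.
            outputs[i] = probs.to("cpu", non_blocking=True)  # D2H

torch.cuda.synchronize()  # wait for all streams before handing results to the arithmetic coder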

Overall Architecture

The SEP framework and the workflow of the Semantics Enhancement Block (SEB). All types of uncompressed files are first converted to byte streams and sent to the SEB. The SEB extracts features from the byte stream and turns it into fusion patches with an adaptive stride. The enhanced matrix $\mathbf{Z}$ is then fed into the chosen backbone network, which predicts the probability distribution of the next byte. Finally, the byte stream is compressed with this distribution by the arithmetic coder. The entire compression process is accelerated by the Multi-Stream Pipelines method.
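
As a rough illustration of the patching step (not the paper's exact SEB; the patch length, stride, and dimensions below are placeholder assumptions), a byte stream can be sliced into overlapping patches and fused into one vector per patch in PyTorch:

# Slice the byte stream into overlapping patches, embed each byte,
# and fuse every patch into a single row of the enhanced matrix Z.
import torch
import torch.nn as nn

patch_len, stride, d_model = 8, 4, 128              # placeholder hyperparameters
byte_embed = nn.Embedding(256, d_model)             # one embedding vector per byte value
fuse = nn.Linear(patch_len * d_model, d_model)      # fuse one patch into a single token

byte_stream = torch.randint(0, 256, (1, 1024))      # (batch, byte symbols)
patches = byte_stream.unfold(dimension=1, size=patch_len, step=stride)  # (1, n_patches, patch_len)
z = fuse(byte_embed(patches).flatten(start_dim=2))  # Z: (1, n_patches, d_model)
# Z is then fed to the backbone network, whose predicted next-byte
# distribution is consumed by the arithmetic coder.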

Usage

  1. Install PyTorch and the necessary dependencies.

pip install -r requirements.txt

  2. Download the data. You can download all the datasets, create a separate folder ./data, and put all the files in that directory.

  3. Train and evaluate the model. All the scripts are in ./SEP/Model/run.sh. For example, to reproduce the TRACE results on the Enwik9 dataset, run the following commands; once training is done, open nohup.log to see the results.

nohup ./run.sh &
tail -f nohup.log

Model Comparison

We use Compression Speed and Compression Ratio as model evaluation metrics. Overall, the SEP model achieves the best performance in the vast majority of cases.

Pre-trained Models

Available at: https://drive.google.com/file/d/18ltzjRFGDQrpu9Kz2zz28xQ9Obcnazpu/view?usp=sharing

PyTorch Memory Sharing

To apply the cross-stream memory reuse, the patched CUDACachingAllocator.cpp is copied into your local PyTorch source tree (replace the placeholder path with the actual location of PyTorch's caching allocator source); presumably PyTorch must then be rebuilt for the change to take effect.

cd PytorchMemorySharing/sharing

cp CUDACachingAllocator.cpp /PyTorch/xxx

Main Results

To improve both the compression ratio and compression speed of deep-learning-based lossless compressors, existing research focuses on different networks. Recent deep-learning-based lossless compressors are mainly built on PyTorch. In the table, we present the various characteristics of several deep-learning-based compressors. Compressors such as tensorflow-compress and DecMac use Long Short-Term Memory (LSTM) networks to capture long-term dependencies within the input data stream. Studies of different networks suggest that transformer and multi-layer perceptron (MLP) mechanisms can achieve more accurate estimation and higher compression speed than other deep-learning-based lossless compressors. NNCP, TRACE, OREO and PAC are typical compressors that use a transformer or an MLP. Most deep-learning-based lossless compressors train their networks on a GPU to speed up compression. PAC reduces CPU-GPU data-transfer traffic by using a software cache in GPU memory, which helps it compress faster. Existing work on GPU memory optimization focuses mainly on offloading, recomputation, and defragmentation, with no attention paid to multi-stream reuse.

We put all the original experiment logs and results in the "/XXXmodel/results" directory.

Appendix

View the Appendix: APPENDIX_sep.pdf

