Hi! After reading your paper, we think PruLong would be a great addition to the NVIDIA/KVPress library! KVPress is a library for implementing and benchmarking KV cache compression methods. It provides standardized benchmarking and makes new methods easier for the broader community to access and adopt. We've already received contributions for many recent works (including DuoAttention), and would be happy to receive yours as well 🙂
If you are interested in contributing an implementation, you can take a look at this notebook, which shows some ways to implement a new compression method.
Please feel free to open a PR when convenient, and don't hesitate to reach out for any help or guidance!