Add LUKVPress 🤖🤖🤖 #236
maxjeblick
left a comment
Thanks a lot for submitting the PR of LUKVPress!
I've done an initial round of review on the code, please find attached a proposed refactoring of the code here.
- Added an eager-mode guard, because masking is silently ignored under eager mode.
- Removed the uniform fallback in favor of explicit failure.
- `BUDGET_CURVE_URLS` is now a dict, as in other compression methods. This makes it possible to add more methods, which are then looked up in the dict.
- Breaking change: in the refactor I don't enforce strict equality of the expected attention press params. I'm open to discussion here. The reasons are (1) to remove code that may be fragile when supporting more methods, and (2) to allow experimentation under slight disagreement of the parameters. Another option would be to add a `required_params` field to the budget curve dictionary and compare values.
- As in DuoAttention, we don't save the npy files to disk. Adding a global cache dir mechanism could be useful in the future.
- `load_budget_curve` has been collapsed to a single method.
- Additional refactors using codex.
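To make the options in the list above concrete, here is a minimal sketch of a dict-based budget curve registry with explicit failure, an optional `required_params` comparison, and an eager-mode guard. It is not the code from the PR: the registry key, the URL, the `required_params` layout, and the helper names `lookup_budget_curve` and `check_attention_implementation` are all hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical registry. The key, URL, and parameter names are placeholders,
# not the values used in the actual press.
BUDGET_CURVE_URLS: dict[str, dict[str, Any]] = {
    "example-model/example-method": {
        "url": "https://example.com/budget_curve.npy",
        "required_params": {"window_size": 32},
    },
}


def lookup_budget_curve(key: str, press_params: dict[str, Any], strict: bool = False) -> str:
    """Return the budget curve URL registered for `key`.

    An unknown key raises instead of silently falling back to a uniform budget.
    With strict=True, any disagreement with `required_params` also raises;
    with strict=False, mismatches are tolerated to allow experimentation.
    """
    if key not in BUDGET_CURVE_URLS:
        known = ", ".join(sorted(BUDGET_CURVE_URLS))
        raise KeyError(f"No budget curve registered for {key!r}. Known keys: {known}")

    entry = BUDGET_CURVE_URLS[key]
    # Collect every parameter whose actual value differs from the expected one.
    mismatches = {
        name: (expected, press_params.get(name))
        for name, expected in entry["required_params"].items()
        if press_params.get(name) != expected
    }
    if mismatches and strict:
        raise ValueError(f"Press params disagree with budget curve {key!r}: {mismatches}")
    return entry["url"]


def check_attention_implementation(attn_implementation: str) -> None:
    """Fail loudly under eager mode, where masking would be silently ignored."""
    if attn_implementation == "eager":
        raise ValueError("This press relies on masking, which is ignored under eager mode.")
```

With `strict=False` this matches the relaxed behavior proposed in the refactor, while `strict=True` corresponds to the `required_params` alternative mentioned in the breaking-change item.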
Please review the attached code, happy to discuss breaking changes (e.g. removal of params checks).
As for merging this PR, please also include either code to create additional budget curves, or add instructions for it to the press's docstring, so that it is possible to extend to more methods/LLMs.
This is the proposed refactoring of the code.
Signed-off-by: tangziyao <672208690@qq.com>
/ok to test molanyu@4517790
Hi! Please also merge the latest main into your branch. We fixed an error with the GitHub runner that was causing failing tests.
Signed-off-by: tangziyao <672208690@qq.com>
Signed-off-by: tangziyao <672208690@qq.com>
Signed-off-by: tangziyao <672208690@qq.com>
/ok to test f457238
maxjeblick
left a comment
Thanks a lot for the contribution, LGTM!
PR description
This PR adds a new KV cache compression method, LUKV:
arxiv: https://arxiv.org/abs/2602.08585
code: https://github.com/baidu-baige/LU-KV
Checklist
Before submitting a PR, please make sure:
- Tests are working (`make test`)
- Code is formatted correctly (`make style`, on errors try fix with `make format`)
- Copyright header is included
- All commits are signed-off using `git commit -s`
- (new press) `mypress_press.py` is in the `presses` directory
- (new press) `MyPress` is in `__init__.py`
- (new press) `README.md` is updated with a 1 liner about the new press in the Available presses section
- (new press) New press is in the `default_presses` list in `tests/default_presses.py`
- (new press) A docstring is provided that follows the same structure as the existing ones