Added HHCache class implementing H2O Cache #31623

Draft · wants to merge 2 commits into main
Conversation

belericant

What does this PR do?

This PR adds the feature requested in #30758. The HHCache class is adapted almost directly from the original H2O paper authors' code found here. Currently the PR only adds the changes required to the Llama model class. For now I have taken @gante's suggestion of adding Cache.post_process() and calling it within LlamaAttention.forward.
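
For reviewers unfamiliar with H2O, here is a minimal, self-contained sketch of the eviction idea (not the code in this PR; `h2o_evict`, its arguments, and the shapes are assumptions for illustration): keep the most recent `num_recent` tokens plus the `num_hh` prefix tokens that have received the most accumulated attention, and drop everything else.

```python
import torch

def h2o_evict(keys, values, attn_weights, hh_scores, num_hh, num_recent):
    # keys, values:  (batch, heads, seq_len, head_dim)
    # attn_weights:  (batch, heads, q_len, seq_len) attention from the newest step(s)
    # hh_scores:     (batch, heads, seq_len) running sum of attention each position received
    seq_len = keys.shape[-2]
    hh_scores = hh_scores + attn_weights.sum(dim=-2)          # accumulate attention mass
    if seq_len <= num_hh + num_recent:
        return keys, values, hh_scores                        # nothing to evict yet

    # Pick heavy hitters only among the non-recent prefix; recent tokens are always kept.
    prefix_scores = hh_scores[..., : seq_len - num_recent]
    _, hh_idx = prefix_scores.topk(num_hh, dim=-1)
    hh_idx, _ = hh_idx.sort(dim=-1)                           # keep original token order
    recent_idx = torch.arange(seq_len - num_recent, seq_len, device=keys.device)
    recent_idx = recent_idx.expand(*hh_idx.shape[:-1], num_recent)
    keep_idx = torch.cat([hh_idx, recent_idx], dim=-1)        # (batch, heads, num_hh + num_recent)

    gather_idx = keep_idx.unsqueeze(-1).expand(*keep_idx.shape, keys.shape[-1])
    keys = keys.gather(-2, gather_idx)
    values = values.gather(-2, gather_idx)
    hh_scores = hh_scores.gather(-1, keep_idx)
    return keys, values, hh_scores
```

In this PR, pruning of that kind lives inside the cache (Cache.post_process() called from LlamaAttention.forward), so the attention computation itself stays mostly untouched.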

To-Do

  1. I'm not sure the logic for RoPE rerotation is 100% correct. I think the recent tokens are handled correctly, but not the heavy-hitter (HH) tokens after eviction; see the sketch after this list. Another set of eyes on that would be appreciated.
  2. Write tests to ensure that this HHCache class matches the behavior of the paper authors' original code.
  3. Benchmarking(?)
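
On item 1, one hedged way to think about the rerotation: RoPE rotations compose, so a key that was cached already rotated at position p_old can be moved to position p_new by applying one extra rotation of (p_new - p_old). The sketch below assumes the rotate_half layout used by Llama-style RoPE; `rerotate_keys` and its signature are hypothetical, not code from this PR.

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rerotate_keys(keys, inv_freq, old_pos, new_pos):
    # keys:     (batch, heads, kept_len, head_dim), already rotated with RoPE at old_pos
    # inv_freq: (head_dim // 2,) rotary inverse frequencies
    # old_pos:  (kept_len,) original positions of the kept tokens
    # new_pos:  (kept_len,) compacted positions after eviction
    delta = (new_pos - old_pos).to(inv_freq.dtype)           # per-token position shift
    angles = delta[:, None] * inv_freq[None, :]              # (kept_len, head_dim // 2)
    angles = torch.cat((angles, angles), dim=-1)             # match the rotate_half layout
    cos, sin = angles.cos(), angles.sin()
    return keys * cos + rotate_half(keys) * sin
```

This mirrors the key re-rotation trick used by streaming/sink-style caches when cached tokens shift position; if the compacted positions for the kept HH tokens are computed the same way as for the recent tokens, one helper like this should cover both cases.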

Feedback and/or help would be appreciated. Thanks!

@amyeroberts
Collaborator

cc @gante @ArthurZucker
