Added HHCache class implementing H2O Cache #31623

belericant · 2024-06-26T02:02:04Z

What does this PR do?

This PR adds the feature requested in #30758. The HHCache class is taken almost directly from the code released by the authors of the original H2O paper, found here. Currently, the PR only adds the changes required for the Llama model class. For now, I have followed @gante's suggestion of adding Cache.post_process() and calling it within LlamaAttention.forward.
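For readers unfamiliar with H2O, the eviction idea behind such a cache can be sketched roughly as follows. This is a minimal pure-Python illustration and not the code in this PR: the function name `select_kept_indices` and the parameters `num_hh` and `num_recent` are hypothetical, and it assumes an attention score has already been accumulated for each cached token.

```python
def select_kept_indices(acc_scores, num_hh, num_recent):
    """Return the sorted indices of cache entries to keep.

    acc_scores: accumulated attention score per cached token (assumed input).
    num_hh:     budget of heavy-hitter tokens to keep (hypothetical name).
    num_recent: budget of most recent tokens to always keep (hypothetical name).
    """
    seq_len = len(acc_scores)
    # Nothing to evict while the cache still fits in the combined budget.
    if seq_len <= num_hh + num_recent:
        return list(range(seq_len))

    # The most recent tokens are always kept.
    recent_start = seq_len - num_recent
    recent = list(range(recent_start, seq_len))

    # Among the older tokens, keep the num_hh with the highest accumulated
    # score; ties are broken in favour of the earlier index.
    older = range(recent_start)
    heavy_hitters = sorted(older, key=lambda i: (-acc_scores[i], i))[:num_hh]

    return sorted(heavy_hitters) + recent


kept = select_kept_indices([0.9, 0.1, 0.5, 0.2, 0.3, 0.4], num_hh=2, num_recent=2)
print(kept)  # [0, 2, 4, 5]
```

A hook such as Cache.post_process() would then be a natural place to apply a selection like this after each attention step, though how the PR wires it up is not shown here.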

To-Do

I'm not sure the logic for RoPE re-rotation is 100% correct. I think the recent tokens are handled correctly, but not the heavy-hitter (hh) tokens after eviction. I would love another set of eyes on that.
Write tests to ensure that HHCache behaves the same as the paper authors' original code.
Benchmarking(?)
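
On the RoPE re-rotation item, one property that may help when checking the logic: rotations compose additively, so a key already rotated for position p can be moved to position q by a further rotation of (q - p) * theta. The sketch below checks this for a single 2D pair in plain Python. It is an illustration under assumptions, not the PR's code: `rotate_pair`, `rerotate`, and the numeric values are made up, and real implementations differ in how they pair dimensions and choose per-dimension frequencies.

```python
import math


def rotate_pair(x, y, angle):
    """Rotate the 2D pair (x, y) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def rerotate(x, y, old_pos, new_pos, theta):
    """Move an already-rotated pair from old_pos to new_pos (hypothetical helper)."""
    return rotate_pair(x, y, (new_pos - old_pos) * theta)


theta = 0.01              # illustrative frequency for one dimension pair
key = (0.3, -1.2)         # illustrative un-rotated key components
old_pos, new_pos = 37, 5  # position before and after eviction shifts the token

cached = rotate_pair(*key, old_pos * theta)         # what the cache holds
moved = rerotate(*cached, old_pos, new_pos, theta)  # re-rotated in place
fresh = rotate_pair(*key, new_pos * theta)          # rotated from scratch

# Re-rotating the cached key should match rotating the raw key at new_pos.
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(moved, fresh))
```

A test along these lines, comparing re-rotated heavy-hitter keys against keys rotated from scratch at their post-eviction positions, could be one way to validate the hh tokens specifically.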

Feedback and/or help would be appreciated. Thanks!

amyeroberts · 2024-06-26T10:57:39Z

cc @gante @ArthurZucker

belericant added 2 commits June 25, 2024 18:43

add hh cache class

e9bf7cb

llama model support for HHCache

2420148

amyeroberts added the Cache label Jun 26, 2024
