This PR implements reuse of the KV cache across multiple requests. You can set `enable_prefix_cache` to `True` in `RaggedInferenceEngineConfig` to enable this feature.

This feature keeps KV cache blocks alive as long as there is free space. When a new request has a prefix that matches existing KV cache blocks, FastGen reuses them, and the same blocks can be shared by multiple requests. When many requests have common prefixes, this drastically reduces both the prompt computation and the KV cache memory usage.
Note that looking up the cache adds some overhead. You can disable this feature when prompts don't share much of a common prefix.
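To illustrate the lookup described above, here is a toy, self-contained model of prefix-based block reuse. It is not FastGen's actual implementation; the block size, class, and method names are all illustrative.

```python
BLOCK_SIZE = 4  # tokens per KV cache block (illustrative; real block sizes differ)

class PrefixCache:
    """Toy model of prefix-based KV block reuse (not FastGen's actual code)."""

    def __init__(self):
        # Maps a block-aligned token prefix (as a tuple) to a cached block id.
        self._blocks = {}
        self._next_id = 0

    def insert(self, tokens):
        """Register the full blocks of this prompt so later requests can reuse them."""
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            key = tuple(tokens[:end])
            if key not in self._blocks:
                self._blocks[key] = self._next_id
                self._next_id += 1

    def match_prefix(self, tokens):
        """Return (reused_block_ids, num_cached_tokens) for the longest cached prefix."""
        reused, cached = [], 0
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            block_id = self._blocks.get(tuple(tokens[:end]))
            if block_id is None:
                break  # this is where the lookup overhead stops paying off
            reused.append(block_id)
            cached = end
        return reused, cached

cache = PrefixCache()
cache.insert(list(range(10)))  # first request caches two full blocks (tokens 0-3, 0-7)
blocks, cached = cache.match_prefix(list(range(10)) + [99])
# the second request shares an 8-token prefix, so only the tail needs prompt compute
```

When prompts share long prefixes, `match_prefix` skips recomputation for most of the prompt; when they do not, the dictionary lookups are pure overhead, which is why the option can be disabled.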
Here is a benchmark result using this feature. We used prompts that share the same prefix. When the prompts are short and the generations are long, the benefit is smaller.