
docs: add table of contents to llama-kv-cache.md
danbev committed Oct 30, 2024
1 parent 333b7a2 commit bff4287
Showing 1 changed file with 5 additions and 1 deletion: notes/llama-kv-cache.md
@@ -3,6 +3,10 @@ I've gone through the theory of Key-Value caching in the transformer architecture
in [llama.md](llama.md). This document is a more detailed look at the
implementation of the key-value cache in the llama.cpp codebase.

## Table of Contents
- [KV-Cache at inference time](#inference-with-kv-cache)
- [llama_kv_cache details](#llama_kv_cache)

### Inference with KV-Cache
Let's set a breakpoint before `llama_decode` and see how this interacts with
the kv-cache.
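With gdb this could look something like the following (a minimal sketch; the
binary name, model path, and prompt are placeholders and assume a debug build
of llama.cpp):
```console
$ gdb --args ./build/bin/llama-cli -m models/model.gguf -p "Hello" -n 10
(gdb) br llama_decode
(gdb) run
```
Once the breakpoint is hit, the cache members can be inspected, for example
with `p ctx->kv_self.head` (assuming the context parameter is named `ctx` as
in the C API).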
@@ -1086,7 +1090,7 @@ the key and value cache will use 6 when calculating the offset to store the
roped k and value cache entries for the next token.
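
As a rough standalone illustration of that offset calculation (not llama.cpp
code; the sizes below are made up), writing the next token's roped K values at
cache slot 6 amounts to:
```c++
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical sizes, only for illustration.
    const int n_ctx    = 32;  // cache capacity in tokens
    const int n_embd_k = 8;   // elements per token in the K cache (per layer)

    // Flat per-layer K buffer: one row of n_embd_k floats per cache slot.
    std::vector<float> k_cache(n_ctx * n_embd_k, 0.0f);

    // After 6 tokens have been decoded, the cache head sits at slot 6,
    // so the roped K for the next token is written at this offset.
    int head   = 6;
    size_t off = (size_t) head * n_embd_k;

    for (int i = 0; i < n_embd_k; i++) {
        k_cache[off + i] = 1.0f; // stand-in for the roped K values of token 6
    }

    printf("token written at element offset %zu (byte offset %zu)\n",
           off, off * sizeof(float));
    return 0;
}
```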


### `kv_self`
### llama_kv_cache
A `llama_context` contains a member named `kv_self` (self as in self-attention)
which is of type `llama_kv_cache`. This struct is defined in `llama.cpp`:
```c++
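// NOTE: the body of the struct is collapsed in this diff view. The fields
// below are a rough, from-memory sketch of what llama_kv_cache holds around
// this point in llama.cpp's history; treat the names as assumptions rather
// than the exact definition.
struct llama_kv_cache {
    bool     has_shift = false;  // set when cell positions have been shifted

    uint32_t head = 0;           // slot where the next tokens will be stored
    uint32_t size = 0;           // total number of cells (the cache capacity)
    uint32_t used = 0;           // cells that hold at least one sequence id
    uint32_t n    = 0;           // cells considered for the current batch

    ggml_type type_k = GGML_TYPE_F16;  // element type of the K cache
    ggml_type type_v = GGML_TYPE_F16;  // element type of the V cache

    std::vector<llama_kv_cell>        cells;  // per-cell metadata (pos, seq_id, ...)

    std::vector<struct ggml_tensor *> k_l;    // one K cache tensor per layer
    std::vector<struct ggml_tensor *> v_l;    // one V cache tensor per layer
};
```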
