docs: fix typo in transformers.md
danbev committed Dec 10, 2024
1 parent b725a5e commit ca0a6fd
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion notes/architectures/transformers.md
@@ -183,7 +183,7 @@ Standard attention uses 3 matrices: a query matrix, a key matrix, and a value
matrix.
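
For reference, these three are combined using the scaled dot-product attention
from the original paper:
```
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
```
where `d_k` is the dimension of the key vectors.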

Let's start with the following input sentence "Dan loves icecream". The first
step it split this into tokens, so we will have might get 4 tokens:
step is to split this into tokens, so we might get 4 tokens:
```
["Dan", "loves", "ice", "cream"]
```
@@ -204,6 +204,10 @@ be used for each occurrence. So there is currently no context or association
between these words/token embeddings. They only contain information about each
word/token itself, and nothing about the context in which it appears.

This mapping can happen using something like `ggml_get_rows`, which takes a
tensor that contains the embeddings for each token in the vocabulary, and an
index tensor which contains the token ids. The index tensor is used to index
into the embeddings.
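
As a rough sketch (the sizes and token ids below are made up, and the exact
graph/compute calls can differ between ggml versions), the lookup could look
something like this:
```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int n_embd  = 4; // toy embedding dimension
    const int n_vocab = 8; // toy vocabulary size

    // The embedding matrix: one row of n_embd floats per token in the
    // vocabulary. In a real model these values come from the model weights.
    struct ggml_tensor * tok_embd = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_vocab);

    // The token ids for ["Dan", "loves", "ice", "cream"] (made up ids).
    struct ggml_tensor * tokens = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 4);
    ((int32_t *) tokens->data)[0] = 1;
    ((int32_t *) tokens->data)[1] = 5;
    ((int32_t *) tokens->data)[2] = 2;
    ((int32_t *) tokens->data)[3] = 7;

    // Select the rows of tok_embd given by the token ids, producing one
    // embedding (row) per input token.
    struct ggml_tensor * embd = ggml_get_rows(ctx, tok_embd, tokens);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, embd);
    ggml_graph_compute_with_ctx(ctx, gf, 1);

    ggml_free(ctx);
    return 0;
}
```
The resulting `embd` tensor contains one embedding row per input token, and is
what the positional encoding below gets added to.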

So with these embeddings, the first thing the model does is to add a
positional encoding to each of the embeddings. In the original paper this used
absolute position encoding. I've written about this in
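
For reference, the sinusoidal encoding from the original paper is defined, for
position `pos` and embedding dimension index `i`, as:
```
PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
```
and these values are added element-wise to the token embeddings.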
