From e5c2aaaaa63a9a627bcc4630ae78e3911ae6b4aa Mon Sep 17 00:00:00 2001
From: Daniel Bevenius
Date: Tue, 27 Aug 2024 15:06:07 +0200
Subject: [PATCH] docs: move tokenization links to the top of README

---
 notes/tokenization/README.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/notes/tokenization/README.md b/notes/tokenization/README.md
index 9138815c..d29f9c3d 100644
--- a/notes/tokenization/README.md
+++ b/notes/tokenization/README.md
@@ -7,6 +7,15 @@ model.
 There might be a configuration file in addition to this that specifies
 the type of tokenizing that the model uses, like Byte-Pair Encoding (BPE),
 WordPiece, SentencePiece, or Unigram, etc.
+#### Tokenization notes
+The following notes are individual walkthroughs of the tokenization process for
+different tokenization types in llama.cpp:
+
+* [Byte Pair Encoding (BPE)](./bpe.md)
+* [WordPiece](./wordpiece.md) TODO
+* [SentencePiece](./sentencepiece.md)
+* [Unigram](./unigram.md) TODO
+
 ### Tokenization in llama.cpp
 Llama.cpp supports the following types of tokenization:
 ```c
@@ -295,11 +304,3 @@ $56 = std::forward_list = {
   [1] = {type = FRAGMENT_BUFFER_VARIANT_TYPE_RAW_TEXT, token = -1, _dummy = "", raw_text = "What is LoRA?", offset = 3, length = 13},
   [2] = {type = FRAGMENT_BUFFER_VARIANT_TYPE_TOKEN, token = 2, _dummy = "", raw_text = "", offset = 0, length = 0}}
 ```
-#### Tokenization notes
-The following notes are individual walkthroughs of the tokenization process for
-different tokenization types in llama.cpp:
-
-* [Byte Pair Encoding (BPE)](./bpe.md)
-* [WordPiece](./wordpiece.md) TODO
-* [SentencePiece](./sentencepiece.md)
-* [Unigram](./unigram.md) TODO