llama : support StableLM 2 1.6B #5052
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
great work! 🙌 worked great for me, was able to generate a full suite of k-quants + 8_0 & fp16, on huggingface! fp16 conversion output
ran conversions on colab. Separately, does it make sense to add I’m not sure what the typical approach is for model-specific dependencies like this, but it would seem if this is a new requirement for model conversion, perhaps it should be declared here. Or maybe a new file like persimmon? thanks again!
I think dependencies should only be added to requirements.txt if they are unconditionally required - conditional requirements should simply throw a clear exception if they are needed but not found. And persimmon is only a separate file because it wasn't working when the convert scripts were merged; new code should go in convert-hf-to-gguf.py.
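A minimal sketch of the "throw a clear exception" approach for a conditional requirement. The helper name `require_optional` and the wording of the message are hypothetical and are not taken from the convert scripts; only the standard library's `importlib.import_module` is relied on.

```python
import importlib


def require_optional(module_name: str, needed_for: str):
    """Import an optional module, or fail with an actionable message.

    Hypothetical helper for illustration; not part of convert-hf-to-gguf.py.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with context so the user knows why the module is needed.
        raise ImportError(
            f"{module_name!r} is required for {needed_for} but is not installed; "
            f"try `pip install {module_name}`"
        ) from exc
```

With this shape, a model-specific code path could call something like `require_optional("tiktoken", "tiktoken-style vocab conversion")` only when it is actually needed, so the dependency stays out of `requirements.txt`.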
…zer loader It's a less arbitrary heuristic than the vocab size.
Agreed, and it already throws a helpful exception when And to be clear, I added the Running I assume the
Yes, that is why it exists. Originally
* llama : support StableLM 2 1.6B
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
* convert : use presence of tokenizer.json to determine StableLM tokenizer loader

  It's a less arbitrary heuristic than the vocab size.
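The `tokenizer.json` presence check named in the last commit could look roughly like the sketch below. The function `pick_vocab_loader` is hypothetical, and it only returns the name of the loader (`_set_vocab_gpt2` or `_set_vocab_qwen`, both mentioned in this PR) rather than calling it; the real convert script may structure this differently.

```python
from pathlib import Path


def pick_vocab_loader(model_dir: Path) -> str:
    """Choose a vocab loader name for a StableLM checkpoint directory.

    Hypothetical helper: assumes that a checkpoint shipping tokenizer.json
    uses the GPT-2 style loader, and that one without it uses the
    Qwen-style (tiktoken) loader.
    """
    if (model_dir / "tokenizer.json").is_file():
        return "_set_vocab_gpt2"
    return "_set_vocab_qwen"
```

Compared with a vocab-size threshold, this depends only on which files the checkpoint ships, so there is no magic number to keep in sync with future models.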
Stable LM 2 1.6B was recently released (see https://stability.ai/news/introducing-stable-lm-2). It's different enough from their older 3B model that it requires some changes in `llama.cpp` in order to work.

It's mostly the same model architecture as `stablelm-3b-4e1t`, but they seem to have added bias tensors (or whatever they are called) for Q, K, and V, so this is now also handled for the `LLM_ARCH_STABLELM` model type.

The tokenizer is also different from that of `stablelm-3b-4e1t`; in StableLM 2, it is defined in the `tiktoken` format, in much the same way as with the Qwen models.

To avoid unnecessary code duplication, I added `_set_vocab_qwen` to the `Model` class so that both Qwen and StableLM 2 could make their vocab in the same way.

In doing so, I noticed a bug in the previous implementation: all special tokens were named `[PAD{id}]`. This is because, unlike in `tokenizer.json`, the special tokens for Qwen-style tokenizers are not a subset of the vocab. So special tokens could not be found in the `reverse_vocab` and were always named like padding tokens. Combining the `added_vocab` with the `vocab` when making the `reverse_vocab` fixes this. (This is not necessarily relevant for `_set_vocab_gpt2`, because in `tokenizer.json`, the `vocab` usually contains all tokens, including special ones.)

In `convert-hf-to-gguf.py`, to know which kind of tokenizer to look for when converting a `StableLMModel`, I used the vocab size instead of something like the number of layers because Qwen-style tokenizers seem to have a lot more tokens than others, so it seems like a good enough heuristic for at least this specific case. A better way would perhaps be to check for the absence of `tokenizer.json`. (EDIT: now implemented this way, with the `tokenizer.json` presence check. It should have the same behavior as with the vocab size check: nothing in the actual conversion was changed, so the resulting converted models are the same as before.)

Oh, and since the `tiktoken` library is used when converting, I added it to the `llama-python-extra` package list in the nix package so that it's included when using a devShell like with `nix develop .#default-extra`.

Since I moved the code for Qwen's `set_vocab`, I recommend using `git log -p --color-moved` when reviewing this.
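For reviewers, here is a toy illustration of the `reverse_vocab` bug and fix described above. The function `name_tokens`, its `combine` flag, and the sample tokens are made up for the example; the actual conversion code works on the tokenizer's real vocab, so treat this as a simplified sketch of the idea only.

```python
def name_tokens(vocab: dict, added_vocab: dict, vocab_size: int, combine: bool) -> list:
    """Name every token id, falling back to [PAD{id}] for unknown ids.

    combine=False mimics the old behavior (special tokens are missing from
    reverse_vocab); combine=True mimics the fix (vocab and added_vocab merged).
    """
    source = {**vocab, **added_vocab} if combine else vocab
    reverse_vocab = {tok_id: tok for tok, tok_id in source.items()}
    return [reverse_vocab.get(i, f"[PAD{i}]") for i in range(vocab_size)]


# Toy data: the special token lives only in added_vocab, as with Qwen-style tokenizers.
vocab = {"hello": 0, "world": 1}
added_vocab = {"<|endoftext|>": 2}

buggy = name_tokens(vocab, added_vocab, 4, combine=False)
fixed = name_tokens(vocab, added_vocab, 4, combine=True)
print(buggy)  # ['hello', 'world', '[PAD2]', '[PAD3]']
print(fixed)  # ['hello', 'world', '<|endoftext|>', '[PAD3]']
```

Id 3 is still named `[PAD3]` in both cases because it is genuinely unused, which is the only situation where the padding name should appear.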