The behavior of the tokenizer loaded from GGUF file is incorrect. #31630
Thank you @ArthurZucker, now the tokenizer works well. However, when I try to save and then reload it, another error occurs. Code:

```python
from transformers import AutoTokenizer

model_id = "QuantFactory/Meta-Llama-3-8B-GGUF"
filename = "Meta-Llama-3-8B.Q4_K_M.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)

save_dir = '../../deq_models/test'
tokenizer.save_pretrained(save_dir)
tokenizer2 = AutoTokenizer.from_pretrained(save_dir)
```

The package version:
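The save-and-reload round trip being tested here can be stated as a general invariant: a tokenizer reloaded from its `save_pretrained` output should encode text exactly as the original did. Below is a minimal, self-contained sketch of that check using a hypothetical toy tokenizer in place of the real GGUF-backed one (the `ToyTokenizer` class and its vocab file layout are illustrative assumptions, not transformers APIs):

```python
import json
import os
import tempfile

class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer, used only to
    illustrate the save/reload round-trip being tested."""

    def __init__(self, vocab):
        self.vocab = vocab  # maps token string -> integer id

    def encode(self, text):
        # Unknown tokens map to id 0, like a typical <unk> token.
        return [self.vocab.get(tok, 0) for tok in text.split()]

    def save_pretrained(self, save_dir):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "vocab.json"), "w") as f:
            json.dump(self.vocab, f)

    @classmethod
    def from_pretrained(cls, save_dir):
        with open(os.path.join(save_dir, "vocab.json")) as f:
            return cls(json.load(f))

# Round-trip check: the reloaded tokenizer must agree with the original.
tok = ToyTokenizer({"<unk>": 0, "hello": 1, "world": 2})
with tempfile.TemporaryDirectory() as d:
    tok.save_pretrained(d)
    tok2 = ToyTokenizer.from_pretrained(d)

text = "hello world"
assert tok.encode(text) == tok2.encode(text)
```

The bug report above is that this invariant breaks for the real GGUF-loaded tokenizer: `from_pretrained(save_dir)` raises an error rather than reproducing the original tokenizer.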
Sorry for the misoperation.
System Info

transformers version: 4.42.0.dev0

Who can help?
@ArthurZucker @younesbelkada
Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I installed transformers from #30391 (comment), because the newest released version, v4.41.2, cannot load the tokenizer from a GGUF file correctly.
Here is my code:
The output is:
The output of `decode()` should be identical to the input text, shouldn't it? I also tried to encode the same text using llama-cpp-python 0.2.79 and the same model; its output is correct:
Expected behavior
The result of `decode()` should be identical to the raw text.
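The expected behavior is a lossless round trip: for any input string, `decode(encode(text))` returns the original text. A minimal sketch of that property, using a hypothetical byte-level tokenizer rather than the real Llama 3 tokenizer (the `ByteTokenizer` class below is an illustrative assumption, not a transformers API):

```python
class ByteTokenizer:
    """Hypothetical byte-level tokenizer illustrating the lossless
    round-trip property expected of decode(encode(text))."""

    def encode(self, text):
        # One token id per UTF-8 byte.
        return list(text.encode("utf-8"))

    def decode(self, ids):
        # Reassemble the bytes and decode back to a string.
        return bytes(ids).decode("utf-8")

tok = ByteTokenizer()
text = "Hello, GGUF! 你好"
# Lossless round trip: decoding the encoded ids recovers the input.
assert tok.decode(tok.encode(text)) == text
```

The issue reported here is that the tokenizer loaded via `gguf_file=` violates this property, while llama-cpp-python's tokenizer for the same model preserves it.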