Fix RWKV v6 model conversion #10913
Conversation
Signed-off-by: Molly Sophia <[email protected]>
This is likely caused by this change where I removed the squeeze() during conversion:

llama.cpp/convert_hf_to_gguf.py, lines 298 to 301 in 0bf2d10:

```python
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
    # TODO: why do we squeeze here?
    # data = data_torch.squeeze().numpy()
    data = data_torch.numpy()
```
I see. So there's another way to fix this: squeeze them in rwkv6's modify_tensors(), rather than adding them to the F32 list?

Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

Yeah that's what I meant. Let me change to use the more localized way then.
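A minimal sketch of the "more localized" approach discussed above, written as a standalone helper rather than the actual Rwkv6Model.modify_tensors() override in convert_hf_to_gguf.py; the tensor-name substrings are illustrative assumptions, not the exact names used in the PR:

```python
import torch

def squeeze_rwkv6_small_tensors(name: str, data_torch: torch.Tensor) -> torch.Tensor:
    """Drop singleton dimensions from the small per-head RWKV v6 parameters
    so they are written as 1-D tensors again, as they were when the generic
    squeeze() still ran in the main conversion loop.

    The substrings below are illustrative; the real override would match the
    exact tensor names used by the RWKV v6 checkpoint."""
    if any(key in name for key in ("time_mix", "time_decay", "time_faaaa")):
        return data_torch.squeeze()
    return data_torch
```

Squeezing inside the RWKV v6 converter keeps the change local instead of re-adding these tensors to the global F32 list.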
Signed-off-by: Molly Sophia <[email protected]>
Force-pushed from 8699330 to a20a94f
This is good to merge?

Yes. Thanks a lot for your time!
It seems that some RWKV tensors are converted to FP16 rather than FP32 after certain recent commits. However, ggml_cuda_op_bin_bcast requires src1->type == FP32, so newly converted RWKV models cannot run with CUDA, while previously converted files are unaffected. This PR fixes that issue.

It also adds LLAMA_EXAMPLE_PERPLEXITY to the examples list of the --no-context-shift parameter, so that models without context-shift support can run llama-perplexity again.
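For context, a rough sketch of why the extra singleton dimensions lead to FP16 output, assuming the common conversion rule that 1-D tensors stay in F32 even when an F16 model is written; the function name and the exact rule are simplified assumptions, not the precise logic in convert_hf_to_gguf.py:

```python
import torch

def pick_output_dtype(data_torch: torch.Tensor, want_f16: bool) -> torch.dtype:
    """Simplified type-selection rule: 1-D tensors (norms, biases, small
    RWKV parameters) are kept in F32; larger weights are downcast to F16
    when an F16 model is requested. A parameter exported as (1, 1, N)
    counts as 3-D and would be downcast unless squeezed to (N,) first."""
    if not want_f16 or data_torch.ndim <= 1:
        return torch.float32
    return torch.float16
```

Under a rule like this, squeezing the RWKV v6 tensors back to 1-D keeps them in F32, which satisfies the src1->type == FP32 requirement of ggml_cuda_op_bin_bcast.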