
Fix RWKV v6 model conversion #10913

Merged: 2 commits into ggerganov:master on Dec 20, 2024

Conversation

@MollySophia
Contributor


It seems that some RWKV tensors end up as FP16 rather than FP32 after certain recent commits. However, ggml_cuda_op_bin_bcast requires src1->type == FP32, so newly converted RWKV models cannot run with CUDA, while previously converted files are unaffected. This PR fixes that issue.

It also adds LLAMA_EXAMPLE_PERPLEXITY to the examples list of the --no-context-shift parameter, so that models without context-shift support can run llama-perplexity again.

The github-actions bot added the python (python script changes) label on Dec 20, 2024.
@ggerganov
Owner

This is likely caused by this change where I removed the squeeze() during conversion:

for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
    # TODO: why do we squeeze here?
    # data = data_torch.squeeze().numpy()
    data = data_torch.numpy()
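A plausible mechanism (a simplification inferred from the script's behavior, not stated explicitly in the thread): the converter keeps 1-D tensors in F32 even when writing an F16 model, so without the squeeze() the (1, 1, n_embd)-shaped RWKV parameters stay 3-D and get downcast to F16. A minimal sketch of that dtype choice:

import numpy as np

# Simplified sketch of the conversion-time dtype choice; the real
# convert_hf_to_gguf.py applies more conditions, but 1-D tensors are
# kept in F32 when writing an F16 model.
def output_dtype(data: np.ndarray, outtype_f16: bool = True):
    return np.float16 if outtype_f16 and data.ndim > 1 else np.float32

time_mix = np.zeros((1, 1, 2048), dtype=np.float32)  # shape as stored in the checkpoint
print(output_dtype(time_mix))            # float16 -> trips the F32 requirement in ggml_cuda_op_bin_bcast
print(output_dtype(time_mix.squeeze()))  # float32 -> the pre-change behavior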

@MollySophia
Contributor Author

> This is likely caused by this change where I removed the squeeze() during conversion:
>
>     for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
>         # TODO: why do we squeeze here?
>         # data = data_torch.squeeze().numpy()
>         data = data_torch.numpy()

I see. So there's another way to fix this: squeeze them in rwkv6's modify_tensors() rather than adding them to the F32 list?
Both should solve the issue. I wonder which way is better.

@ggerganov
Owner

Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

@MollySophia
Contributor Author

> Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

Yeah, that's what I meant. Let me change it to use the more localized way then.
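A minimal sketch of what the localized approach could look like (illustrative only, not the merged diff; the actual Rwkv6Model.modify_tensors in convert_hf_to_gguf.py also renames tensors, and the name patterns below are assumptions):

import torch

# Hedged sketch: squeeze the small RWKV6 time-mixing parameters inside
# modify_tensors so they reach the generic write path as 1-D tensors and
# are therefore kept in F32. The name substrings are illustrative.
def modify_tensors(data_torch: torch.Tensor, name: str, bid):
    if any(k in name for k in ("time_maa", "time_decay", "time_faaaa")):
        data_torch = data_torch.squeeze()
    return [(name, data_torch)]

(new_name, t), = modify_tensors(torch.zeros(1, 1, 2048), "blocks.0.attention.time_maa_k", 0)
print(t.shape)  # torch.Size([2048])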

@ggerganov
Owner

This is good to merge?

@MollySophia
Contributor Author

> This is good to merge?

Yes. Thanks a lot for your time!

@ggerganov merged commit 0a11f8b into ggerganov:master on Dec 20, 2024
50 of 51 checks passed