Fix RWKV v6 model conversion #10913
Conversation
Signed-off-by: Molly Sophia <[email protected]>
This is likely caused by this change where I removed the squeeze() during conversion:

llama.cpp/convert_hf_to_gguf.py, lines 298 to 301 in 0bf2d10:

```python
for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
    # TODO: why do we squeeze here?
    # data = data_torch.squeeze().numpy()
    data = data_torch.numpy()
```
I see. So there's another way to fix this: squeeze them in rwkv6's modify_tensors(), rather than adding them to the F32 list?

Don't think there is any significant advantage one way or the other. Maybe squeezing in modify_tensors is a bit more localized.

Yeah that's what I meant. Let me change to use the more localized way then.
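A minimal sketch of the "more localized" approach discussed above, written as a standalone helper rather than the actual Rwkv6Model.modify_tensors() override in convert_hf_to_gguf.py; the tensor-name substrings are illustrative assumptions, not the exact names used in the PR:

```python
import torch

def squeeze_rwkv6_small_tensors(name: str, data_torch: torch.Tensor) -> torch.Tensor:
    """Drop singleton dimensions from the small per-head RWKV v6 parameters
    so they are written as 1-D tensors again, as they were when the generic
    squeeze() still ran in the main conversion loop.

    The substrings below are illustrative; the real override would match the
    exact tensor names used by the RWKV v6 checkpoint."""
    if any(key in name for key in ("time_mix", "time_decay", "time_faaaa")):
        return data_torch.squeeze()
    return data_torch
```

Squeezing inside the RWKV v6 converter keeps the change local instead of re-adding these tensors to the global F32 list.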
Signed-off-by: Molly Sophia <[email protected]>
Force-pushed from 8699330 to a20a94f
This is good to merge?

Yes. Thanks a lot for your time!
It seems that some RWKV tensors are converted to FP16 rather than FP32 after certain recent commits. However, ggml_cuda_op_bin_bcast requires src1->type == FP32, so newly converted RWKV models cannot run with CUDA, while previously converted files are unaffected. This PR fixes that issue.

It also adds LLAMA_EXAMPLE_PERPLEXITY to the examples list of the --no-context-shift parameter, so that models without context-shift support can run llama-perplexity again.
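For context, a rough sketch of why the extra singleton dimensions lead to FP16 output, assuming the common conversion rule that 1-D tensors stay in F32 even when an F16 model is written; the function name and the exact rule are simplified assumptions, not the precise logic in convert_hf_to_gguf.py:

```python
import torch

def pick_output_dtype(data_torch: torch.Tensor, want_f16: bool) -> torch.dtype:
    """Simplified type-selection rule: 1-D tensors (norms, biases, small
    RWKV parameters) are kept in F32; larger weights are downcast to F16
    when an F16 model is requested. A parameter exported as (1, 1, N)
    counts as 3-D and would be downcast unless squeezed to (N,) first."""
    if not want_f16 or data_torch.ndim <= 1:
        return torch.float32
    return torch.float16
```

Under a rule like this, squeezing the RWKV v6 tensors back to 1-D keeps them in F32, which satisfies the src1->type == FP32 requirement of ggml_cuda_op_bin_bcast.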