
Bug: conversion to BF16 fails for Kimi K2 Thinking #942

@Lissanro

Description


What happened?

When trying to convert https://huggingface.co/moonshotai/Kimi-K2-Thinking to BF16 using this command:

python3 ~/pkgs/ik_llama.cpp/convert_hf_to_gguf.py --outtype bf16 \
--outfile /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16.gguf  \
/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking --split-max-size 50G

...it fails (see the log below). The same command works with mainline llama.cpp, so the updates from ggml-org/llama.cpp#17069 to the convert_hf_to_gguf.py script are likely not included here yet. Ubergarm mentioned success making quants for ik_llama.cpp with the mainline conversion script, so that should work as a workaround in the meantime. I am still running the conversion: I want to experiment with different Ubergarm recipes and quantization settings and to integrate the jinja chat template with the Unsloth fixes, which is why I downloaded the original release to generate my own GGUFs. Still, it would be great if the ik_llama.cpp tools could also convert to BF16, if possible.
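For reference, the Hugging Face release of Kimi-K2-Thinking ships its weights as INT4 in the compressed-tensors format, which is what the mainline PR adds handling for; that is my understanding of why the ik_llama.cpp script trips up, not something confirmed by the maintainers. A quick way to check which checkpoint layout is sitting on disk is to inspect quantization_config in config.json. A minimal sketch, assuming the local path from my command above and the standard quant_method field used by Hugging Face quantization configs:

import json
from pathlib import Path

# Path taken from the conversion command above; adjust to your local checkout.
model_dir = Path("/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking")

config = json.loads((model_dir / "config.json").read_text())
qcfg = config.get("quantization_config")

if qcfg is None:
    print("no quantization_config: plain BF16/FP16 safetensors checkpoint")
else:
    # The official Kimi-K2-Thinking release is expected to report the
    # compressed-tensors format with INT4 weights here.
    print("quant method:", qcfg.get("quant_method", qcfg.get("format")))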

Name and Version

Latest git

What operating system are you seeing the problem on?

No response

Relevant log output

> python3 ~/pkgs/ik_llama.cpp/convert_hf_to_gguf.py --outtype bf16 --outfile /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16.gguf /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking --split-max-size 50G
INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_kv_b.weight,       torch.bfloat16 --> BF16, shape = {512, 16384}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 32768}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 8192}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
Traceback (most recent call last):
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 4860, in <module>
    main()
    ~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 4854, in main
    model_instance.write()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 430, in write
    self.prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 3748, in prepare_tensors
    super().prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 285, in prepare_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
                                                                ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 3710, in modify_tensors
    datas.append(self._experts[bid][ename])
                 ~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'model.layers.1.mlp.experts.0.down_proj.weight'
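For context on the KeyError: judging by the traceback, convert_hf_to_gguf.py buffers the per-expert tensors of each layer in self._experts[bid] and merges them into one stacked tensor per projection once a full set (n_routed_experts tensors for each of down_proj, gate_proj and up_proj) has accumulated. Below is a simplified sketch of that pattern, not the actual converter code; my assumption is that the INT4 compressed-tensors checkpoint carries extra per-expert tensors (for example weight scales), so the count is reached before every plain .weight key exists and the lookup fails exactly as in the log.

# Simplified sketch of the expert-stacking pattern the traceback points at
# (illustration only, not the actual convert_hf_to_gguf.py code).
n_experts = 384  # n_routed_experts for Kimi K2

experts: dict[str, object] = {}

def buffer_expert_tensor(name: str, tensor: object, bid: int) -> None:
    experts[name] = tensor
    # Merge once enough tensors have accumulated for this layer.
    if len(experts) >= n_experts * 3:
        for w_name in ("down_proj", "gate_proj", "up_proj"):
            for xid in range(n_experts):
                ename = f"model.layers.{bid}.mlp.experts.{xid}.{w_name}.weight"
                # If the checkpoint also ships e.g. '...weight_scale' tensors,
                # the count above is hit before every plain '.weight' key has
                # been seen, and this lookup raises the KeyError from the log.
                _ = experts.pop(ename)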

Labels

wontfix (This will not be worked on)
