
Behavior when missing quantization version #447

Open
Reichenbachian opened this issue Dec 20, 2023 · 1 comment

@Reichenbachian

The problem happened with the file produced below: it turns out it didn't include the "general.quantization_version" metadata. When llama.cpp reads a file without that key, it assumes version 2 (grep for the line gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);), so this model works with llama.cpp but fails with rustformers/llm.

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "models/"  # directory later passed to convert.py
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(local_dir)
torch.save(model.state_dict(), os.path.join(local_dir, "pytorch_model.bin"))
python llm/crates/ggml/sys/llama-cpp/convert.py models/ --vocab-dir models/ --ctx 4096 --outtype q8_0
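
For reference, the converted file's metadata can be inspected with the gguf Python package that ships alongside convert.py, to confirm the key is really absent. A minimal sketch, assuming that package is installed and that the output file is named models/ggml-model-q8_0.gguf (the output name here is a guess):

# Sketch: list the GGUF metadata keys and check for the one llm requires.
from gguf import GGUFReader

reader = GGUFReader("models/ggml-model-q8_0.gguf")  # hypothetical output path
for name in reader.fields:
    print(name)

if "general.quantization_version" not in reader.fields:
    print("general.quantization_version is missing from the metadata")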
let model = llm::load(
    path,
    llm::TokenizerSource::Embedded,
    parameters,
    llm::load_progress_callback_stdout,
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));

thread '<unnamed>' panicked at llm/inference/src/llms/local/llama2.rs:45:35:
Failed to load model: quantization version was missing, despite model containing quantized tensors

My solution was just to get rid of this whole block:

    let any_quantized = gguf
        .tensor_infos
        .values()
        .any(|t| t.element_type.is_quantized());
    // if any_quantized {
    //     match quantization_version {
    //         Some(MetadataValue::UInt32(2)) => {
    //             // Currently supported version
    //         }
    //         Some(quantization_version) => {
    //             return Err(LoadError::UnsupportedQuantizationVersion {
    //                 quantization_version: quantization_version.clone(),
    //             })
    //         }
    //         None => return Err(LoadError::MissingQuantizationVersion),
    //     }
    // }

I'm unsure how you want to handle this, since it does remove a check.
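
An alternative that keeps the check would be to mirror llama.cpp and treat a missing key as quantization version 2, while still rejecting unknown versions. This is only a sketch built from the types visible in the block above (MetadataValue, LoadError); the warning line is illustrative:

    if any_quantized {
        match quantization_version {
            Some(MetadataValue::UInt32(2)) => {
                // Currently supported version
            }
            Some(quantization_version) => {
                return Err(LoadError::UnsupportedQuantizationVersion {
                    quantization_version: quantization_version.clone(),
                })
            }
            None => {
                // llama.cpp writes GGML_QNT_VERSION (currently 2) for this key,
                // so assume version 2 here instead of failing the load.
                eprintln!("quantization version missing; assuming version 2");
            }
        }
    }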

@Reichenbachian
Author

a670aae
