
Feature flag metal: Fails to load model when n_gpu_layers > 0 #18

Open
phudtran opened this issue Oct 26, 2023 · 8 comments
Labels: bug (Something isn't working)

Comments

@phudtran

phudtran commented Oct 26, 2023

Can't utilize the GPU on a Mac with:

llama_cpp_rs = { git = "https://github.com/mdrokz/rust-llama.cpp", version = "0.3.0", features = [
    "metal",
] }

Code

use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};
fn main() {
    let model_options = ModelOptions {
        n_gpu_layers: 1,
        ..Default::default()
    };

    let llama = LLama::new("zephyr-7b-alpha.Q2_K.gguf".into(), &model_options);
    println!("llama: {:?}", llama);
    let predict_options = PredictOptions {
        tokens: 0,
        threads: 14,
        top_k: 90,
        top_p: 0.86,
        token_callback: Some(Box::new(|token| {
            println!("token1: {}", token);

            true
        })),
        ..Default::default()
    };

    llama
        .unwrap()
        .predict(
            "what are the national animals of india".into(),
            predict_options,
        )
        .unwrap();
}

Error

llama_new_context_with_model: kv self size  =   64.00 MB
llama_new_context_with_model: ggml_metal_init() failed
llama: Err("Failed to load model")
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', src/main.rs:40:10
mdrokz added the bug label Oct 26, 2023
@mdrokz
Owner

mdrokz commented Oct 26, 2023


Hmm, weird. I don't have a Mac available currently to test this; I will look into it. Thanks.

@zackshen

zackshen commented Nov 4, 2023

I have the same problem on my Apple M1.

@zackshen

zackshen commented Nov 15, 2023

@phudtran I have found the root cause: you need to put the ggml-metal.metal file next to your binary (a quick runtime check for this is sketched at the end of this comment). I also found that build.rs disables the debug log output when building the metal feature, so enable more logging to see the underlying error.

build.rs

fn compile_metal(cx: &mut Build, cxx: &mut Build) {
    cx.flag("-DGGML_USE_METAL").flag("-DGGML_METAL_NDEBUG");
    cxx.flag("-DGGML_USE_METAL");

    println!("cargo:rustc-link-lib=framework=Metal");
    println!("cargo:rustc-link-lib=framework=Foundation");
    println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
    println!("cargo:rustc-link-lib=framework=MetalKit");

    cx.include("./llama.cpp/ggml-metal.h")
        .file("./llama.cpp/ggml-metal.m");
}

With GGML_METAL_NDEBUG removed (debug logging enabled):

fn compile_metal(cx: &mut Build, cxx: &mut Build) {
    cx.flag("-DGGML_USE_METAL"); // <==============  enable print debug log.
    cxx.flag("-DGGML_USE_METAL");

    println!("cargo:rustc-link-lib=framework=Metal");
    println!("cargo:rustc-link-lib=framework=Foundation");
    println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
    println!("cargo:rustc-link-lib=framework=MetalKit");

    cx.include("./llama.cpp/ggml-metal.h")
        .file("./llama.cpp/ggml-metal.m");
}

@mdrokz Should we add a flag to enable/disable the debug log?
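
As a complement to the fix above (placing ggml-metal.metal next to the binary), here is a minimal, hypothetical pre-flight check that could go at the top of main() before constructing LLama. The file name and warning text are illustrative only, not part of the crate:

use std::env;
use std::path::PathBuf;

fn main() {
    // Resolve the directory that contains the running binary.
    let exe_dir: PathBuf = env::current_exe()
        .expect("cannot resolve current executable path")
        .parent()
        .expect("executable has no parent directory")
        .to_path_buf();

    // ggml's Metal backend loads ggml-metal.metal at runtime, so warn early
    // if the shader source is not sitting next to the binary.
    let shader = exe_dir.join("ggml-metal.metal");
    if !shader.exists() {
        eprintln!(
            "warning: {} not found; Metal initialization will likely fail",
            shader.display()
        );
    }
}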

@hugonijhuis

@zackshen I've tried adding the ggml-metal.metal file next to the binary, but now I get the following message:
-[MTLComputePipelineDescriptorInternal setComputeFunction:withType:]:722: failed assertion 'computeFunction must not be nil.'

@zackshen

zackshen commented Nov 17, 2023


I have never seen this error before. I just modified the example code in this repo to test GPU utilization. Can you show your code?

@mdrokz
Owner

mdrokz commented Nov 20, 2023


I will add an option for enabling/disabling debug logging.
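
A minimal sketch of what such a toggle could look like in build.rs, assuming a hypothetical metal-debug Cargo feature (nothing with this name exists in the crate yet); Cargo exposes enabled features to build scripts as CARGO_FEATURE_* environment variables:

use cc::Build;
use std::env;

fn compile_metal(cx: &mut Build, cxx: &mut Build) {
    cx.flag("-DGGML_USE_METAL");
    cxx.flag("-DGGML_USE_METAL");

    // Only silence ggml-metal's debug output when the hypothetical
    // `metal-debug` feature is NOT enabled. Cargo uppercases feature names
    // and maps '-' to '_' in the CARGO_FEATURE_* variables.
    if env::var("CARGO_FEATURE_METAL_DEBUG").is_err() {
        cx.flag("-DGGML_METAL_NDEBUG");
    }

    println!("cargo:rustc-link-lib=framework=Metal");
    println!("cargo:rustc-link-lib=framework=Foundation");
    println!("cargo:rustc-link-lib=framework=MetalPerformanceShaders");
    println!("cargo:rustc-link-lib=framework=MetalKit");

    cx.include("./llama.cpp/ggml-metal.h")
        .file("./llama.cpp/ggml-metal.m");
}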

@genbit

genbit commented Dec 14, 2023

Encountered the same error. Placing ggml-metal.metal into the project directory leads to the same error that @hugonijhuis has seen.

However, setting the GGML_METAL_PATH_RESOURCES environment variable to point to the llama.cpp sources, e.g. GGML_METAL_PATH_RESOURCES=/rust-llama.cpp/llama.cpp/, solves the original issue. (https://github.com/ggerganov/whisper.cpp/blob/master/ggml-metal.m#L261)
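
For completeness, the same workaround can be applied from Rust before the model is created. This is only a sketch; the resources path below is a placeholder and has to point at whatever directory actually contains ggml-metal.metal on your machine:

use std::env;

use llama_cpp_rs::{options::ModelOptions, LLama};

fn main() {
    // Tell ggml's Metal backend where to find ggml-metal.metal
    // (placeholder path; adjust to your llama.cpp checkout).
    env::set_var(
        "GGML_METAL_PATH_RESOURCES",
        "/path/to/rust-llama.cpp/llama.cpp/",
    );

    let model_options = ModelOptions {
        n_gpu_layers: 1,
        ..Default::default()
    };

    let llama = LLama::new("zephyr-7b-alpha.Q2_K.gguf".into(), &model_options);
    println!("llama: {:?}", llama);
}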

@tbogdala

If you want to include it in the build so you don't have to worry about keeping the shader file next to the binary or setting the environment variable, you can use the solution from the rustformers/llm repository:
rustformers/llm@9d39ff8

To get it working, update the needle to match the current string.

The file this puts in the output directory has a prefix before 'ggml-metal.o', so when checking the ggml_type in compile_llama, check for "metal" and, if so, search the output directory for the file with a call to ends_with("-ggml-metal.o") and add it with cxx.object(metal_path).
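
A rough sketch of that directory search, assuming cxx is the cc::Build used in compile_llama and that the precompiled Metal object ends up in OUT_DIR with a "-ggml-metal.o" suffix (the helper name and layout here follow the description above, not the crate's actual code):

use cc::Build;
use std::{env, fs, path::PathBuf};

fn add_metal_object(cxx: &mut Build) {
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set"));

    // Find the precompiled Metal shader object (its prefix varies per build)
    // and hand it to the linker along with the other llama.cpp objects.
    for entry in fs::read_dir(&out_dir).expect("cannot read OUT_DIR") {
        let path = entry.expect("bad directory entry").path();
        let is_metal_obj = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.ends_with("-ggml-metal.o"));
        if is_metal_obj {
            cxx.object(&path);
        }
    }
}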
