This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Build against newer GGML version #428

Merged (13 commits) on Nov 12, 2023

Conversation

LLukas22 (Contributor) commented Sep 16, 2023

  • Update llama.cpp and include ggml-alloc during binding generation
  • Switch to the new graph allocator
  • Ensure CPU inference works
  • Ensure CUDA inference works
  • Ensure OpenCL inference works
  • Ensure Metal inference works

LLukas22 (Contributor, Author)

CPU inference seems to work, at least for llama. CUDA/OpenCL are currently broken.

philpax (Collaborator) commented Sep 26, 2023

As requested:

llm # git log --pretty=oneline
78b0e25c7164cfa9e56cf6ac648e803432d5a0aa (HEAD -> feat/ggml-update, llukas22/feat/ggml-update) Scope `input_length` and `session_len` to `BuildContext`

llm # cargo run -r infer -a llama -m models/llama2/dolphin-llama2-7b.ggmlv3.q4_K_M.bin -p "Testing 123: "
   Compiling ggml-sys v0.2.0-dev (llm/crates/ggml/sys)
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
...
    Finished release [optimized] target(s) in 10.26s
     Running `target/release/llm infer -a llama -m models/llama2/dolphin-llama2-7b.ggmlv3.q4_K_M.bin -p 'Testing 123: '`
✓ Loaded 291 tensors (4.1 GB) after 131ms
zsh: segmentation fault  cargo run -r infer -a llama -m  -p "Testing 123: "

llm # cargo run -F metal -r infer -a llama -m models/llama2/dolphin-llama2-7b.ggmlv3.q4_K_M.bin -p "Testing 123: " --use-gpu
   Compiling ggml-sys v0.2.0-dev (llm/crates/ggml/sys)
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
warning: clang: warning: argument unused during compilation: '-mfpu=neon' [-Wunused-command-line-argument]
   Compiling ggml v0.2.0-dev (llm/crates/ggml)
   Compiling llm-base v0.2.0-dev (llm/crates/llm-base)
error[E0425]: cannot find value `scratch` in this scope
   --> crates/llm-base/src/inference_session.rs:172:28
    |
172 |                 for buf in scratch.iter() {
    |                            ^^^^^^^ not found in this scope

error[E0308]: mismatched types
   --> crates/llm-base/src/inference_session.rs:298:34
    |
298 |                     plan.execute(ctx0);
    |                          ------- ^^^^ expected `&mut Vec<u8>`, found `&mut Context`
    |                          |
    |                          arguments to this method are incorrect
    |
    = note: expected mutable reference `&mut Vec<u8>`
               found mutable reference `&mut ggml::Context`
note: method defined here
   --> llm/crates/ggml/src/lib.rs:355:12
    |
355 |     pub fn execute(&mut self, buffer: &mut Vec<u8>) {
    |            ^^^^^^^

error[E0308]: mismatched types
   --> crates/llm-base/src/inference_session.rs:302:30
    |
302 |                 plan.execute(ctx0);
    |                      ------- ^^^^ expected `&mut Vec<u8>`, found `&mut Context`
    |                      |
    |                      arguments to this method are incorrect
    |
    = note: expected mutable reference `&mut Vec<u8>`
               found mutable reference `&mut ggml::Context`
note: method defined here
   --> llm/crates/ggml/src/lib.rs:355:12
    |
355 |     pub fn execute(&mut self, buffer: &mut Vec<u8>) {
    |            ^^^^^^^

Some errors have detailed explanations: E0308, E0425.
For more information about an error, try `rustc --explain E0308`.
error: could not compile `llm-base` (lib) due to 3 previous errors

LLukas22 (Contributor, Author)

Thanks for the quick test. I don't think I can debug that, as I can't reproduce it. The failure with Metal enabled is expected, as I haven't touched any of the accelerators yet. I think I will focus on getting CUDA and OpenCL functional again.

philpax (Collaborator) commented Sep 27, 2023

Sounds good to me - I'll see if I can make some time to push the macOS effort forward, but I suspect I'm going to be really busy for the next week or two 😓

Awesome work so far, though! Let me know if you need me to look at/consult on anything, but everything looks great so far.

philpax mentioned this pull request on Oct 31, 2023
philpax (Collaborator) commented Nov 1, 2023

The test I did earlier on macOS now works after pulling in your latest changes. I'll fix Metal soon - I'm hoping that it shouldn't be too bad. After that, I'll check each architecture with {CPU, CUDA, Metal}. Maybe OpenCL, but that's honestly kind of a pain to test - might just bounce it against the CI and hope for the best.

I may also try to use this PR to update to the latest version once more, but it'll depend on how large that change is. I want to keep the diff small so that I can get this in before #412 and not have to resolve too many merge conflicts - we'll see how we go!

LLukas22 (Contributor, Author) commented Nov 2, 2023

I think I stopped implementing this because I got a lot of "memory access errors" while trying to get CUDA working. I guess CPU inference should work, but I kind of gave up on getting CUDA to work, as it's just very difficult to debug into GGML from the Rust side, especially the CUDA bits.

philpax (Collaborator) commented Nov 2, 2023

Yeah, understandable - it’s very difficult to debug. I’m out of town for the next few days, but I’ll get back to it after that.

LLukas22 (Contributor, Author) commented Nov 2, 2023

Take your time. I will probably focus on getting the quantized CUDA kernels working in candle over the weekend.

@philpax philpax changed the base branch from main to develop November 12, 2023 20:35
@philpax philpax marked this pull request as ready for review November 12, 2023 20:35
@philpax philpax merged commit e5e0fe1 into rustformers:develop Nov 12, 2023
10 of 14 checks passed