Feature Request: Support for Qwen2-VL #9246
Comments
+1 This would be another great addition!
This model is awesome
I am looking forward to it very much
+1 I am looking forward to it very much
We can try llamafying it
+1
Any updates?
+1
I can't wait for it!
Maybe people should also express interest and ask the Qwen2-VL devs to implement it.
Looking forward to using llama.cpp for on-device inference.
Is anyone already working on this? If not, I would like to give it a try.
+1
@huucuong1503 Thanks to your help, I was able to get the model working with no garbled characters. I recompiled the branch following the build instructions from your Kaggle project, and the result is a success!
Thank you for your help as well.
But I also found a problem: I'm on two 4090s and want to try the 7B model, but I get a CUDA out-of-memory error. Even with the model quantized it exceeds the VRAM. Is there a good way around this?
Try decreasing -ngl (the number of layers offloaded to the GPU).
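To make that concrete, here is a hedged sketch of spreading the 7B model across both 4090s while offloading fewer layers. The file names are placeholders, and the layer count you can afford depends on the quantization:

```shell
# Sketch only: lower -ngl until the model fits, and split the offloaded
# layers across both GPUs. Paths and the layer count are placeholders.
#   -ngl 20              offload only 20 layers; reduce further on OOM
#   --split-mode layer   distribute offloaded layers across GPUs
#   --tensor-split 1,1   weight the two 4090s equally
./build/bin/llama-qwen2vl-cli \
  -m qwen2-vl-7b-instruct-q4_k_m.gguf \
  --mmproj qwen2-vl-7b-instruct.mmproj.gguf \
  --image test.jpg -p "Describe the image." \
  -ngl 20 --split-mode layer --tensor-split 1,1
```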
@huucuong1503 Thanks, I'm currently working on the deployment, but I don't know how to apply this project to my field of work. Like the Jetson Orin NX you mentioned: I have the project running, and now I want to add features to it. What should I do? Can you talk about it briefly?
Can you tell me what project you are working on? In my case, I'm building a local VLM agent that can run on a Jetson to control UAVs, and I'm working with the Qwen-Agent framework for this task. You can edit and modify the code in qwen2-vl-cli.cpp and refer to the Qwen2-VL paper to get the right tokens.
@huucuong1503 My current work is about monitoring embedded devices in a certain scenario. For example, I continuously watch whether a fire breaks out in a scene; I use Qwen for scene understanding and raise an alarm when a fire occurs. So I just need to modify the code in qwen2-vl-cli.cpp to do this?
This requirement sounds like you're at a Chinese company? I also use Qwen2-VL for fire detection lol
Oh, this task seems quite simple: you just need to adjust and add some system prompts for a JSON output. But I think using a VLM for fire detection is a bit overkill; you could use OWL-ViT, which is really good at detecting open-vocabulary classes. Of course, you can contact me via LinkedIn or Gmail for further discussion.
Yes, the new business our company is developing aims to move in the direction of large models.
Thanks again, I will be in touch with you |
Hi all, I can build with cmake . -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=$(which nvcc) -DTCNN_CUDA_ARCHITECTURES=61. How do I build with -DGGML_SYCL=ON to get a package like llama-b4218-bin-win-sycl-x64.zip? I'd appreciate any help, thanks guys!
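Not the exact recipe used to produce the official llama-b4218-bin-win-sycl-x64.zip packages, but a minimal sketch of a SYCL build on Linux, assuming the Intel oneAPI Base Toolkit is installed (on Windows the equivalent commands would run from the oneAPI command prompt):

```shell
# Load the oneAPI environment (the path may differ on your system).
source /opt/intel/oneapi/setvars.sh
# Configure with the SYCL backend and Intel's icx/icpx compilers.
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```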
Thank you so much! |
I have tried llama-qwen2vl-cli -m ~/Downloads/qwen2-vl-72b-instruct-q4_k_m.gguf --mmproj ~/Downloads/qwen2-vl-72b-instruct.f32.mmproj.gguf --image demos/images/03.jpg and got an error. The full output is:
Same issue on M4 Max 128 GB
Same on M3 Max 64 GB
Same error on MBP M3 Max 128 GB
Mac issues should be fixed with #10896 |
I'm getting an error when running images. UPD: setting a bigger context length seems to help
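For reference, a hedged sketch of raising the context length with -c; the file names are placeholders, and 8192 is just an example value large enough to leave room beyond the image tokens:

```shell
# Give the model a larger context so the image tokens plus the prompt
# and the response all fit. Paths are placeholders.
./build/bin/llama-qwen2vl-cli -m model.gguf --mmproj mmproj.gguf \
  --image img.png -p "Describe the image." -c 8192
```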
Thanks! It now works on my m3-max with #10896. git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/10896/head:pr10896
git checkout pr10896
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-qwen2vl-cli -m xxx.gguf --mmproj yyyy.gguf --image img.png -p "Describe the image."
I have tried the model on a .webp image and got an error.
I don't think it supports webp. Just convert to png or jpeg for now. |
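One way to do the conversion, assuming ImageMagick or ffmpeg is available (either tool works; file names are placeholders):

```shell
# With ImageMagick 7 (use "convert" instead of "magick" on ImageMagick 6):
magick input.webp input.png
# Or with ffmpeg:
ffmpeg -i input.webp input.png
```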
How do I merge the two GGUFs for Ollama? Merge the LLM GGUF and the vision encoder GGUF?
@gaussiangit Ollama doesn't support qwen2-vl yet. |
Any updates? |
I'm able to successfully test llama-qwen2vl-cli to describe an image using the Qwen2-VL-7B model on Android (a Samsung S21+, to be specific). The operation takes a reasonable 3-4 minutes with quantization. I'll look into adding Metal or Vulkan to further improve performance by using the GPU on phones, and into repeating this on iOS as well.
Hello @embedsri, could you please share more details about how you did that?
See this:
I did not quantize the mmproj model, but I tried quantizing the text model to q4_0, no difference. |
Yes, this CLIP encoding is quite compute-intensive. Especially with the newest commits, where GPU acceleration was deactivated (because it only ever worked on CUDA and everyone else started complaining), it takes some time. But I also think your image is quite large:
How did you set the context length? When the image already takes up 4070 tokens, maybe there is nothing left for the prompt and the result. I'd first try downscaling the image and see what happens.
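A sketch of that downscaling step, assuming ImageMagick; the 1024x1024 cap is an arbitrary example, and the '>' suffix means "only shrink, never enlarge":

```shell
# Shrink the image so it fits inside 1024x1024 (aspect ratio preserved),
# which sharply reduces the number of vision tokens it consumes.
# File names are placeholders.
magick big.jpg -resize '1024x1024>' small.jpg
./build/bin/llama-qwen2vl-cli -m model.gguf --mmproj mmproj.gguf \
  --image small.jpg -p "Describe the image."
```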
Prerequisites
Feature Description
Qwen just released Qwen2-VL 2B & 7B under the Apache 2.0 License.
Motivation
SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
Possible Implementation
No response