[DRAFT]: Adding save to gguf support for qwen2_vl #1904

Open · wants to merge 1 commit into main

Conversation

Captain-T2004

[DRAFT] GGUF Support for Qwen2 Vision Models

Feature Overview

This draft aims to provide direct GGUF export capability for vision fine-tunes, supporting all available Qwen2 Vision Models.

Expectations

  • Enables direct export of vision fine-tunes to GGUF format
  • Compatible with the complete range of Qwen2 Vision Models

Current Progress

  • Modifications to the save.py logic allow it to export two GGUF files for vision models directly by running the save_pretrained_to_gguf method: one file for the LLM part and one for the vision encoder (the mmproj file). A usage sketch is shown after this list.
  • Our qwen2-vl-surgery.py is a modified version of the original script from llama.cpp; it runs on the GPU instead of the CPU and generates the vision encoder (mmproj) file.
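
For reference, this is roughly how we expect the export to be invoked. This is a minimal sketch only: the fine-tune path, output directory, and quantization value are placeholders, and the exact signature of save_pretrained_to_gguf may differ from what this PR finally exposes.

```python
# Minimal usage sketch (paths and quantization are placeholders; the exact
# signature of save_pretrained_to_gguf is an assumption, not final API).
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "path/to/qwen2-vl-finetune",      # hypothetical local fine-tune
    load_in_4bit = False,
)

# Expected result: two GGUF files in the output directory, one for the
# LLM part and one for the vision encoder (mmproj).
model.save_pretrained_to_gguf(
    "qwen2_vl_gguf",                  # hypothetical output directory
    tokenizer,
    quantization_method = "f16",
)
```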

Current Issues

  • The LLM part, when tested with the original model's mmproj file, works correctly, suggesting that the LLM part is exported successfully.
  • When the LLM part is used with the extracted vision encoder (mmproj), it produces degenerate output such as "GGGGGGGGGGGGGGG........".
  • The original qwen2-vl-surgery.py runs out of RAM when executed directly; the custom qwen2-vl-surgery.py we have added works with the original models' safetensors.

What We Have Tried

  • Optimizing the original qwen2-vl-surgery.py to run on the GPU instead of the CPU to avoid running out of memory (see the sketch after this list).
  • Running qwen2-vl-surgery.py on different model formats such as .bin and safetensors.
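
Below is a rough sketch of the GPU change in our qwen2-vl-surgery.py variant: the model is loaded with its weights on the GPU, and each tensor is moved back to the CPU only as it is written out. The tensor renaming and the actual GGUF writing are omitted, so this is a simplification of the real script rather than a drop-in replacement.

```python
# Simplified sketch of the GPU-based surgery (renaming tensors to the
# mmproj naming convention and writing the GGUF file are omitted).
import torch
from transformers import Qwen2VLForConditionalGeneration

model_path = "Qwen/Qwen2-VL-2B-Instruct"  # or a local fine-tune directory
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype = torch.float16,
    device_map = "cuda",   # keep the full model on GPU instead of CPU RAM
)

# Only the vision tower is needed for the mmproj file.
visual_state = model.visual.state_dict()
for name, tensor in visual_state.items():
    # Move tensors back to CPU one at a time when writing them out,
    # so peak host RAM stays low.
    cpu_tensor = tensor.to("cpu", torch.float32)
    # ... rename `name` and add `cpu_tensor` to the GGUF writer ...
```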

Contributors

adityaghai07, Captain-T2004

@adityaghai07
Contributor

vdonchev helped out a lot in clearing up doubts about the vision encoders and the splitting of VLMs. I believe we have followed the correct approach for exporting VLMs to GGUF format.

The vision encoder (mmproj) file, when extracted directly from the original model (just run the surgery file without passing a model path, and it uses the original Qwen2-VL 2B model from Hugging Face), works well with the fine-tuned LLM part in GGUF format.

The possible issue: when the saved vision encoder is used to run the model, it produces warnings that many tensor weights are missing, and the outputs are garbage.

[Screenshot 2025-03-03 195153: warnings about missing tensor weights]
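
To narrow down which weights are missing, one quick check (not part of this PR) is to diff the tensor names of the original mmproj file against the one our surgery script produces, using the gguf Python package; the file paths below are placeholders.

```python
# Diagnostic sketch: compare tensor names between the original mmproj file
# and the one extracted by our surgery script (paths are placeholders).
from gguf import GGUFReader

def tensor_names(path: str) -> set[str]:
    return {t.name for t in GGUFReader(path).tensors}

original  = tensor_names("mmproj-original.gguf")
extracted = tensor_names("mmproj-extracted.gguf")

print("Missing from extracted:", sorted(original - extracted))
print("Unexpected in extracted:", sorted(extracted - original))
```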
