[DRAFT]: Adding save to gguf support for qwen2_vl #1904
[DRAFT] GGUF Support for Qwen2 Vision Models
Feature Overview
This PR aims to provide direct GGUF export capability for vision finetunes, supporting all available Qwen2 Vision models.
Details

Current Progress
- The `save_pretrained_to_gguf` method now produces two files: one for the LLM part and one for the vision encoder (the mmproj file).
- `qwen2-vl-surgery.py` is a modified version of the original script found in llama.cpp; it uses the GPU instead of the CPU and generates the vision encoder file.

Current Issues
- The original `qwen2-vl-surgery.py` exhausts system RAM when run directly, and the custom `qwen2-vl-surgery.py` we have added works only with the original models' safetensors.

What We Have Tried
- Modified `qwen2-vl-surgery.py` to run on the GPU instead of the CPU to avoid exceeding available memory.
- Ran `qwen2-vl-surgery.py` on different model formats such as `.bin` and `.safetensors`.

Contributors
adityaghai07, Captain-T2004
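As an illustration of the surgery step described under Current Progress: conceptually, it partitions a checkpoint's tensors into the language-model part and the vision-encoder part, which become the two GGUF files. The sketch below is not the actual PR code; the `visual.` prefix is assumed here to match Qwen2-VL's Hugging Face checkpoint naming, and the helper name is illustrative.

```python
# Hedged sketch (not the actual surgery-script code): split a checkpoint's
# tensors into the LLM part and the vision-encoder (mmproj) part by name
# prefix. The "visual." prefix is an assumption based on Qwen2-VL's
# Hugging Face module naming.

def split_state_dict(state_dict, vision_prefix="visual."):
    """Partition tensor names into (llm_part, vision_part) by prefix."""
    llm, vision = {}, {}
    for name, tensor in state_dict.items():
        if name.startswith(vision_prefix):
            vision[name] = tensor
        else:
            llm[name] = tensor
    return llm, vision

# Toy example with placeholder values standing in for tensors:
ckpt = {
    "model.layers.0.self_attn.q_proj.weight": 0,
    "visual.blocks.0.attn.qkv.weight": 1,
    "visual.merger.mlp.0.weight": 2,
    "lm_head.weight": 3,
}
llm, vision = split_state_dict(ckpt)
```

In the real pipeline, the two resulting tensor sets would each be written out with the GGUF writer, yielding the LLM file and the mmproj file.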
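On the RAM issue mentioned above: one way to keep peak memory bounded during conversion, regardless of which device does the compute, is to convert and write tensors one at a time instead of materializing the whole converted checkpoint at once. The sketch below only illustrates that streaming idea; `convert` and `write_out` are hypothetical callbacks, not the actual llama.cpp or surgery-script API.

```python
# Hedged sketch of memory-bounded conversion: iterate over tensors,
# convert each, write it out, and let it be freed before touching the
# next one, so only a single tensor is live at any time. The callback
# names (convert, write_out) are illustrative assumptions.

def stream_convert(tensors, convert, write_out):
    """Convert and write tensors one at a time; returns the count written."""
    count = 0
    for name, tensor in tensors:
        write_out(name, convert(tensor))  # tensor can be released after this
        count += 1
    return count

# Toy usage: a generator stands in for lazily loaded tensors, doubling
# stands in for the dtype/layout conversion, and a list stands in for
# the output file.
out = []
written = stream_convert(
    ((f"t{i}", [i] * 4) for i in range(5)),
    convert=lambda t: [x * 2 for x in t],
    write_out=lambda name, t: out.append((name, t)),
)
```

With a lazily loaded checkpoint (e.g. memory-mapped safetensors), this pattern avoids ever holding both the full source and full converted model in RAM at once.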