Problem
Tensor parallelism would be very useful for running Qwen2VL. My personal use case is running 72B across 2x 3090.
Solution
It would be great if exllamav2 could support full TP on multimodal models. However, I am unsure how complex it would be to make the vision piece parallel. Even if only the language part could be parallelized, that should increase generation speeds considerably.
Alternatives
No response
Explanation
Multimodal models are becoming increasingly relevant.
Examples
No response
Additional context
No response
Acknowledgements
I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.