QWen2 Audio + Visual #28

matbee-eth · 2024-09-11T19:52:05Z

Would be great if you worked out a system to allow us to fine-tune QWen2-VL (rather than LLaVa) from your custom projector setup. They have Qwen2-Audio and Qwen2-VL, but no A+VL.

sshh12 · 2024-09-23T01:52:11Z

I haven't had the time to upgrade this but happy to advise anyone who wants to try it.

In theory, you'd just need to add qwen2 as a model like https://github.com/sshh12/multi_token/blob/main/multi_token/language_models/mistral.py and then train with a dataset that includes audio and vision (both together supported).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QWen2 Audio + Visual #28

QWen2 Audio + Visual #28

matbee-eth commented Sep 11, 2024

sshh12 commented Sep 23, 2024

QWen2 Audio + Visual #28

QWen2 Audio + Visual #28

Comments

matbee-eth commented Sep 11, 2024

sshh12 commented Sep 23, 2024