Add ColQwen3 and ColQwen3MoE #355
|
I forgot to tag @ManuelFay |
|
Surely pretty nice - but we can't merge things that have not been tested through training! @QuentinJGMace has been training Qwen3VL models recently, and there is a branch open already. I'll leave this one open so he can cherry-pick what he wants from both branches! |
|
Hey @selimcavas! Thanks for the contribution. I'm not sure about implementing support for MoE models, as I don't think we'll train one (and none exists at the moment). But if one is trained one day, I'll be happy to merge the code to support it. As @ManuelFay said, I've been experimenting a bit with Qwen3. As I'm soon on (long) holidays, I'm not sure when a new model will come out, but one should eventually :) |
|
Maybe we can pass this off to @mlconti1 ? |
|
Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I was planning to train the Qwen3 VL 2B model.
I'm casually training Colqwen3-vl-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb. The PR is in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files I think it has the potential to be a great colpali model, and the recipe is already there from previous models, so why not? |
|
Hi, sorry for the delay, just came back from holidays too! |
Unfortunately, it plateaued before 1 epoch. I've been having issues with the dataset and also had to update some files from colpali_engine to get it to run. I recall not having any of these issues when I was experimenting with colintern. Feel free to check my PR if you get a chance, but I'll try again soon. |
I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as for dense models. I kept the implementations separate to leave room for them to diverge later.