
Conversation

@selimcavas

  • add ColQwen3 + ColQwen3MoE wrappers around Qwen3VL backbones
  • add a training entrypoint for Qwen3
  • update transformers to v4.57.1 to access the Qwen3-VL backbones

I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as dense, but I kept the implementations separate to leave room for them to diverge later.
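Since the wrappers themselves couldn't be run, here is a minimal, hypothetical sketch of the late-interaction (MaxSim) scoring that ColPali-family models such as a ColQwen3 wrapper are trained for. The function names and toy embeddings below are illustrative assumptions, not the PR's actual code; a real model would produce tensors projected from Qwen3-VL hidden states.

```python
def dot(a, b):
    # Inner product of two token embeddings (plain lists here).
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_embs, doc_embs):
    """ColBERT/ColPali-style late interaction:
    for each query token, take the max similarity over all
    document tokens, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

# Toy example: two orthogonal query tokens, each matched exactly
# by one document token, so the score is 1.0 + 1.0 = 2.0.
score = maxsim_score([[1.0, 0.0], [0.0, 1.0]],
                     [[1.0, 0.0], [0.0, 1.0]])
```

Keeping dense and MoE wrappers separate, as the PR does, doesn't change this scoring; only the backbone that produces the embeddings differs.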

@selimcavas
Author

I forgot to tag @ManuelFay

@ManuelFay
Collaborator

Surely pretty nice, but we can't merge things that haven't been tested through training!

@QuentinJGMace has been training Qwen3VL models recently, and there is already an open branch. I'll leave this one open so he can cherry-pick what he wants from both branches!
Thanks for the contrib!

@QuentinJGMace
Collaborator

Hey @selimcavas ! thanks for the contrib.

I'm not sure about implementing support for MoE models as I don't think we'll train one (and none exists at the moment). But if one is trained one day I'll be happy to merge the code to support it.

As @ManuelFay said, I've been experimenting a bit with Qwen3. Since I'm soon off on a (long) holiday, I'm not sure when a new model will come out, but one should eventually :)

@ManuelFay
Collaborator

Maybe we can pass this off to @mlconti1 ?

@selimcavas
Author

Okay, I might try training a model by adjusting the params. I currently have an RTX 5090; approximately how many GPU hours (on an H100) does it take to train a full model such as ColQwen2.5? I planned to train the Qwen3 VL 2B model.

@athrael-soju
Contributor

athrael-soju commented Nov 20, 2025

Okay, I might try training a model by adjusting the params. I currently have an RTX 5090; approximately how many GPU hours (on an H100) does it take to train a full model such as ColQwen2.5? I planned to train the Qwen3 VL 2B model.

I'm casually training Colqwen3-vl-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb.

PR in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files
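The checkpoint-every-250-steps and wandb tracking described above could be expressed with a Hugging Face `TrainingArguments` config along these lines; this is a hypothetical sketch, not the actual setup from the linked PR, and the output path is an assumed placeholder.

```python
from transformers import TrainingArguments

# Hypothetical config sketch: checkpoint every 250 steps and
# report metrics to Weights & Biases, as described in the comment.
args = TrainingArguments(
    output_dir="./colqwen3-vl-2b",  # assumed output path
    save_strategy="steps",
    save_steps=250,                 # checkpoint every 250 steps
    report_to="wandb",              # track the run via wandb
    logging_steps=10,
)
```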

I think it has the potential to be a great ColPali model, and the recipe is already there from previous models, so why not?

@mlconti1
Collaborator

Hi, sorry for the delay, I just came back from holidays too!
Indeed, I was interested in picking up where @QuentinJGMace left off. We might have some ideas for new data mixes, but so far nothing is running. I'll try to find some time next week to have a look at that. Thanks for sharing, @athrael-soju, and let us know how the run goes!

@athrael-soju
Contributor

athrael-soju commented Nov 22, 2025

Hi, sorry for the delay, I just came back from holidays too!

Indeed, I was interested in picking up where @QuentinJGMace left off. We might have some ideas for new data mixes, but so far nothing is running. I'll try to find some time next week to have a look at that. Thanks for sharing, @athrael-soju, and let us know how the run goes!

It plateaued before 1 epoch, unfortunately. I've been having issues with the dataset and also had to update some files in colpali_engine to get it to run.

I recall not having any of these issues when I was experimenting with colintern.

Feel free to check my PR if you get a chance, but I'll try again soon.
