Reproduced results #82
Hi, I think this is acceptable. A different number of GPUs leads to a different gradient_accumulation_steps, and different GPU types introduce randomness. By the way, the performance for phi-2-siglip-base listed here comes from training on 8 A100-40G GPUs.
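The relationship hinted at above can be sketched as follows: to keep the effective (global) batch size fixed, gradient_accumulation_steps must change with the number of GPUs. This is a minimal illustration with assumed names and example values, not the repository's actual training script.

```python
# Hypothetical sketch: keeping the global batch size constant across GPU counts.
# Names and values are assumptions for illustration, not from the repo's script.
def grad_accum_steps(global_batch, per_device_batch, num_gpus):
    """Accumulation steps needed so that
    per_device_batch * num_gpus * steps == global_batch."""
    assert global_batch % (per_device_batch * num_gpus) == 0
    return global_batch // (per_device_batch * num_gpus)

# Same global batch of 256 with a per-device batch of 16:
# 8 GPUs need 2 accumulation steps, while 4 GPUs need 4.
```

Because accumulated gradients are summed over micro-batches in a different order, the two configurations are numerically close but not bit-identical, which contributes to small metric gaps.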
Thanks for your reply. I see that the training script uses fp16 by default, but the A100 supports bf16. May I ask whether you used bf16 in your training?
No, we haven't tried bf16 thoroughly, but we encourage the open-source community to give it a try. We can update the performance table accordingly, and we welcome contributions from the open-source community to this code repository.
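For context on why bf16 is often preferred on A100s: bf16 keeps fp32's 8-bit exponent (so it rarely overflows), while fp16's 5-bit exponent tops out near 65k, which is why fp16 training usually needs loss scaling. A small sketch computing the largest finite value of each format from its bit layout:

```python
# Minimal sketch: dynamic range of fp16 vs bf16, derived from their
# IEEE-style bit layouts (this is standard float arithmetic, not repo code).
def max_finite(exp_bits, mantissa_bits):
    """Largest finite value representable in the given float format."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mantissa_bits) * 2.0 ** bias

fp16_max = max_finite(exp_bits=5, mantissa_bits=10)  # 65504.0
bf16_max = max_finite(exp_bits=8, mantissa_bits=7)   # about 3.39e38
```

The trade-off is precision: bf16 has only 7 mantissa bits versus fp16's 10, so switching precision modes can shift benchmark numbers slightly, which is another reason reproduced results may differ by a fraction of a percent.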
I tried to reproduce the results under the base recipe. I basically match the results in the paper on VQAv2, GQA, ScienceQA, and POPE, but there is an almost 1% gap on TextVQA, MMMU, and MM-Vet, and the gap on MME seems to be larger. I'm not sure if this gap is acceptable, or what could potentially cause it.