Reproduced results #82
Hi, I think this is acceptable. A different number of GPUs leads to a different gradient_accumulation_steps, and different GPU types introduce randomness. By the way, the performance for phi-2-siglip-base listed here comes from training on 8 A100-40G GPUs.
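The relationship hinted at above can be sketched as follows: to keep the effective (global) batch size fixed, gradient_accumulation_steps must change with the number of GPUs. This is a minimal illustration with assumed names and example values, not the repository's actual training script.

```python
# Hypothetical sketch: keeping the global batch size constant across GPU counts.
# Names and values are assumptions for illustration, not from the repo's script.
def grad_accum_steps(global_batch, per_device_batch, num_gpus):
    """Accumulation steps needed so that
    per_device_batch * num_gpus * steps == global_batch."""
    assert global_batch % (per_device_batch * num_gpus) == 0
    return global_batch // (per_device_batch * num_gpus)

# Same global batch of 256 with a per-device batch of 16:
# 8 GPUs need 2 accumulation steps, while 4 GPUs need 4.
```

Because accumulated gradients are summed over micro-batches in a different order, the two configurations are numerically close but not bit-identical, which contributes to small metric gaps.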
Thanks for your reply. I see that the training script uses fp16 by default, but the A100 supports bf16. May I ask whether you used bf16 in your training?
No, we haven't tried bf16 thoroughly, but we encourage the open-source community to give it a try. We can update the performance table accordingly, and we welcome contributions from the open-source community to this code repository.
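For context on why bf16 is often preferred on A100s: bf16 keeps fp32's 8-bit exponent (so it rarely overflows), while fp16's 5-bit exponent tops out near 65k, which is why fp16 training usually needs loss scaling. A small sketch computing the largest finite value of each format from its bit layout:

```python
# Minimal sketch: dynamic range of fp16 vs bf16, derived from their
# IEEE-style bit layouts (this is standard float arithmetic, not repo code).
def max_finite(exp_bits, mantissa_bits):
    """Largest finite value representable in the given float format."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mantissa_bits) * 2.0 ** bias

fp16_max = max_finite(exp_bits=5, mantissa_bits=10)  # 65504.0
bf16_max = max_finite(exp_bits=8, mantissa_bits=7)   # about 3.39e38
```

The trade-off is precision: bf16 has only 7 mantissa bits versus fp16's 10, so switching precision modes can shift benchmark numbers slightly, which is another reason reproduced results may differ by a fraction of a percent.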
I tried to reproduce the results under the base recipe. I basically match the results in the paper on VQAv2, GQA, ScienceQA, and POPE, but there is an almost 1% gap on TextVQA, MMMU, and MM-Vet, and the gap on MME seems to be larger. I'm not sure if this gap is acceptable, or what could potentially cause it.