
LLAVA-one-vision MMMU evaluation results differ significantly #645

Open
linxid opened this issue Dec 4, 2024 · 1 comment

Comments

@linxid

linxid commented Dec 4, 2024

This is the evaluation command: `torchrun --nproc-per-node=1 run.py --data MMMU_DEV_VAL --model llava_onevision_qwen2_0.5b_ov --verbose`

The resulting scores:

| split | validation | dev |
| --- | --- | --- |
| Overall | 0.3522 | 0.3133 |
| Accounting | 0.4333 | 0.0 |
| Agriculture | 0.3667 | 0.2 |
| Architecture_and_Engineering | 0.2333 | 0.2 |
| Art | 0.3333 | 0.0 |
| Art_Theory | 0.4333 | 0.6 |

I used GPT-3.5 as the judge for the evaluation. These numbers differ substantially from the 0.31 reported in the LLaVA-OneVision paper. What could be the cause?
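One detail worth checking when comparing against the reported number: the Overall score is the sample-weighted accuracy over all questions (total correct / total answered), not a plain mean of the per-category accuracies. A minimal sketch, with made-up per-category tallies (not MMMU's real split sizes):

```python
# Sketch: overall accuracy aggregates raw counts across categories,
# which differs from averaging the category accuracies when category
# sizes differ. The tallies below are illustrative only.
def overall_accuracy(categories):
    """categories: list of (num_correct, num_total) per category."""
    correct = sum(c for c, _ in categories)
    total = sum(t for _, t in categories)
    return correct / total

cats = [(13, 30), (6, 30), (7, 30)]  # hypothetical per-category tallies
print(round(overall_accuracy(cats), 4))  # 26/90 -> 0.2889
```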

@kennymckormick
Member

Hi, @linxid ,
We recommend using the official OPENAI API for the judging step. Could you confirm whether you are using the official OPENAI API on your side?
In our tests, the result shows no significant difference from 0.31 (see the screenshot below):
[screenshot: maintainer's MMMU evaluation results, close to 0.31]
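A minimal sketch of the setup the maintainer suggests, assuming the judge endpoint is configured via environment variables (the names `OPENAI_API_KEY` and `OPENAI_API_BASE` are commonly used for this; verify against your VLMEvalKit version's docs):

```shell
# Point the judge at the official OpenAI endpoint (not a proxy or mirror),
# then re-run the evaluation from the original report.
export OPENAI_API_KEY="sk-..."                      # official OpenAI key
export OPENAI_API_BASE="https://api.openai.com/v1"  # official endpoint

torchrun --nproc-per-node=1 run.py \
    --data MMMU_DEV_VAL \
    --model llava_onevision_qwen2_0.5b_ov \
    --verbose
```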
