Large gap from leaderboard metrics on MTVQA and MME #663

Open
GoogleAlphaZero opened this issue Dec 12, 2024 · 4 comments

@GoogleAlphaZero

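I tested with llama3.2-11b-vision.
On MTVQA, your published score is Overall: 15.3, while I measured Overall: 22.3812.
On MME, your published scores are Perception: 1380.9 and Cognition: 439.6, while I measured Perception: 1319.5 and Cognition: 283.
What could be causing this? Thanks, and I would appreciate it if you could explain or double-check the leaderboard.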
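To make the size of each gap explicit, here is a minimal sketch that computes the absolute and relative difference between the published and locally measured scores. The numbers are copied from the report above; the `compare` helper and the dictionary keys are illustrative names, not part of any evaluation toolkit.

```python
# Published leaderboard values vs. locally measured values, copied from the report.
published = {"MTVQA Overall": 15.3, "MME Perception": 1380.9, "MME Cognition": 439.6}
measured = {"MTVQA Overall": 22.3812, "MME Perception": 1319.5, "MME Cognition": 283.0}


def compare(published, measured):
    """Return {metric: (absolute delta, relative delta in percent)}.

    Hypothetical helper for illustration; delta = measured - published.
    """
    rows = {}
    for name, ref in published.items():
        delta = measured[name] - ref
        rows[name] = (delta, 100.0 * delta / ref)
    return rows


gaps = compare(published, measured)
for name, (delta, pct) in gaps.items():
    print(f"{name}: {delta:+.2f} ({pct:+.1f}%)")
```

The MTVQA score is higher than the published one while both MME scores are lower, so the discrepancy is not a uniform shift in one direction.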

@kennymckormick
Member

@FangXinyu-0913

@FangXinyu-0913
Collaborator

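Hello, could you confirm whether you ran the evaluation with our codebase, and what parameter settings you used? We will re-run the evaluation on our side with your settings.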

@GoogleAlphaZero
Author

I did not change any parameters. I only set the llama3.2_11b_vision path in the config and set the OpenAI key in .env. The command I ran was:
python run.py --data MME MTVQA_TEST \
    --model Llama-3.2-11B-Vision-Instruct --nproc 4 \
    --work-dir /workspace/mydir \
    --verbose

Thanks!

@FangXinyu-0913
Collaborator

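We re-ran the evaluation on our side, and the results do not differ much from the leaderboard numbers:
[image]
[image]
This is our environment:
[image]
Could you share your environment configuration?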
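One way to answer the environment question in text form is a short script like the sketch below, which prints the Python version, the platform, and the installed versions of a few packages. The package list is an assumption about what is likely relevant here, not a list taken from this thread, and `collect_env` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata

# Assumption: these are the packages most likely to affect the results;
# adjust the list to match the actual setup.
PACKAGES = ["torch", "torchvision", "transformers", "accelerate"]


def collect_env(packages=PACKAGES):
    """Return a dict of interpreter, platform, and installed package versions."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


env = collect_env()
for key, value in env.items():
    print(f"{key}: {value}")
```

Pasting this output next to the maintainers' environment makes version mismatches easy to spot.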
