Not able to reproduce InternVL-8b x Blink result #649

Open
David-BominWei opened this issue Dec 5, 2024 · 5 comments

Comments

@David-BominWei

Hi, I tried to reproduce the BLINK evaluation result. The result I got differs from the one on the leaderboard and in the InternVL documentation.
Hello, while trying to reproduce the results on the BLINK dataset I found a 0.8% difference. Could this difference come from different versions of ChatGPT?

Here is the command I used for evaluation:

torchrun --nproc-per-node=8 run.py --model InternVL2-8B --data BLINK

This is the result I got:

-------------------------  -------------------
split                      none
Overall                    0.5002630194634403
Art_Style                  0.6495726495726496
Counting                   0.7166666666666667
Forensic_Detection         0.3787878787878788
Functional_Correspondence  0.16923076923076924
IQ_Test                    0.32
Jigsaw                     0.6133333333333333
Multi-view_Reasoning       0.43609022556390975
Object_Localization        0.5655737704918032
Relative_Depth             0.7419354838709677
Relative_Reflectance       0.39552238805970147
Semantic_Correspondence    0.2014388489208633
Spatial_Relation           0.8041958041958042
Visual_Correspondence      0.3430232558139535
Visual_Similarity          0.762962962962963
-------------------------  -------------------

This is the reported result:

-------------------------  -------------------
split                      none
Overall                    0.5086796422935297
Art_Style                  0.7094017094017094
Counting                   0.75
Forensic_Detection         0.3484848484848485
Functional_Correspondence  0.17692307692307693
IQ_Test                    0.30666666666666664
Jigsaw                     0.5466666666666666
Multi-view_Reasoning       0.48872180451127817
Object_Localization        0.5573770491803278
Relative_Depth             0.7419354838709677
Relative_Reflectance       0.39552238805970147
Semantic_Correspondence    0.26618705035971224
Spatial_Relation           0.7972027972027972
Visual_Correspondence      0.36046511627906974
Visual_Similarity          0.7851851851851852
-------------------------  -------------------
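To see where the 0.8-point gap comes from, the two tables can be compared category by category. The following is a minimal sketch, not part of VLMEvalKit: the helper names `deltas` and `as_counts` are hypothetical, and `as_counts` assumes each score is a plain ratio of correct answers to total questions, which is an assumption about how the table was computed.

```python
from fractions import Fraction

# Scores copied from the two tables above.
reproduced = {
    "Overall": 0.5002630194634403,
    "Art_Style": 0.6495726495726496,
    "Counting": 0.7166666666666667,
    "Forensic_Detection": 0.3787878787878788,
    "Functional_Correspondence": 0.16923076923076924,
    "IQ_Test": 0.32,
    "Jigsaw": 0.6133333333333333,
    "Multi-view_Reasoning": 0.43609022556390975,
    "Object_Localization": 0.5655737704918032,
    "Relative_Depth": 0.7419354838709677,
    "Relative_Reflectance": 0.39552238805970147,
    "Semantic_Correspondence": 0.2014388489208633,
    "Spatial_Relation": 0.8041958041958042,
    "Visual_Correspondence": 0.3430232558139535,
    "Visual_Similarity": 0.762962962962963,
}
reported = {
    "Overall": 0.5086796422935297,
    "Art_Style": 0.7094017094017094,
    "Counting": 0.75,
    "Forensic_Detection": 0.3484848484848485,
    "Functional_Correspondence": 0.17692307692307693,
    "IQ_Test": 0.30666666666666664,
    "Jigsaw": 0.5466666666666666,
    "Multi-view_Reasoning": 0.48872180451127817,
    "Object_Localization": 0.5573770491803278,
    "Relative_Depth": 0.7419354838709677,
    "Relative_Reflectance": 0.39552238805970147,
    "Semantic_Correspondence": 0.26618705035971224,
    "Spatial_Relation": 0.7972027972027972,
    "Visual_Correspondence": 0.36046511627906974,
    "Visual_Similarity": 0.7851851851851852,
}


def deltas(mine, ref):
    """Return (category, mine - ref) pairs, largest absolute difference first."""
    diffs = {name: mine[name] - ref[name] for name in ref}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


def as_counts(score, max_questions=5000):
    """Recover (correct, total) in lowest terms from a decimal accuracy.

    Assumes the score is correct/total with total <= max_questions.
    """
    frac = Fraction(score).limit_denominator(max_questions)
    return frac.numerator, frac.denominator


for name, diff in deltas(reproduced, reported):
    print(f"{name:<26} {diff:+.4f}")

print("reproduced overall:", as_counts(reproduced["Overall"]))
print("reported overall:  ", as_counts(reported["Overall"]))
```

Under that assumption, the two Overall scores correspond to 951 and 967 correct answers out of 1901, so the gap is 16 questions. The largest per-category differences point in opposite directions (Jigsaw is higher in the reproduced run, Semantic_Correspondence and Art_Style are lower), and Relative_Depth and Relative_Reflectance are identical.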

I am using transformers==4.37.0.

My nvcc version is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
@czczup
Contributor

czczup commented Dec 9, 2024

Hello, indeed, I just reproduced the test and achieved a score of 55.0, but the test log from a few months ago shows a score of 50.9. I'm not sure what happened in between.

I often encounter situations where I can't reproduce the old score after a few months. 😭

@David-BominWei
Author

> Hello, indeed, I just reproduced the test and achieved a score of 55.0, but the test log from a few months ago shows a score of 50.9. I'm not sure what happened in between.
>
> I often encounter situations where I can't reproduce the old score after a few months. 😭

Thank you for your reply. Could you give me more details on how you reproduced the 55.0 result, for example the command you ran or the raw output from InternVL? I wonder whether this is a ChatGPT version issue or something else. Thank you very much for your help.

@czczup
Contributor

czczup commented Dec 12, 2024

InternVL2-8B_BLINK.zip

Hello, here are my evaluation results. One is from a test conducted several months ago (50.9), and the other is from today's test (50.4). A couple of days ago, I got a score of 50.0 during testing, but I have already deleted that log.
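One way to read these numbers is to compare the gap reported in this issue with the spread across the maintainer's own runs. This is a small sketch using only scores quoted in the thread; the function name `within_run_to_run_spread` is hypothetical, and three runs are far too few for a real variance estimate.

```python
# Overall BLINK scores (in percent) quoted in this thread.
maintainer_runs = [50.9, 50.4, 50.0]  # old log, today's run, deleted log
reporter_score = 50.03                # 0.5002630... from the issue, rounded
leaderboard_score = 50.87             # 0.5086796... from the issue, rounded


def within_run_to_run_spread(gap, runs):
    """Return True if `gap` is no larger than max(runs) - min(runs)."""
    spread = max(runs) - min(runs)
    return abs(gap) <= spread


spread = round(max(maintainer_runs) - min(maintainer_runs), 2)
gap = round(leaderboard_score - reporter_score, 2)

print(f"spread across maintainer runs: {spread} points")
print(f"gap reported in this issue:    {gap} points")
print("gap within spread:", within_run_to_run_spread(gap, maintainer_runs))
```

With these figures the 0.84-point gap is smaller than the 0.9-point spread between the maintainer's runs, which suggests the difference is run-to-run variation rather than a setup error, although it does not identify the source of that variation.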

@czczup
Contributor

czczup commented Dec 12, 2024

My command is:

torchrun --nproc-per-node=8 run.py --data BLINK --model InternVL2-8B

Also worth noting is that I configured the OpenAI key.

@David-BominWei
Author

David-BominWei commented Dec 12, 2024 via email
