Not able to reproduce InternVL-8b x Blink result #649
Comments
Hello, indeed, I just reran the test and got a score of 55.0, but the test log from a few months ago shows 50.9. I'm not sure what changed in between. I often find that I can't reproduce an old score after a few months. 😭
Thank you for your reply. Could you give me more details about how you reproduced the 55.0 result (e.g., the command you ran or the raw output from InternVL)? I wonder whether it is a ChatGPT version issue or something else. Thank you VERY much for your help.
Hello, here are my evaluation results. One is from a test conducted several months ago (50.9), and the other is from today's test (50.4). A couple of days ago I got 50.0, but I have already deleted that log.
My command is:
torchrun --nproc-per-node=8 run.py --data BLINK --model InternVL2-8B
Also worth noting: I had the OpenAI key configured.
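To put the run-to-run variation in numbers, the three scores quoted above can be summarized in a few lines. This is only a sketch over the figures mentioned in this comment; the variable names are illustrative and nothing here comes from the evaluation toolkit itself.

```python
import statistics

# Scores quoted in this thread for InternVL2-8B on BLINK:
# months-old log, a run from a couple of days ago, and today's run.
scores = [50.9, 50.0, 50.4]

mean_score = statistics.mean(scores)
spread = max(scores) - min(scores)  # gap between best and worst run

print(f"mean:   {mean_score:.2f}")
print(f"spread: {spread:.2f}")
```

The spread across these three runs is just under one point, which is the same order of magnitude as the 0.8% gap reported in the issue.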
Got it, thank you VERY much!
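Hi, I tried to reproduce the Blink evaluation result. The result I got differs from the one on the leaderboard and in the InternVL documentation.
Hello, while trying to reproduce the results on the Blink dataset, I found a difference of 0.8%. Could this difference come from using a different ChatGPT version?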
Here are the commands I used for evaluation:
This is the result I got:
This is the reported result:
My transformers version is
transformers==4.37.0
My nvcc version is
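
When comparing runs across machines or months, it can help to capture the environment in one go. The following is a minimal sketch, assuming Python 3.8+; the helper name `collect_env` is hypothetical and not part of the evaluation toolkit, and `nvcc` may simply be absent from `PATH`.

```python
import shutil
import subprocess
from importlib import metadata


def collect_env(packages=("transformers", "torch")):
    """Return a dict of version strings for the given packages and nvcc."""
    info = {}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"

    nvcc = shutil.which("nvcc")
    if nvcc is None:
        info["nvcc"] = "not found"
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        # Keep the last non-empty line of the nvcc output as a short summary.
        lines = [line for line in result.stdout.splitlines() if line.strip()]
        info["nvcc"] = lines[-1] if lines else "unknown"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output next to each score makes it easier to tell whether a discrepancy lines up with a dependency change.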