First of all, thank you so much for open-sourcing such an impressive system!
I have a few questions regarding reproducibility:
- Is the entirety of the Co-Sight v2.0.0 code open-sourced, such that the reported GAIA test set score of 84.05% (https://huggingface.co/spaces/gaia-benchmark/leaderboard) can be reproduced directly from this repository?
- If so, could you share guidance or recommended steps for running this repository on the GAIA benchmark?
- Was the 84.05% score achieved using pass@1?
I did notice there is a GAIA branch, but since its last update was back in June, I wanted to confirm whether it’s still the recommended way to evaluate on GAIA, or if there are any additional scripts or settings we should be aware of.
Any pointers or documentation would be greatly appreciated. I’d love to help reproduce these results 🙏.