CenterforOpenScience: add ReplicatorBench benchmark #156
masonmq wants to merge 5 commits into princeton-pli:main
Hi HAL team @benediktstroebl @ksayash @schwartzadev: This PR, from the Center for Open Science:
1. Adds the ReplicatorBench benchmark implementation under hal/benchmarks/replicatorbench/.
2. Registers replicatorbench in hal/benchmark_manager.py.
3. Includes the tasks and schemas/templates needed for evaluation.
Note: this benchmark submission is intentionally kept separate from the ReplicatorBench Agent submission. We are preparing the ReplicatorBench Agent as a separate PR so that the benchmark review stays focused and easier to follow. Thank you!

Hi HAL team @benediktstroebl, @ksayash, @schwartzadev: Gentle follow-up on this PR. Our benchmark code is ready for review. We have validated that our benchmark runs successfully. We've also prepared our Agent as a separate PR so that this benchmark review stays focused and easier to follow. Thank you!

Hi HAL team: Another gentle follow-up on this PR. Our benchmark code is ready for review. We have validated that our benchmark runs successfully using hal-eval in the harness. We've also prepared our Agent as a separate PR so that this benchmark review stays focused and easier to follow. Thank you!
ReplicatorBench [preprint]
Center for Open Science