CenterforOpenScience: add ReplicatorBench benchmark #156

Open
masonmq wants to merge 5 commits into princeton-pli:main from masonmq:replicatorbench-submit

masonmq commented Mar 5, 2026

From Center for Open Science:

  1. Adds the ReplicatorBench benchmark implementation under hal/benchmarks/replicatorbench/.
  2. Registers replicatorbench in hal/benchmark_manager.py.
  3. Includes tasks and schemas/templates needed for evaluation.

Thank you!
ReplicatorBench [preprint]
Center for Open Science

masonmq commented Mar 5, 2026

Hi HAL team @benediktstroebl, @ksayash, @schwartzadev:

This PR:

  1. Adds the ReplicatorBench benchmark under hal/benchmarks/replicatorbench/
  2. Registers replicatorbench in hal/benchmark_manager.py
  3. Includes tasks + schemas/templates needed for evaluation.

Note: this benchmark submission is intentionally separate from the ReplicatorBench Agent submission. We are preparing the ReplicatorBench Agent as a separate PR so that this benchmark PR stays focused and easier to review.

Thank you!
ReplicatorBench [preprint]
Center for Open Science

masonmq commented Mar 27, 2026

Hi HAL team @benediktstroebl, @ksayash, @schwartzadev:

Gentle follow-up on this PR. Our benchmark code is ready for review. We have validated that our benchmark runs successfully using hal-eval in the harness.

We've also prepared our Agent as a separate PR so that this benchmark PR stays focused and easier to review.

Thank you!

masonmq commented Apr 7, 2026

Hi HAL team:

Gentle follow-up on this PR again.

Our benchmark code is ready for review. We have validated that our benchmark runs successfully using hal-eval in the harness.

We've also prepared our Agent as a separate PR so that this benchmark PR stays focused and easier to review.

Thank you!
