CenterforOpenScience: add ReplicatorBench benchmark by masonmq · Pull Request #156 · princeton-pli/hal-harness

masonmq · 2026-03-05T22:49:39Z

From Center for Open Science:
1. Adds the ReplicatorBench benchmark implementation under hal/benchmarks/replicatorbench/.
2. Registers replicatorbench in hal/benchmark_manager.py.
3. Includes tasks and schemas/templates needed for evaluation.
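
For reviewers unfamiliar with the pattern, here is a minimal sketch of what registering a benchmark by name can look like. It is illustrative only: `BenchmarkRegistry`, `register`, `get`, and the stub `ReplicatorBench` class are hypothetical names and do not reflect the actual contents of hal/benchmark_manager.py or this PR's code.

```python
from typing import Callable, Dict


class BenchmarkRegistry:
    """Hypothetical name-to-factory registry (not the real harness API)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Refuse duplicate names so two benchmarks cannot shadow each other.
        if name in self._factories:
            raise ValueError(f"Benchmark already registered: {name}")
        self._factories[name] = factory

    def get(self, name: str) -> object:
        # Look up the factory and build a fresh benchmark instance.
        if name not in self._factories:
            raise ValueError(f"Unknown benchmark: {name}")
        return self._factories[name]()


class ReplicatorBench:
    """Stub standing in for the benchmark class (assumed shape)."""

    name = "replicatorbench"


registry = BenchmarkRegistry()
registry.register("replicatorbench", ReplicatorBench)
benchmark = registry.get("replicatorbench")
```

The point of the sketch is only that the harness resolves the string `replicatorbench` to the benchmark implementation; the real lookup logic may be structured differently.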

Thank you!
ReplicatorBench [preprint]
Center for Open Science

masonmq · 2026-03-05T23:15:51Z

Hi HAL team @benediktstroebl @ksayash @schwartzadev:

This PR:

Adds the ReplicatorBench benchmark under hal/benchmarks/replicatorbench/
Registers replicatorbench in hal/benchmark_manager.py
Includes tasks + schemas/templates needed for evaluation.

Note: this benchmark submission is intentionally separate from the ReplicatorBench Agent submission. We are preparing the ReplicatorBench Agent as a separate PR so that this benchmark review stays focused and easy to follow.

Thank you!
ReplicatorBench [preprint]
Center for Open Science

masonmq · 2026-03-27T00:57:00Z

Hi HAL team @benediktstroebl, @ksayash, @schwartzadev:

Gentle follow-up on this PR. Our benchmark code is ready for review. We have validated that our benchmark runs successfully using hal-eval in the harness.

We've also prepared our Agent as a separate PR so that this benchmark review stays focused and easy to follow.

Thank you!

masonmq · 2026-04-07T02:54:30Z

Hi HAL team:

Another gentle follow-up on this PR.

Our benchmark code is ready for review. We have validated that our benchmark runs successfully using hal-eval in the harness.

We've also prepared our Agent as a separate PR so that this benchmark review stays focused and easy to follow.

Thank you!

CenterforOpenScience: add ReplicatorBench benchmark

751be8f

masonmq added 4 commits March 7, 2026 01:31

Fix submission issues

4ff65b2

update replicatorbench.py

9e88800

update setup.sh

6a68542

to support WANDB

cb60a57

masonmq wants to merge 5 commits into princeton-pli:main from masonmq:replicatorbench-submit
