@yonromai
Contributor

@yonromai commented Jan 9, 2026

Fix #2306

Status: Blocked on chris/exp-rl being out of sync with main. See Discord discussion here.

@yonromai force-pushed the romain/surface-rljob-errors branch 2 times, most recently from 477dc1c to 9ea19d2, January 15, 2026 22:08
@yonromai changed the base branch from chris/exp-rl to main, January 15, 2026 22:08
@yonromai
Contributor Author

I tested this on marin-us-central2:
[screenshot]

The bottom run in the screenshot above doesn't have this fix and runs forever (I had to kill it). The top run, which includes this fix, failed after 3 attempts.

@yonromai marked this pull request as ready for review, January 15, 2026 22:10
Contributor

@ahmeda14960 left a comment

LGTM

@yonromai force-pushed the romain/surface-rljob-errors branch from 9ea19d2 to 8a9f5e1, January 16, 2026 02:15
@yonromai merged commit e07bde4 into main, Jan 16, 2026
8 checks passed
@yonromai deleted the romain/surface-rljob-errors branch, January 16, 2026 02:27
Development

Successfully merging this pull request may close these issues.

RLJob misreports success when OOMing

3 participants