Add GRL Tetris resource server #578
Open
yixinhuang48 wants to merge 2 commits into NVIDIA-NeMo:main from
287bfb6 to 5cae458
Collaborator · Author: Uses the same modified version of simple_agent's `app.py` as in #564.
Collaborator · Author: @cmunley1 @bxyu-nvidia, this is the updated PR replacing the one I closed (#260).
- `resources_servers/grl_tetris`: environment, config, tests, data
- Tetris game environment with step/verify endpoints
- Example data and test examples generator

Verified DCO and cryptographic signing.

Signed-off-by: yixin <yixinhuang48@gmail.com>
5cae458 to 15e962f
Contributing To NeMo-Gym (GRL Tetris Resource Server)
1) Necessary information
i. Corresponding dataset on the spreadsheet
ii. Description of the prompt (source + domain)
`step` tool and clear at least one line.

iii. Description of the environment
- `resources_servers/grl_tetris/tetris_env`, modified from the GRL repo implementation.
- `box_type` 0–3.
- `_` empty, `#` locked, `X` active piece.
- `Left`, `Right`, `Down`.

iv. Description of the verifier
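The line-clear check and the success/reward rule this verifier applies can be sketched in Python. This is a minimal illustration, not the server's code: it assumes the board is a list of row strings using the `_`/`#`/`X` symbols listed above, and the names `clear_full_rows`, `EpisodeState`, and `LINE_CLEAR_BONUS` (including its value of 1.0) are hypothetical. Only the `-0.1` per-step penalty and the "cumulative reward only on success, otherwise zero" rule come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass

STEP_PENALTY = -0.1     # per-step penalty, as noted in the example rollouts
LINE_CLEAR_BONUS = 1.0  # hypothetical bonus per cleared row (assumption)


def clear_full_rows(board: list[str]) -> tuple[list[str], int]:
    """Drop rows made entirely of locked cells ('#') and pad empty rows on top."""
    width = len(board[0])
    kept = [row for row in board if row != "#" * width]
    cleared = len(board) - len(kept)
    return ["_" * width] * cleared + kept, cleared


@dataclass
class EpisodeState:
    """Hypothetical per-session state kept between step calls."""

    cumulative_reward: float = 0.0
    success: bool = False

    def record_step(self, lines_cleared: int) -> None:
        self.cumulative_reward += STEP_PENALTY + LINE_CLEAR_BONUS * lines_cleared
        if lines_cleared > 0:
            self.success = True  # any line clear counts as success

    def final_reward(self) -> float:
        # Cumulative reward only counts on success; otherwise the episode scores zero.
        return self.cumulative_reward if self.success else 0.0
```

For example, `clear_full_rows(["____", "_X__", "####", "#_##"])` removes the one fully locked row and pads an empty row at the top. Keeping the reward gate in `final_reward` rather than in each step mirrors the description that only the final verification decides whether accumulated reward is kept.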
- `success=true` when a line clear occurs; the cumulative reward is returned only on success, otherwise zero.
- `/verify` computes the final reward and cleans up per-session state.

v. Legal approval status
2) Simple correctness check
i. Commands used to run the server for the uploaded data
ii. Resulting rollout and judges (5 examples)
- `resources_servers/grl_tetris/data/example_rollouts.jsonl`
- `success=true` (`-0.1` per step), `success=false`

iii. Additional notes for running the server properly
- Call `/seed_session` before `/step`.
- Actions: `"Left"`, `"Right"`, `"Down"` (or `0`/`1`/`2`).
- Rollout outputs (`rollouts.jsonl`) are gitignored; do not commit them.

3) Tests
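The session-ordering and action-format notes above lend themselves to small unit tests. The sketch below is hypothetical: `SessionStore` and `normalize_action` are illustrative in-memory stand-ins, not the server's classes, and mapping `0`/`1`/`2` to `Left`/`Right`/`Down` in that order is an assumption.

```python
from __future__ import annotations

# Assumed index order for the integer form of actions (0/1/2).
ACTIONS = ("Left", "Right", "Down")


def normalize_action(action: str | int) -> str:
    """Accept an action name or its integer index and return the name."""
    if isinstance(action, bool):  # bool is a subclass of int; reject it explicitly
        raise ValueError(f"invalid action: {action!r}")
    if isinstance(action, int):
        if 0 <= action < len(ACTIONS):
            return ACTIONS[action]
        raise ValueError(f"action index out of range: {action}")
    if action in ACTIONS:
        return action
    raise ValueError(f"unknown action: {action!r}")


class SessionStore:
    """Hypothetical in-memory stand-in for per-session server state."""

    def __init__(self) -> None:
        self._moves: dict[str, list[str]] = {}

    def seed_session(self, session_id: str) -> None:
        self._moves[session_id] = []

    def step(self, session_id: str, action: str | int) -> None:
        if session_id not in self._moves:
            raise KeyError(f"session {session_id!r} not seeded; call seed_session first")
        self._moves[session_id].append(normalize_action(action))

    def verify(self, session_id: str) -> list[str]:
        # Like /verify, drop the session's state once it has been scored.
        return self._moves.pop(session_id)


def test_step_requires_seed() -> None:
    store = SessionStore()
    try:
        store.step("s1", "Down")
    except KeyError:
        pass
    else:
        raise AssertionError("step before seed_session should fail")


def test_string_and_int_actions_agree() -> None:
    store = SessionStore()
    store.seed_session("s1")
    store.step("s1", "Left")
    store.step("s1", 2)
    assert store.verify("s1") == ["Left", "Down"]
    assert "s1" not in store._moves  # state cleaned up after verify
```

The `test_*` functions follow pytest's naming convention, so they would be collected by `pytest`, but they use only plain `assert` and can also be called directly.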
Test files / command to run tests
Resource server tests: `pytest resources_servers/grl_tetris/tests -q`

Notes on coverage / responsibilities
`/v1/responses` tool-call loop (`done` handling); end-to-end `/run` path: seed → respond → verify.

4) Reward profiling
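The end-to-end path noted above (seed → respond → verify, with episodes ending on `done` or `max_steps`) and the reward aggregation used for profiling can be sketched with a toy environment. Everything here is hypothetical: `ToyEnv` replaces the HTTP server, its rule that a line clears after `seed % 4 + 1` consecutive `Down` moves is invented for illustration, and the +1.0 line-clear bonus is assumed. Only the `-0.1` step penalty and the zero-unless-success rule come from this PR.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable

STEP_PENALTY = -0.1     # per-step penalty, as in the PR's rollout notes
LINE_CLEAR_BONUS = 1.0  # hypothetical value


class ToyEnv:
    """Hypothetical stand-in for the Tetris server: a line clears after
    `seed % 4 + 1` consecutive "Down" moves, which ends the episode."""

    def reset(self, seed: int) -> str:
        self._needed = seed % 4 + 1
        self._downs = 0
        return "board"

    def step(self, action: str) -> tuple[str, float, bool, bool]:
        """Return (observation, reward, done, success)."""
        self._downs = self._downs + 1 if action == "Down" else 0
        cleared = self._downs >= self._needed
        reward = STEP_PENALTY + (LINE_CLEAR_BONUS if cleared else 0.0)
        return "board", reward, cleared, cleared


def run_episode(env: ToyEnv, policy: Callable[[str], str], seed: int, max_steps: int) -> dict:
    obs = env.reset(seed)  # "seed" phase
    total, success = 0.0, False
    for _ in range(max_steps):  # "respond" phase: one action per step
        obs, reward, done, success = env.step(policy(obs))
        total += reward
        if done:
            break
    # "verify" phase: cumulative reward counts only on success
    return {"reward": total if success else 0.0, "success": success}


def profile(results: list[dict]) -> dict:
    """Aggregate per-episode results into summary statistics."""
    return {
        "n": len(results),
        "mean_reward": mean(r["reward"] for r in results),
        "success_rate": sum(r["success"] for r in results) / len(results),
    }


def always_down(obs: str) -> str:
    return "Down"


results = [run_episode(ToyEnv(), always_down, seed, max_steps=3) for seed in range(4)]
print(profile(results))
```

With `max_steps=3`, the seed that needs four `Down` moves never clears a line, so it scores zero even though it accumulated step penalties; that is the zero-unless-success rule at work.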
Models
Method
`done` or `max_steps`; reward is aggregated from the env.

Commands
Results (Qwen3-4B, 3,200 rollouts)
`resources_servers/grl_tetris/data/qwen3_4b_eval/reward-analysis.md`

5) Training results
Ran GRPO training with Qwen3-4B-Instruct on 3,200 training examples and 800 validation examples (4x4, type 1 Tetris configuration).