
Add GRL Tetris resource server #578

Open
yixinhuang48 wants to merge 2 commits into NVIDIA-NeMo:main from yixinhuang48:feature/grl-tetris-integration

Conversation


yixinhuang48 (Collaborator) commented Jan 12, 2026

Contributing To NeMo-Gym (GRL Tetris Resource Server)

1) Necessary information

i. Corresponding dataset on the spreadsheet

  • N/A

ii. Description of the prompt (source + domain)

  • Domain: Classic falling-block Tetris (grid-based game; tool-use agent).
  • Source: Synthetic prompts generated programmatically for board configurations (seeds, sizes, piece sets). Prompts instruct the agent to use the step tool and clear at least one line.

iii. Description of the environment

  • A vendored, self-contained Tetris environment under resources_servers/grl_tetris/tetris_env, modified from the GRL repo implementation.
  • Configurable board dimensions (4x4–6x6) and piece sets (box_type 0–3).
  • Observation: an ASCII rendering of the board, with "_" for empty cells, "#" for locked blocks, and "X" for the active piece.
  • Actions: Left, Right, Down.
  • FastAPI resource server following NeMo Gym conventions.
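The observation format above can be sketched as follows. This is a minimal illustration only; `render_board` and its signature are assumptions, not the server's actual code:

```python
def render_board(grid, active_cells):
    """Render a Tetris grid as the ASCII observation described above.

    grid: 2D list of 0/1 (1 = locked block)
    active_cells: set of (row, col) positions of the falling piece

    NOTE: hypothetical sketch, not the vendored tetris_env implementation.
    """
    rows = []
    for r, row in enumerate(grid):
        chars = []
        for c, cell in enumerate(row):
            if (r, c) in active_cells:
                chars.append("X")   # active piece
            elif cell:
                chars.append("#")   # locked block
            else:
                chars.append("_")   # empty cell
        rows.append("".join(chars))
    return "\n".join(rows)

# Example: a 4x4 board with a partially filled bottom row and a 1x1 active piece
board = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 0, 1]]
print(render_board(board, {(0, 1)}))
# _X__
# ____
# ____
# ##_#
```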

iv. Description of the verifier

  • The environment itself is the verifier: success=true when a line clear occurs; the cumulative reward is returned only on success, otherwise zero.
  • /verify computes final reward and cleans up per-session state.
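The /verify behavior above can be sketched like this. The function name, session-dict layout, and field names are assumptions for illustration, not the server's actual code:

```python
sessions = {}  # per-session state, keyed by session id (hypothetical layout)

def verify(session_id):
    """Sketch of /verify: compute the final reward, then clean up the session."""
    # Pop the session so per-session state is cleaned up after verification.
    state = sessions.pop(session_id)
    success = state["lines_cleared"] > 0
    # Cumulative reward is returned only on success; otherwise zero.
    reward = state["cumulative_reward"] if success else 0.0
    return {"success": success, "reward": reward}
```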

v. Legal approval status

  • Code: Apache 2.0.
  • Data: Synthetic, programmatically generated (Apache 2.0).
  • No third-party runtime data included.

2) Simple correctness check

i. Commands used to run the server for the uploaded data

# Start NeMo Gym servers (agent + Tetris)
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/grl_tetris/configs/grl_tetris.yaml"
ng_run "+config_paths=[$config_paths]"

# Collect 5 rollouts with the sample dataset
ng_collect_rollouts +agent_name=grl_tetris_game_agent \
  +input_jsonl_fpath=resources_servers/grl_tetris/data/example.jsonl \
  +output_jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl \
  +limit=5

# View rollouts
ng_viewer +jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl

ii. Resulting rollout and judges (5 examples)

  • See resources_servers/grl_tetris/data/example_rollouts.jsonl
  • Expected behavior:
    • Successful line clear → reward ≈ 9.0–9.2, success=true
    • No line clear → negative step penalties (e.g., -0.1 per step), success=false
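The reward ranges above are consistent with a simple shaping scheme. The constants below are assumptions inferred from those numbers (a -0.1 step penalty and a bonus near 10), not the environment's actual values:

```python
STEP_PENALTY = -0.1      # assumed per-step penalty ("-0.1 per step" above)
LINE_CLEAR_BONUS = 10.0  # assumed bonus; 10.0 minus ~8-10 step penalties ≈ 9.0-9.2

def rollout_reward(num_steps, line_cleared):
    """Return (reward, success) for a finished rollout under the assumed scheme."""
    penalty = STEP_PENALTY * num_steps
    if line_cleared:
        return LINE_CLEAR_BONUS + penalty, True
    return penalty, False
```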

iii. Additional notes for running the server properly

  • Must call /seed_session before /step.
  • Actions accepted as labels or indices ("Left", "Right", "Down" or 0/1/2).
  • Session cookies are maintained by the middleware; the agent path handles cookie propagation automatically.
  • Large rollout artifacts (e.g., rollouts.jsonl) are gitignored; do not commit them.
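The label-or-index action rule in the second note can be sketched as below. `parse_action` is a hypothetical helper; the real server's parsing code may differ:

```python
ACTIONS = ["Left", "Right", "Down"]

def parse_action(action):
    """Normalize an action given as a label ("Left") or an index (0/1/2)."""
    if isinstance(action, int):
        if 0 <= action < len(ACTIONS):
            return ACTIONS[action]
        raise ValueError(f"action index out of range: {action}")
    # Accept case-insensitive labels with stray whitespace, e.g. " down ".
    name = str(action).strip().capitalize()
    if name in ACTIONS:
        return name
    raise ValueError(f"unknown action: {action!r}")
```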

3) Tests

Test files / command to run tests

# Resource server tests
pytest resources_servers/grl_tetris/tests -q

Notes on coverage / responsibilities

  • Tetris server tests: seed/step flow, action parsing, done handling, verify success/failure, cleanup.
  • Game agent tests: the /v1/responses tool-call loop (done handling) and the end-to-end /run path that seeds → responds → verifies.

4) Reward profiling

Models

  • Qwen3-4B

Method

  • Test set: 200 prompts, 16 rollouts per prompt (3,200 total).
  • Tool calling enabled; agent loops until done or max_steps; reward aggregated from env.
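The loop-until-done behavior in the second bullet can be sketched as a toy episode driver. `run_episode` and the fake env below are illustrative assumptions, not the agent's actual code:

```python
def run_episode(step_fn, max_steps=20):
    """Call the step tool until the env reports done or max_steps is reached,
    accumulating reward along the way (toy sketch, not the agent's code)."""
    total = 0.0
    for _ in range(max_steps):
        _obs, reward, done = step_fn()
        total += reward
        if done:
            break
    return total

# A fake env that pays step penalties, then clears a line on the third step:
events = iter([("board", -0.1, False), ("board", -0.1, False), ("board", 9.4, True)])
```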

Commands

# Auto pipeline (Qwen3-4B): runs vLLM + servers + collection + analysis
cd resources_servers/grl_tetris
./run_qwen3_4b_eval_loop.sh  # or ./run_qwen3_4b_eval.sh

# Manual analysis (any model/output)
python analyze_rewards.py \
  --rollouts-path resources_servers/grl_tetris/data/qwen3_4b_eval/rollouts.jsonl \
  --model-name "Qwen3-4B" \
  --output resources_servers/grl_tetris/data/qwen3_4b_eval/reward-analysis.md

Results (Qwen3-4B, 3,200 rollouts)

  • Success rate: 5.09% (163/3,200)
  • Mean reward: -0.29 (min -2.00, max 19.20; median -0.80)
  • Average tool calls/rollout: 7.48
  • Tool calls ↔ reward correlation: -0.06 (weak negative)
  • Report: resources_servers/grl_tetris/data/qwen3_4b_eval/reward-analysis.md

5) Training results

[Two training-curve screenshots attached, captured 2025-11-25.]

Ran GRPO training with Qwen3-4B-Instruct on 3,200 training examples and 800 validation examples (4x4, box_type 1 Tetris configuration).


copy-pr-bot bot commented Jan 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yixinhuang48 force-pushed the feature/grl-tetris-integration branch 4 times, most recently from 287bfb6 to 5cae458 on January 12, 2026 at 21:40.
yixinhuang48 (Collaborator, Author) commented:

Uses the same modified version of simple_agent app.py as in #564.

yixinhuang48 (Collaborator, Author) commented:

@cmunley1 @bxyu-nvidia this is the updated PR for the one that I closed (#260).

- resources_servers/grl_tetris: environment, config, tests, data
- Tetris game environment with step/verify endpoints
- Example data and test examples generator

Verified DCO and cryptographic signing.

Signed-off-by: yixin <yixinhuang48@gmail.com>
yixinhuang48 force-pushed the feature/grl-tetris-integration branch from 5cae458 to 15e962f on January 12, 2026 at 21:44.