70 changes: 70 additions & 0 deletions resources_servers/grl_tetris/README.md
@@ -0,0 +1,70 @@
# GRL Tetris Resource Server

FastAPI-based resource server that exposes the GRL Tetris environment through NeMo Gym conventions. The environment logic lives under `resources_servers/grl_tetris/tetris_env` and is a standalone adaptation of the upstream GRL implementation.

## Why it exists
- **Domain**: Classic falling-block Tetris on a configurable grid.
- **Evaluation**: Agents must clear at least one line. `/verify` returns the session's cumulative reward together with a `success` flag taken from the environment's final `info`.
- **Independence**: No runtime dependency on the GRL repository—the environment is vendored and self-contained.
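
The verification step can be sketched as follows. This is a minimal, self-contained illustration modelled on the `verify` handler in `app.py`, which returns the accumulated reward as-is and reports success separately. `SessionState` and `verify_session` are illustrative names, not part of the server's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class SessionState:
    """Illustrative stand-in for the per-session state kept by the server."""

    total_reward: float = 0.0
    last_info: Dict[str, Any] = field(default_factory=dict)


def verify_session(state: Optional[SessionState]) -> Tuple[float, bool]:
    """Return (reward, success) for a session, mirroring the handler's logic."""
    if state is None:
        # Unknown or never-seeded session: zero reward, no success.
        return 0.0, False
    # The reward is the running total; success comes from the last step's info.
    return state.total_reward, bool(state.last_info.get("success"))


# Hypothetical values for illustration only.
print(verify_session(SessionState(total_reward=9.1, last_info={"success": True})))
print(verify_session(None))
```

Keeping `success` separate from `reward` lets downstream tooling report a success rate even when the reward itself is a shaped, possibly negative score.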

## Setup

Follow the setup instructions in the NeMo Gym tutorial: https://github.com/NVIDIA-NeMo/Gym/blob/main/docs/tutorials/02-setup.md#step-1-clone-and-install

## Running
Spin up the server alongside a compatible agent:
```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/grl_tetris/configs/grl_tetris.yaml"
ng_run "+config_paths=[$config_paths]"
```

Collect trajectories:
```bash
ng_collect_rollouts +agent_name=grl_tetris_simple_agent \
+input_jsonl_fpath=resources_servers/grl_tetris/data/example.jsonl \
+output_jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl \
+limit=5
```

Launch the rollout viewer:
```bash
ng_viewer +jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl
```

## Tests
```bash
pytest resources_servers/grl_tetris/tests
```

## Licensing
- Code: Apache 2.0
- Data: Apache 2.0

---

## Reward Profiling Results

### Qwen3-4B

**Dataset**: 3,200 rollouts (200 prompts × 16 repeats)

**Performance Metrics**:
- **Success Rate**: 5.09% (163/3,200 rollouts)
- **Mean Reward**: -0.29 (range: -2.00 to 19.20)
- **Median Reward**: -0.80
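
A minimal sketch of how metrics of this kind could be computed from per-rollout results. It assumes each rollout has already been reduced to a reward and a success flag; the `summarize` helper and the toy values are hypothetical and are not the profiling data reported here.

```python
from statistics import mean, median
from typing import Dict, List


def summarize(rewards: List[float], successes: List[bool]) -> Dict[str, float]:
    """Compute success rate and basic reward statistics for a set of rollouts."""
    return {
        "success_rate": sum(successes) / len(successes),
        "mean_reward": mean(rewards),
        "median_reward": median(rewards),
        "min_reward": min(rewards),
        "max_reward": max(rewards),
    }


# Toy data for illustration only.
summary = summarize([-0.9, -0.8, 9.1, -0.7], [False, False, True, False])
print(summary)

# The reported success rate follows directly from the counts: 163 / 3,200.
print(f"{163 / 3200:.2%}")
```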

**Key Findings**:
- The most common outcome (21% of rollouts) was a reward of -0.90 (piece dropped without clearing lines)
- Successful line clears achieved rewards of ~9.0-9.2
- Average 7.48 tool calls per rollout
- Weak negative correlation between tool calls and reward (-0.06)
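
The README does not say which correlation coefficient was used. Assuming it is the Pearson coefficient, a value like the one above could be computed as in this sketch; `pearson` is an illustrative helper and the inputs are toy values.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy data: tool calls per rollout vs. reward (hypothetical values).
tool_calls = [5, 7, 8, 10]
rewards = [-0.5, -0.7, 9.1, -0.9]
print(round(pearson(tool_calls, rewards), 3))
```

A coefficient near zero, such as -0.06, means the number of tool calls carries almost no linear information about the final reward.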

**Top Reward Distribution**:
- `-0.9`: 672 rollouts (21.0%) - piece dropped, no line clear
- `-0.8`: 603 rollouts (18.8%)
- `-0.7`: 495 rollouts (15.5%)
- `9.1`: 29 rollouts (0.9%) - successful line clear
- `8.9`: 26 rollouts (0.8%)
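
A distribution like the one above can be tabulated with a counter over rounded rewards. This is a sketch with a hypothetical `reward_distribution` helper and toy inputs, not the script used for the profiling run.

```python
from collections import Counter
from typing import List, Tuple


def reward_distribution(rewards: List[float], top: int = 5) -> List[Tuple[float, int, float]]:
    """Return (reward, count, percent) for the `top` most frequent rewards."""
    # Round to one decimal so float noise does not split identical rewards.
    counts = Counter(round(r, 1) for r in rewards)
    total = len(rewards)
    return [(value, count, 100.0 * count / total) for value, count in counts.most_common(top)]


# Toy data for illustration only.
for value, count, percent in reward_distribution([-0.9, -0.9, -0.8, 9.1]):
    print(f"{value}: {count} rollouts ({percent:.1f}%)")
```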

The low success rate (5.09%) suggests that clearing a line is hard for the model: it requires precise spatial reasoning and action sequencing. The three most common rewards (-0.9, -0.8, and -0.7) together account for 55.3% of rollouts, which is consistent with pieces being dropped without clearing a line while a penalty of -0.1 accrues per action step.
226 changes: 226 additions & 0 deletions resources_servers/grl_tetris/app.py
@@ -0,0 +1,226 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

import numpy as np
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, Field

from nemo_gym.base_resources_server import (
    BaseResourcesServerConfig,
    BaseSeedSessionRequest,
    BaseSeedSessionResponse,
    BaseVerifyRequest,
    BaseVerifyResponse,
    SimpleResourcesServer,
)
from nemo_gym.server_utils import SESSION_ID_KEY, ServerClient
from resources_servers.grl_tetris.tetris_env import TetrisEnv


DEFAULT_GRID_LOOKUP = {0: "_", 1: "#", 2: "X"}
DEFAULT_ACTION_LOOKUP = {0: "Left", 1: "Right", 2: "Down"}


class GrlTetrisResourcesServerConfig(BaseResourcesServerConfig):
    env_config: Dict[str, Any] = Field(
        default_factory=lambda: {
            "grid_lookup": DEFAULT_GRID_LOOKUP,
            "action_lookup": DEFAULT_ACTION_LOOKUP,
            "render_mode": "text",
            "dim_x": 4,
            "dim_y": 4,
            "box_type": 3,
        }
    )


class GrlTetrisSeedSessionRequest(BaseSeedSessionRequest):
    seed: Optional[int] = None


class GrlTetrisSeedSessionResponse(BaseSeedSessionResponse):
    observation: str


class GrlTetrisStepRequest(BaseModel):
    actions: List[Union[str, int]] = Field(default_factory=list)


class GrlTetrisStepTrace(BaseModel):
    action_id: int
    action_label: str
    reward: float
    done: bool
    info: Dict[str, Any]


class GrlTetrisStepResponse(BaseModel):
    observation: str
    reward: float
    total_reward: float
    done: bool
    steps: List[GrlTetrisStepTrace]
    history: List[GrlTetrisStepTrace] = Field(default_factory=list)


class GrlTetrisVerifyResponse(BaseVerifyResponse):
    success: bool


@dataclass
class TetrisSessionState:
    env: Any
    observation: str
    total_reward: float = 0.0
    done: bool = False
    last_info: Dict[str, Any] = field(default_factory=dict)
    history: List[GrlTetrisStepTrace] = field(default_factory=list)


class GrlTetrisResourcesServer(SimpleResourcesServer):
    config: GrlTetrisResourcesServerConfig
    server_client: ServerClient
    session_id_to_state: Dict[str, TetrisSessionState] = Field(default_factory=dict)

    def setup_webserver(self) -> FastAPI:
        app = super().setup_webserver()
        app.post("/step")(self.step)
        return app

    def _create_env(self) -> TetrisEnv:
        return TetrisEnv(self.config.env_config)

    async def seed_session(self, request: Request, body: GrlTetrisSeedSessionRequest) -> GrlTetrisSeedSessionResponse:
        session_id = request.session[SESSION_ID_KEY]
        env = self._create_env()
        observation = env.reset(seed=body.seed)

        self.session_id_to_state[session_id] = TetrisSessionState(
            env=env,
            observation=observation,
        )
        return GrlTetrisSeedSessionResponse(observation=observation)

    async def step(self, request: Request, body: GrlTetrisStepRequest) -> GrlTetrisStepResponse:
        session_id = request.session.get(SESSION_ID_KEY)
        if session_id is None or session_id not in self.session_id_to_state:
            raise HTTPException(status_code=400, detail="Session not initialized. Call /seed_session first.")

        session_state = self.session_id_to_state[session_id]
        env = session_state.env

        reverse_lookup = {label.lower(): idx for idx, label in env.ACTION_LOOKUP.items()}
        total_step_reward = 0.0
        steps: List[GrlTetrisStepTrace] = []

        if session_state.done:
            return GrlTetrisStepResponse(
                observation=session_state.observation,
                reward=0.0,
                total_reward=session_state.total_reward,
                done=True,
                steps=[],
                history=list(session_state.history),
            )

        for action in body.actions:
            action_id = self._parse_action(action, reverse_lookup)
            if action_id not in env.ACTION_LOOKUP:
                raise HTTPException(status_code=400, detail=f"Invalid action identifier: {action}")

            next_obs, reward, done, info = env.step(action_id)
            info = self._to_python_types(info)
            total_step_reward += reward
            session_state.total_reward += reward
            session_state.observation = next_obs
            session_state.last_info = info
            session_state.done = bool(done)

            step = GrlTetrisStepTrace(
                action_id=action_id,
                action_label=env.ACTION_LOOKUP[action_id],
                reward=reward,
                done=session_state.done,
                info=info,
            )
            session_state.history.append(step)
            steps.append(step)

            if session_state.done:
                break

        return GrlTetrisStepResponse(
            observation=session_state.observation,
            reward=total_step_reward,
            total_reward=session_state.total_reward,
            done=session_state.done,
            steps=steps,
            history=list(session_state.history),
        )

    async def verify(self, request: Request, body: BaseVerifyRequest) -> GrlTetrisVerifyResponse:
        session_id = request.session.get(SESSION_ID_KEY)
        session_state = self.session_id_to_state.get(session_id)

        success = False
        reward = 0.0
        if session_state is not None:
            success = bool(session_state.last_info.get("success"))
            reward = session_state.total_reward

        if session_id in self.session_id_to_state:
            try:
                session_state.env.close()  # type: ignore[union-attr]
            except Exception:  # pragma: no cover - defensive cleanup
                pass
            del self.session_id_to_state[session_id]

        return GrlTetrisVerifyResponse(
            **body.model_dump(),
            reward=reward,
            success=success,
        )

    @staticmethod
    def _parse_action(action: Union[str, int], reverse_lookup: Dict[str, int]) -> int:
        if isinstance(action, int):
            return action

        candidate = action.strip()
        lower_candidate = candidate.lower()
        if lower_candidate in reverse_lookup:
            return reverse_lookup[lower_candidate]

        try:
            return int(candidate)
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=f"Unable to parse action: {action}") from exc

    @staticmethod
    def _to_python_types(obj: Any) -> Any:
        if isinstance(obj, dict):
            return {k: GrlTetrisResourcesServer._to_python_types(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [GrlTetrisResourcesServer._to_python_types(v) for v in obj]
        if isinstance(obj, np.generic):
            return obj.item()
        return obj


if __name__ == "__main__":
    GrlTetrisResourcesServer.run_webserver()
27 changes: 27 additions & 0 deletions resources_servers/grl_tetris/configs/grl_tetris.yaml
@@ -0,0 +1,27 @@
grl_tetris_resources_server:
  resources_servers:
    grl_tetris:
      entrypoint: app.py
      domain: games
      verified: false
grl_tetris_simple_agent:
  responses_api_agents:
    simple_agent:
      entrypoint: app.py
      max_steps: 10
      resources_server:
        type: resources_servers
        name: grl_tetris_resources_server
      model_server:
        type: responses_api_models
        name: policy_model
      datasets:
      - name: example
        type: example
        jsonl_fpath: resources_servers/grl_tetris/data/example.jsonl
        num_repeats: 1
        gitlab_identifier:
          dataset_name: grl_tetris
          version: 0.0.1
          artifact_fpath: example.jsonl
        license: Apache 2.0
5 changes: 5 additions & 0 deletions resources_servers/grl_tetris/data/example.jsonl
@@ -0,0 +1,5 @@
{"game_id": 1, "seed": 93810, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 2, "seed": 46185, "dim_board": [4, 6], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 3, "seed": 28563, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 4, "seed": 87808, "dim_board": [6, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 5, "seed": 14453, "dim_board": [5, 5], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
8 changes: 8 additions & 0 deletions resources_servers/grl_tetris/data/example_metrics.json
@@ -0,0 +1,8 @@
{
    "name": "example",
    "type": "example",
    "jsonl_fpath": "resources_servers/grl_tetris/data/example.jsonl",
    "gitlab_identifier": null,
    "license": "Apache 2.0",
    "Number of examples": 5
}