48 changes: 48 additions & 0 deletions docs/environment-tutorials/aviary.md
@@ -0,0 +1,48 @@
(environment-aviary)=

# Aviary

Integration with [Future-House/aviary](https://github.com/Future-House/aviary), a gymnasium for defining custom language agent RL environments.

Aviary is a framework for building custom RL environments with tool use and multi-step reasoning. Environments built in Aviary can be run through NeMo Gym for training and inference. The library ships with environments for math, general knowledge, biological sequences, scientific literature search, and protein stability.

---

## Available Environments

The integration includes several pre-built Aviary environments:

- **GSM8K** (`gsm8k_app.py`) - Grade school math problems with calculator tool
- **HotPotQA** (`hotpotqa_app.py`) - Multi-hop question answering
- **BixBench** (`notebook_app.py`) - Jupyter notebook execution for scientific tasks
- **Client/Proxy** (`client_app.py`) - Generic interface to remote Aviary dataset servers

---

## Example Usage

### GSM8K Environment

Run the GSM8K Aviary resources server with a model config:

```bash
ng_run "+config_paths=[resources_servers/aviary/configs/gsm8k_aviary.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
```

Collect rollouts:

```bash
ng_collect_rollouts \
+agent_name=gsm8k_aviary_agent \
+input_jsonl_fpath=resources_servers/aviary/data/example.jsonl \
+output_jsonl_fpath=resources_servers/aviary/data/example_rollouts.jsonl
```
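Both commands above take `+key=value` style arguments, and `ng_run` takes a single bracketed `+config_paths=[...]` list. When scripting these calls, it can help to assemble the arguments from Python data rather than by hand. The following is a minimal sketch: the helper names `config_paths_arg` and `override_args` are hypothetical and not part of NeMo Gym, and only the argument shapes shown above are assumed.

```python
import shlex


def config_paths_arg(paths):
    """Return the single `+config_paths=[a,b]` argument in the form shown above."""
    return "+config_paths=[" + ",".join(paths) + "]"


def override_args(overrides):
    """Turn a dict into `+key=value` arguments, preserving insertion order."""
    return [f"+{key}={value}" for key, value in overrides.items()]


# Paths and values copied from the commands in this section.
ng_run_cmd = [
    "ng_run",
    config_paths_arg([
        "resources_servers/aviary/configs/gsm8k_aviary.yaml",
        "responses_api_models/vllm_model/configs/vllm_model.yaml",
    ]),
]
collect_cmd = [
    "ng_collect_rollouts",
    *override_args({
        "agent_name": "gsm8k_aviary_agent",
        "input_jsonl_fpath": "resources_servers/aviary/data/example.jsonl",
        "output_jsonl_fpath": "resources_servers/aviary/data/example_rollouts.jsonl",
    }),
]

# shlex.join adds shell quoting where needed (the brackets in config_paths).
print(shlex.join(ng_run_cmd))
print(shlex.join(collect_cmd))
```

The resulting lists can be passed to `subprocess.run` directly, which avoids shell quoting issues with the square brackets.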

---

## Reference

- [Aviary GitHub](https://github.com/Future-House/aviary) - Official Aviary repository
- [Aviary Paper](https://arxiv.org/abs/2412.21154) - Training language agents on challenging scientific tasks
- `resources_servers/aviary/` - NeMo Gym resources server implementations
- `responses_api_agents/aviary_agent/` - NeMo Gym aviary agent integration
40 changes: 39 additions & 1 deletion docs/environment-tutorials/index.md
@@ -125,6 +125,43 @@ Scale environments across machines with containers.

::::

### Integrations

::::{grid} 1 2 3 3
:gutter: 2

:::{grid-item-card} {octicon}`light-bulb;1.5em;sd-mr-1` Reasoning Gym
:link: reasoning-gym
:link-type: doc

100+ procedurally generated reasoning tasks across multiple domains.

+++
{bdg-secondary}`integration` {bdg-secondary}`15-20 min`
:::

:::{grid-item-card} {octicon}`beaker;1.5em;sd-mr-1` Aviary
:link: aviary
:link-type: doc

Custom language agent environments for scientific and reasoning tasks.

+++
{bdg-secondary}`integration` {bdg-secondary}`10-15 min`
:::

:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Verifiers
:link: verifiers
:link-type: doc

600+ environments from Prime Intellect's Environments Hub.

+++
{bdg-secondary}`integration` {bdg-secondary}`20 min`
:::

::::

---

## Learning Path
@@ -162,8 +199,9 @@ NeMo Gym includes working examples in `resources_servers/`:
| `calendar/` | Multi-turn | State comparison |
| `equivalence_llm_judge/` | Single-step | LLM judge with swap check |
| `math_with_judge/` | Single-step | Library + judge fallback |
| `aviary/` | Multi-step | Aviary environment integration |
| `aviary/` | Multi-step | Aviary framework integration |
| `workplace_assistant/` | Multi-step | Session state, tool routing |
| `reasoning_gym/` | Single-step | Algorithmic verification with reasoning-gym library |

:::{tip}
Use `ng_init_resources_server +entrypoint=resources_servers/my_env` to scaffold a new environment from a template.
109 changes: 109 additions & 0 deletions docs/environment-tutorials/reasoning-gym.md
@@ -0,0 +1,109 @@
(environment-reasoning-gym)=

# Reasoning Gym

Integration with [open-thought/reasoning-gym](https://github.com/open-thought/reasoning-gym), a library of procedural dataset generators and algorithmically verifiable reasoning environments.

Reasoning Gym provides 100+ tasks over many domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and common games. Tasks are procedurally generated with adjustable complexity and algorithmically verified.

---

## Dataset Preparation

The integration includes a helper script for creating datasets from Reasoning Gym tasks.

**Single task:**
```bash
python resources_servers/reasoning_gym/scripts/create_dataset.py \
--task knights_knaves \
--size 500 \
--seed 42 \
--output resources_servers/reasoning_gym/data/train_knights_knaves.jsonl
```

**Multiple tasks (composite):**
```bash
python resources_servers/reasoning_gym/scripts/create_dataset.py \
--tasks knights_knaves,syllogisms,leg_counting \
--size 1000 \
--output resources_servers/reasoning_gym/data/train_composite.jsonl
```

**All tasks in a category:**
```bash
python resources_servers/reasoning_gym/scripts/create_dataset.py \
--category logic \
--size 1000 \
--output resources_servers/reasoning_gym/data/train_logic.jsonl
```

**All available tasks:**
```bash
python resources_servers/reasoning_gym/scripts/create_dataset.py \
--all-tasks \
--size 1000 \
--output resources_servers/reasoning_gym/data/train_all.jsonl
```

**With custom task configuration:**
```bash
python resources_servers/reasoning_gym/scripts/create_dataset.py \
--task knights_knaves \
--size 500 \
--config '{"n_people": 3, "depth_constraint": 3}' \
--output resources_servers/reasoning_gym/data/train_hard.jsonl
```
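The `--config` value above is a JSON string, which is easy to mistype inside shell quotes. One option is to build the command in Python and let `json.dumps` produce the string. This is a sketch only: `create_dataset_cmd` is a hypothetical helper, and it assumes nothing about the script beyond the flags shown above (`--task`, `--size`, `--seed`, `--config`, `--output`).

```python
import json
import shlex

# Script path as used in the commands above.
SCRIPT = "resources_servers/reasoning_gym/scripts/create_dataset.py"


def create_dataset_cmd(task, size, output, seed=None, config=None):
    """Build the argument list for a single-task dataset run."""
    cmd = ["python", SCRIPT, "--task", task, "--size", str(size)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if config is not None:
        # json.dumps guarantees the value is valid JSON with double quotes.
        cmd += ["--config", json.dumps(config)]
    cmd += ["--output", output]
    return cmd


cmd = create_dataset_cmd(
    task="knights_knaves",
    size=500,
    output="resources_servers/reasoning_gym/data/train_hard.jsonl",
    config={"n_people": 3, "depth_constraint": 3},
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```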

---

## Rollout Collection

### Start vLLM Server

```bash
pip install -U "vllm>=0.12.0"

wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py

vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
--max-num-seqs 8 \
--tensor-parallel-size 1 \
--max-model-len 262144 \
--port 10240 \
--trust-remote-code \
--tool-call-parser qwen3_coder \
--reasoning-parser-plugin nano_v3_reasoning_parser.py \
--reasoning-parser nano_v3
```

### Create env.yaml

```yaml
policy_base_url: http://localhost:10240/v1
policy_api_key: EMPTY
policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
```
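A missing or misspelled key in `env.yaml` is a plausible source of startup errors, so a quick check before launching can save time. The sketch below assumes the file is flat `key: value` lines as shown above and that the three keys shown are the ones needed; `parse_flat_yaml` and `missing_keys` are hypothetical helpers, not NeMo Gym functions, and this is not a general YAML parser.

```python
# Keys taken from the env.yaml example above (assumed to be the required set).
REQUIRED_KEYS = ("policy_base_url", "policy_api_key", "policy_model_name")


def parse_flat_yaml(text):
    """Parse flat `key: value` lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty."""
    values = parse_flat_yaml(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = """\
policy_base_url: http://localhost:10240/v1
policy_api_key: EMPTY
policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
"""
print(missing_keys(example))
```

For anything beyond flat files, a real YAML library is the better choice.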

### Launch NeMo Gym Servers

```bash
ng_run "+config_paths=[resources_servers/reasoning_gym/configs/reasoning_gym.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
```

### Collect Rollouts

```bash
ng_collect_rollouts \
+agent_name=reasoning_gym_simple_agent \
+input_jsonl_fpath=resources_servers/reasoning_gym/data/example.jsonl \
+output_jsonl_fpath=results/reasoning_gym_rollouts.jsonl \
+limit=5
```
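After collection finishes, it is worth confirming that the output file holds the expected number of records. The rollout schema is not described here, so the sketch below makes no assumption about field names and only counts records and their top-level keys. `read_jsonl` and `key_counts` are hypothetical helper names.

```python
import json
import os
from collections import Counter


def read_jsonl(path):
    """Load one JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records


def key_counts(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Output path from the ng_collect_rollouts command above.
    path = "results/reasoning_gym_rollouts.jsonl"
    if os.path.exists(path):
        rollouts = read_jsonl(path)
        print(f"{len(rollouts)} records")
        for key, count in sorted(key_counts(rollouts).items()):
            print(f"{key}: present in {count} records")
```

With `+limit=5`, the record count gives a quick signal of whether every requested rollout was written.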

---

## Reference

- [Reasoning Gym GitHub](https://github.com/open-thought/reasoning-gym)
- [Dataset Gallery](https://github.com/open-thought/reasoning-gym/blob/main/GALLERY.md) - Examples of all available tasks
- `resources_servers/reasoning_gym/` - NeMo Gym integration implementation
111 changes: 111 additions & 0 deletions docs/environment-tutorials/verifiers.md
@@ -0,0 +1,111 @@
(environment-verifiers)=

# Verifiers

Integration with [PrimeIntellect-ai/verifiers](https://github.com/PrimeIntellect-ai/verifiers), enabling environments from Prime Intellect's Environments Hub to run in NeMo Gym.

Verifiers provides 600+ environments across reasoning, math, and agent tasks. Environments built for Environments Hub can be deployed through NeMo Gym for training with NeMo RL. Unlike typical NeMo Gym environments, verifiers environments handle state management, verification, and tool execution internally, so they do not require a separate resources server.

:::{note}
**Multi-turn environments:** These currently require disabling `enforce_monotonicity` in the training configuration until token propagation is fully patched.
:::

---

## Install Dependencies

Install verifiers and prime tools:

```bash
# From the Gym repository root
uv venv
source .venv/bin/activate
uv sync
uv add verifiers
uv tool install prime
```

Install an environment:

```bash
prime env install primeintellect/acereason-math
```

---

## Create Dataset

Generate example tasks:

```bash
python3 responses_api_agents/verifiers_agent/scripts/create_dataset.py \
--env-id primeintellect/acereason-math \
--size 5 \
--output responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl
```

---

## Update Agent Requirements

Add to `responses_api_agents/verifiers_agent/requirements.txt`:

```txt
-e nemo-gym[dev] @ ../../
verifiers>=0.1.9
--extra-index-url https://hub.primeintellect.ai/primeintellect/simple/
acereason-math
```

---

## Configure Model Server

Create `env.yaml` at repository root:

```yaml
policy_base_url: "http://localhost:8000/v1"
policy_api_key: "dummy"
policy_model_name: "Qwen/Qwen3-4B-Instruct-2507"
```

---

## Start Model Server

```bash
uv add vllm
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
--max-model-len 32768 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser hermes
```
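Before launching the NeMo Gym servers, it can be useful to confirm that the model server is responding. vLLM exposes an OpenAI-compatible API, so a request to the `/models` route under the base URL should list the served model. The sketch below uses only the standard library; `models_url` and `list_served_models` are hypothetical helpers, and the base URL and key are the values from the `env.yaml` example in this tutorial.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url):
    """Join the base URL (for example http://localhost:8000/v1) with /models."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url, api_key="dummy", timeout=5.0):
    """Return the served model ids, or None if the server is unreachable."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body.
        return None
    return [model.get("id") for model in payload.get("data", [])]


if __name__ == "__main__":
    models = list_served_models("http://localhost:8000/v1")
    if models is None:
        print("model server not reachable yet")
    else:
        print("serving:", models)
```

Large models can take a while to load, so a `None` result shortly after `vllm serve` starts does not necessarily indicate a failure.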

---

## Launch NeMo Gym Servers

```bash
ng_run "+config_paths=[responses_api_agents/verifiers_agent/configs/verifiers_acereason-math.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
```

---

## Collect Rollouts

```bash
ng_collect_rollouts \
+agent_name=verifiers_agent \
+input_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl \
+output_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example-rollouts.jsonl \
+limit=5
```

---

## Reference

- [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments) - Browse 600+ available environments
- [Verifiers GitHub](https://github.com/PrimeIntellect-ai/verifiers) - Verifiers library
- `responses_api_agents/verifiers_agent/` - NeMo Gym agent integration
3 changes: 3 additions & 0 deletions docs/index.md
@@ -407,6 +407,9 @@ Rollout Collection <get-started/rollout-collection.md>
🟡 Multi-Node Docker <environment-tutorials/multi-node-docker>
🟡 LLM as Judge <environment-tutorials/llm-as-judge>
🟡 RLHF Reward Models <environment-tutorials/rlhf-reward-models>
Reasoning Gym <environment-tutorials/reasoning-gym>
Aviary <environment-tutorials/aviary>
Verifiers <environment-tutorials/verifiers>
```

```{toctree}