docs: aviary, verifiers, reasoning gym env integration docs #617

Open

cmunley1 wants to merge 4 commits into main from cmunley1/env-int-docs

+310 −1

docs/environment-tutorials/aviary.md

@@ -0,0 +1,48 @@
+    (environment-aviary)=
+    # Aviary
+    Integration with [Future-House/aviary](https://github.com/Future-House/aviary), a gymnasium for defining custom language agent RL environments.
+    Aviary is a framework for building custom RL environments with tool use and multi-step reasoning. Environments built in Aviary can be run through NeMo Gym for training and inference. The library includes pre-built environments for math, general knowledge, biological sequences, scientific literature search, and protein stability.
+    ---
+    ## Available Environments
+    The integration includes several pre-built Aviary environments:
+    - **GSM8K** (`gsm8k_app.py`) - Grade school math problems with calculator tool
+    - **HotPotQA** (`hotpotqa_app.py`) - Multi-hop question answering
+    - **BixBench** (`notebook_app.py`) - Jupyter notebook execution for scientific tasks
+    - **Client/Proxy** (`client_app.py`) - Generic interface to remote Aviary dataset servers
+    ---
+    ## Example Usage
+    ### GSM8K Environment
+    Run the GSM8K Aviary resources server with a model config:
+    ```bash
+    ng_run "+config_paths=[resources_servers/aviary/configs/gsm8k_aviary.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+    ```
+    Collect rollouts:
+    ```bash
+    ng_collect_rollouts \
+        +agent_name=gsm8k_aviary_agent \
+        +input_jsonl_fpath=resources_servers/aviary/data/example.jsonl \
+        +output_jsonl_fpath=resources_servers/aviary/data/example_rollouts.jsonl
+    ```
+    ---
+    ## Reference
+    - [Aviary GitHub](https://github.com/Future-House/aviary) - Official Aviary repository
+    - [Aviary Paper](https://arxiv.org/abs/2412.21154) - Training language agents on challenging scientific tasks
+    - `resources_servers/aviary/` - NeMo Gym resources server implementations
+    - `responses_api_agents/aviary_agent/` - NeMo Gym aviary agent integration
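
The `ng_run` call in `aviary.md` passes several config files as one bracketed `+config_paths=[...]` argument, which is easy to mistype by hand. The following is a minimal sketch of building that argument from a list of paths. The helper names `config_paths_override` and `ng_run_command` are hypothetical and are not part of NeMo Gym; only the `ng_run` command name and the argument format come from the diff above.

```python
import shlex


def config_paths_override(paths):
    """Join config file paths into one bracketed +config_paths=[...] argument."""
    if not paths:
        raise ValueError("at least one config path is required")
    return "+config_paths=[" + ",".join(paths) + "]"


def ng_run_command(paths):
    """Return an argv list for ng_run (hypothetical helper, not a NeMo Gym API)."""
    return ["ng_run", config_paths_override(paths)]


command = ng_run_command([
    "resources_servers/aviary/configs/gsm8k_aviary.yaml",
    "responses_api_models/vllm_model/configs/vllm_model.yaml",
])
# shlex.join quotes the bracketed argument so it can be pasted into a shell.
print(shlex.join(command))
```

Passing the returned list to `subprocess.run` avoids shell quoting entirely, since the brackets are then never interpreted by a shell.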

docs/environment-tutorials/index.md

@@ -125,6 +125,43 @@ Scale environments across machines with containers.
     ::::
+    ### Integrations
+    ::::{grid} 1 2 3 3
+    :gutter: 2
+    :::{grid-item-card} {octicon}`light-bulb;1.5em;sd-mr-1` Reasoning Gym
+    :link: reasoning-gym
+    :link-type: doc
+    100+ procedurally generated reasoning tasks across multiple domains.
+    +++
+    {bdg-secondary}`integration` {bdg-secondary}`15-20 min`
+    :::
+    :::{grid-item-card} {octicon}`beaker;1.5em;sd-mr-1` Aviary
+    :link: aviary
+    :link-type: doc
+    Custom language agent environments for scientific and reasoning tasks.
+    +++
+    {bdg-secondary}`integration` {bdg-secondary}`10-15 min`
+    :::
+    :::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Verifiers
+    :link: verifiers
+    :link-type: doc
+    600+ environments from Prime Intellect's Environments Hub.
+    +++
+    {bdg-secondary}`integration` {bdg-secondary}`20 min`
+    :::
+    ::::
     ---
     ## Learning Path
@@ -162,8 +199,9 @@ NeMo Gym includes working examples in `resources_servers/`:
     | `calendar/` | Multi-turn | State comparison |
     | `equivalence_llm_judge/` | Single-step | LLM judge with swap check |
     | `math_with_judge/` | Single-step | Library + judge fallback |
-    | `aviary/` | Multi-step | Aviary environment integration |
+    | `aviary/` | Multi-step | Aviary framework integration |
     | `workplace_assistant/` | Multi-step | Session state, tool routing |
+    | `reasoning_gym/` | Single-step | Algorithmic verification with reasoning-gym library |
     :::{tip}
     Use `ng_init_resources_server +entrypoint=resources_servers/my_env` to scaffold a new environment from a template.

docs/environment-tutorials/reasoning-gym.md

@@ -0,0 +1,109 @@
+    (environment-reasoning-gym)=
+    # Reasoning Gym
+    Integration with [open-thought/reasoning-gym](https://github.com/open-thought/reasoning-gym), a library of procedural dataset generators and algorithmically verifiable reasoning environments.
+    Reasoning Gym provides 100+ tasks across many domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and common games. Tasks are procedurally generated with adjustable complexity and are verified algorithmically.
+    ---
+    ## Dataset Preparation
+    The integration includes a helper script for creating datasets from Reasoning Gym tasks.
+    **Single task:**
+    ```bash
+    python resources_servers/reasoning_gym/scripts/create_dataset.py \
+        --task knights_knaves \
+        --size 500 \
+        --seed 42 \
+        --output resources_servers/reasoning_gym/data/train_knights_knaves.jsonl
+    ```
+    **Multiple tasks (composite):**
+    ```bash
+    python resources_servers/reasoning_gym/scripts/create_dataset.py \
+        --tasks knights_knaves,syllogisms,leg_counting \
+        --size 1000 \
+        --output resources_servers/reasoning_gym/data/train_composite.jsonl
+    ```
+    **All tasks in a category:**
+    ```bash
+    python resources_servers/reasoning_gym/scripts/create_dataset.py \
+        --category logic \
+        --size 1000 \
+        --output resources_servers/reasoning_gym/data/train_logic.jsonl
+    ```
+    **All available tasks:**
+    ```bash
+    python resources_servers/reasoning_gym/scripts/create_dataset.py \
+        --all-tasks \
+        --size 1000 \
+        --output resources_servers/reasoning_gym/data/train_all.jsonl
+    ```
+    **With custom task configuration:**
+    ```bash
+    python resources_servers/reasoning_gym/scripts/create_dataset.py \
+        --task knights_knaves \
+        --size 500 \
+        --config '{"n_people": 3, "depth_constraint": 3}' \
+        --output resources_servers/reasoning_gym/data/train_hard.jsonl
+    ```
+    ---
+    ## Rollout Collection
+    ### Start vLLM Server
+    ```bash
+    pip install -U "vllm>=0.12.0"
+    wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py
+    vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
+      --max-num-seqs 8 \
+      --tensor-parallel-size 1 \
+      --max-model-len 262144 \
+      --port 10240 \
+      --trust-remote-code \
+      --tool-call-parser qwen3_coder \
+      --reasoning-parser-plugin nano_v3_reasoning_parser.py \
+      --reasoning-parser nano_v3
+    ```
+    ### Create env.yaml
+    ```yaml
+    policy_base_url: http://localhost:10240/v1
+    policy_api_key: EMPTY
+    policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
+    ```
+    ### Launch NeMo Gym Servers
+    ```bash
+    ng_run "+config_paths=[resources_servers/reasoning_gym/configs/reasoning_gym.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+    ```
+    ### Collect Rollouts
+    ```bash
+    ng_collect_rollouts \
+        +agent_name=reasoning_gym_simple_agent \
+        +input_jsonl_fpath=resources_servers/reasoning_gym/data/example.jsonl \
+        +output_jsonl_fpath=results/reasoning_gym_rollouts.jsonl \
+        +limit=5
+    ```
+    ---
+    ## Reference
+    - [Reasoning Gym GitHub](https://github.com/open-thought/reasoning-gym)
+    - [Dataset Gallery](https://github.com/open-thought/reasoning-gym/blob/main/GALLERY.md) - Examples of all available tasks
+    - `resources_servers/reasoning_gym/` - NeMo Gym integration implementation
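
Before launching the NeMo Gym servers, it can help to confirm that `env.yaml` has the three keys shown in `reasoning-gym.md` and that the model server answers at `policy_base_url`. The following is a minimal sketch under stated assumptions: the flat `key: value` parser handles only a file shaped like the one in the diff, not general YAML. The model listing route `/models` under the base URL is the OpenAI-compatible endpoint that vLLM's server exposes. All function names here are hypothetical.

```python
import json
import urllib.request

# Key names taken from the env.yaml shown in the diff.
REQUIRED_KEYS = ("policy_base_url", "policy_api_key", "policy_model_name")


def parse_flat_yaml(text):
    """Parse flat `key: value` lines; sufficient for this env.yaml only."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def models_url(base_url):
    """Build the model listing URL from a base URL such as http://host:port/v1."""
    return base_url.rstrip("/") + "/models"


def served_model_ids(base_url, api_key, timeout=5.0):
    """Ask the server which model ids it serves (performs a network request)."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]


ENV_YAML = """\
policy_base_url: http://localhost:10240/v1
policy_api_key: EMPTY
policy_model_name: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
"""
config = parse_flat_yaml(ENV_YAML)
print("missing keys:", missing_keys(config))
print("model listing URL:", models_url(config["policy_base_url"]))
```

With the vLLM server running, `config["policy_model_name"] in served_model_ids(config["policy_base_url"], config["policy_api_key"])` should be true if the configured name matches what the server serves. The network call is kept out of the module-level demo so the sketch runs without a server.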

docs/environment-tutorials/verifiers.md

@@ -0,0 +1,111 @@
+    (environment-verifiers)=
+    # Verifiers
+    Integration with [PrimeIntellect-ai/verifiers](https://github.com/PrimeIntellect-ai/verifiers), enabling environments from Prime Intellect's Environments Hub to run in NeMo Gym.
+    Verifiers provides 600+ environments across reasoning, math, and agent tasks. Environments built for Environments Hub can be deployed through NeMo Gym for training with NeMo RL. Unlike typical NeMo Gym environments, verifiers environments handle state management, verification, and tool execution internally, so they do not require a separate resources server.
+    :::{note}
+    **Multi-turn environments:** These currently require disabling `enforce_monotonicity` in the training configuration until token propagation is fully patched.
+    :::
+    ---
+    ## Install Dependencies
+    Install verifiers and prime tools:
+    ```bash
+    # From the Gym repository root
+    uv venv
+    source .venv/bin/activate
+    uv sync
+    uv add verifiers
+    uv tool install prime
+    ```
+    Install an environment:
+    ```bash
+    prime env install primeintellect/acereason-math
+    ```
+    ---
+    ## Create Dataset
+    Generate example tasks:
+    ```bash
+    python3 responses_api_agents/verifiers_agent/scripts/create_dataset.py \
+      --env-id primeintellect/acereason-math \
+      --size 5 \
+      --output responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl
+    ```
+    ---
+    ## Update Agent Requirements
+    Add to `responses_api_agents/verifiers_agent/requirements.txt`:
+    ```txt
+    -e nemo-gym[dev] @ ../../
+    verifiers>=0.1.9
+    --extra-index-url https://hub.primeintellect.ai/primeintellect/simple/
+    acereason-math
+    ```
+    ---
+    ## Configure Model Server
+    Create `env.yaml` at repository root:
+    ```yaml
+    policy_base_url: "http://localhost:8000/v1"
+    policy_api_key: "dummy"
+    policy_model_name: "Qwen/Qwen3-4B-Instruct-2507"
+    ```
+    ---
+    ## Start Model Server
+    ```bash
+    uv add vllm
+    vllm serve Qwen/Qwen3-4B-Instruct-2507 \
+      --max-model-len 32768 \
+      --reasoning-parser qwen3 \
+      --enable-auto-tool-choice \
+      --tool-call-parser hermes
+    ```
+    ---
+    ## Launch NeMo Gym Servers
+    ```bash
+    ng_run "+config_paths=[responses_api_agents/verifiers_agent/configs/verifiers_acereason-math.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]"
+    ```
+    ---
+    ## Collect Rollouts
+    ```bash
+    ng_collect_rollouts \
+        +agent_name=verifiers_agent \
+        +input_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example.jsonl \
+        +output_jsonl_fpath=responses_api_agents/verifiers_agent/data/acereason-math-example-rollouts.jsonl \
+        +limit=5
+    ```
+    ---
+    ## Reference
+    - [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments) - Browse 600+ available environments
+    - [Verifiers GitHub](https://github.com/PrimeIntellect-ai/verifiers) - Verifiers library
+    - `responses_api_agents/verifiers_agent/` - NeMo Gym agent integration
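
Each of the three tutorials ends by writing rollouts to a JSONL file. The following is a minimal sketch for sanity-checking such a file, assuming only that each non-empty line is a JSON object. The `reward` field name is an assumption, since the diff does not describe the rollout schema, and `summarize_rollouts` is a hypothetical helper.

```python
import json
from statistics import mean


def summarize_rollouts(lines, reward_key="reward"):
    """Count JSONL records and average a numeric field where it is present.

    `reward_key` is an assumed field name; adjust it to the actual schema.
    """
    total = 0
    rewards = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        value = record.get(reward_key) if isinstance(record, dict) else None
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rewards.append(float(value))
    return {
        "records": total,
        "with_reward": len(rewards),
        "mean_reward": mean(rewards) if rewards else None,
    }


# In-memory sample standing in for a rollouts file.
sample = [
    '{"reward": 1.0}',
    '{"reward": 0.0}',
    "",
    '{"note": "no reward field"}',
]
print(summarize_rollouts(sample))
```

Because the function accepts any iterable of lines, an open file handle works as well, for example `with open(path, encoding="utf-8") as handle: summarize_rollouts(handle)`.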

docs/index.md

@@ -407,6 +407,9 @@ Rollout Collection <get-started/rollout-collection.md>
     🟡 Multi-Node Docker <environment-tutorials/multi-node-docker>
     🟡 LLM as Judge <environment-tutorials/llm-as-judge>
     🟡 RLHF Reward Models <environment-tutorials/rlhf-reward-models>
+    Reasoning Gym <environment-tutorials/reasoning-gym>
+    Aviary <environment-tutorials/aviary>
+    Verifiers <environment-tutorials/verifiers>
     ```
     ```{toctree}
