36 changes: 35 additions & 1 deletion README.md
@@ -2,7 +2,7 @@

NeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.

NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models.
NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/). For details on how NeMo Gym fits within the NeMo ecosystem and integrates with other RL frameworks, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) documentation.


## 🏆 Why NeMo Gym?
@@ -16,6 +16,34 @@ NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/n
> [!IMPORTANT]
> NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback - for any changes, please open an issue first to kick off discussion!

## 🔗 Ecosystem Integrations

NeMo Gym is designed to integrate seamlessly with the broader RL ecosystem. For detailed documentation, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) page.

### Training Frameworks

NeMo Gym provides rollout collection infrastructure that integrates with various RL training frameworks:

| Framework | Status | Description |
|-----------|--------|-------------|
| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, SFT |
| [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization |
| [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework |
Contributor

I don't think veRL is in progress, but maybe someone is working on it?

I also think we can change TRL to say Supported now; we are just fixing a minor last-minute change and working on additional docs, e.g. sample reward/step or a potential blog post.

| [TRL](https://github.com/huggingface/trl) | 🔜 In Progress | Hugging Face Transformer Reinforcement Learning |

### Environment Libraries

NeMo Gym integrates with environment libraries for diverse training scenarios:

| Library | Status | Description |
|---------|--------|-------------|
| [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see `reasoning_gym` resource server) |
| [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see `aviary` resource server) |
Contributor

It may be worth saying it's OpenAI Gymnasium compatible (but we should double-check that statement).

Prime Intellect: the library is named verifiers, or Environments Hub, not Prime Intellect itself, IMO.

BrowserGym: not sure if anyone is working on this? @cwing-nvidia?

| [PRIME Intellect](https://github.com/PrimeIntellect-ai) | 🔜 In Progress | Distributed AI training environments |
| [BrowserGym](https://github.com/ServiceNow/BrowserGym) | 🔜 In Progress | Web browsing and automation environments |

> 💡 **Want to add an integration?** We welcome contributions! See our [Contributing Guide](https://docs.nvidia.com/nemo/gym/latest/contribute/index.html) or [open an issue](https://github.com/NVIDIA-NeMo/Gym/issues) to discuss.

## 📋 Requirements

### Hardware Requirements
@@ -138,6 +166,7 @@ Purpose: Demonstrate NeMo Gym patterns and concepts.
| Name | Demonstrates | Config | README |
| ------------------ | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| Multi Step | Multi-step tool calling | <a href='resources_servers/example_multi_step/configs/example_multi_step.yaml'>example_multi_step.yaml</a> | <a href='resources_servers/example_multi_step/README.md'>README</a> |
| Reasoning Gym | External environment library integration | <a href='resources_servers/reasoning_gym/configs/reasoning_gym.yaml'>reasoning_gym.yaml</a> | <a href='resources_servers/reasoning_gym/README.md'>README</a> |
Contributor

I thought these don't go in the README because they don't have an HF dataset link; I thought this README table was built automatically based on that somehow.

| Session State Mgmt | Session state management (in-memory) | <a href='resources_servers/example_session_state_mgmt/configs/example_session_state_mgmt.yaml'>example_session_state_mgmt.yaml</a> | <a href='resources_servers/example_session_state_mgmt/README.md'>README</a> |
| Single Tool Call | Basic single-step tool calling | <a href='resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml'>example_single_tool_call.yaml</a> | <a href='resources_servers/example_single_tool_call/README.md'>README</a> |
<!-- END_EXAMPLE_ONLY_SERVERS_TABLE -->
@@ -152,15 +181,20 @@ Purpose: Training-ready environments with curated datasets.
<!-- START_TRAINING_SERVERS_TABLE -->
| Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License |
| -------------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- | ----- | ---------- | --------------------------------------------------------- |
| Aviary (GSM8K) | agent | <a href='https://arxiv.org/abs/2110.14168'>GSM8K</a> | Grade school math with calculator tool via Aviary integration | Improve math reasoning with tool use | <a href='resources_servers/aviary/configs/gsm8k_aviary.yaml'>config</a> | ✓ | - | MIT |
| Aviary (HotPotQA) | agent | <a href='https://aclanthology.org/D18-1259/'>HotPotQA</a> | Multi-hop question answering via Aviary integration | Improve multi-hop reasoning capabilities | <a href='resources_servers/aviary/configs/hotpotqa_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 |
Contributor

Are we starting to enumerate multiple datasets / env implementations in the README now too? Then we should do the same for math, for example? @bxyu-nvidia

| Aviary (BixBench) | agent | <a href='https://arxiv.org/abs/2503.00096'>BixBench</a> | Scientific computational tasks with Jupyter notebook execution | Improve scientific reasoning capabilities | <a href='resources_servers/aviary/configs/bixbench_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Calendar | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-calendar_scheduling'>Nemotron-RL-agent-calendar_scheduling</a> | - | - | <a href='resources_servers/calendar/configs/calendar.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Google Search | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa</a> | Multi-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | <a href='resources_servers/google_search/configs/google_search.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Math Advanced Calculations | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations</a> | An instruction following math environment with counter-intuitive calculators | Improve instruction following capabilities in specific math environments | <a href='resources_servers/math_advanced_calculations/configs/math_advanced_calculations.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Workplace Assistant | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant</a> | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | <a href='resources_servers/workplace_assistant/configs/workplace_assistant.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Xlam Function Calling | agent | <a href='https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k'>xlam-function-calling-60k</a> | Function calling training using Salesforce dataset | Improve function calling capabilities | <a href='resources_servers/xlam_fc/configs/xlam_fc.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Code Gen | coding | <a href='https://huggingface.co/datasets/nvidia/nemotron-RL-coding-competitive_coding'>nemotron-RL-coding-competitive_coding</a> | - | - | <a href='resources_servers/code_gen/configs/code_gen.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/SWE-Gym/SWE-Gym'>SWE-Gym</a> | A software development with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT |
| Instruction Following | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following'>Nemotron-RL-instruction_following</a> | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities | Improve IFEval and IFBench | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Structured Outputs | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs</a> | Check if responses are following structured output requirements in prompts | Improve instruction following capabilities | <a href='resources_servers/structured_outputs/configs/structured_outputs_json.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Mcqa | knowledge | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-mcqa'>Nemotron-RL-knowledge-mcqa</a> | Multi-choice question answering problems | Improve benchmarks like MMLU / GPQA / HLE | <a href='resources_servers/mcqa/configs/mcqa.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Math With Code | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning</a> | Math problems with Python code execution (numpy, scipy, pandas) | Improve math capabilities with code-assisted reasoning | <a href='resources_servers/math_with_code/configs/math_with_code.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Math With Judge | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning</a> | Math dataset with math-verify and LLM-as-a-judge | Improve math capabilities including AIME 24 / 25 | <a href='resources_servers/math_with_judge/configs/math_with_judge.yaml'>config</a> | ✓ | ✓ | Creative Commons Attribution 4.0 International |
| Math With Judge | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-stack_overflow'>Nemotron-RL-math-stack_overflow</a> | - | - | <a href='resources_servers/math_with_judge/configs/math_stack_overflow.yaml'>config</a> | ✓ | ✓ | Creative Commons Attribution-ShareAlike 4.0 International |
<!-- END_TRAINING_SERVERS_TABLE -->
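
The `START_TRAINING_SERVERS_TABLE` / `END_TRAINING_SERVERS_TABLE` comment markers suggest that the table between them is regenerated by tooling, which is why hand-added rows may be overwritten. The repository's actual generator is not shown in this diff, so the following is only a minimal sketch of the general marker-replacement technique. The helper names `render_table` and `replace_between_markers` are hypothetical, and the two sample rows reuse only the Resource Server and Domain columns from the table above.

```python
import re

# Marker comments as they appear in the README diff above.
START = "<!-- START_TRAINING_SERVERS_TABLE -->"
END = "<!-- END_TRAINING_SERVERS_TABLE -->"


def render_table(headers, rows):
    """Render a pipe-delimited Markdown table (hypothetical helper)."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def replace_between_markers(text, start, end, new_body):
    """Replace everything between the markers, keeping the markers themselves."""
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if not pattern.search(text):
        raise ValueError("markers not found")
    # A function replacement avoids backslash/group interpretation in new_body.
    return pattern.sub(lambda _m: f"{start}\n{new_body}\n{end}", text, count=1)


# Tiny stand-in for a README; the real file is much longer.
readme = f"intro\n{START}\n| old |\n{END}\noutro\n"
table = render_table(
    ["Resource Server", "Domain"],
    [["Calendar", "agent"], ["Mcqa", "knowledge"]],
)
updated = replace_between_markers(readme, START, END, table)
print(updated)
```

Under this approach, anything written by hand between the markers is discarded on the next regeneration, so new rows would need to come from whatever metadata the generator reads rather than from direct README edits.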