diff --git a/README.md b/README.md
index 162ca6b80..b816b2f91 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 NeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
 
-NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA's GPU-accelerated platform for building and training generative AI models.
+NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/). For details on how NeMo Gym fits within the NeMo ecosystem and integrates with other RL frameworks, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) documentation.
 
 ## 🏆 Why NeMo Gym?
 
@@ -16,6 +16,34 @@ NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/n
 > [!IMPORTANT]
 > NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback - for any changes, please open an issue first to kick off discussion!
 
+## 🔗 Ecosystem Integrations
+
+NeMo Gym is designed to integrate seamlessly with the broader RL ecosystem. For detailed documentation, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) page.
+
+### Training Frameworks
+
+NeMo Gym provides rollout collection infrastructure that integrates with various RL training frameworks:
+
+| Framework | Status | Description |
+|-----------|--------|-------------|
+| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, and SFT |
+| [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization |
+| [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework |
+| [TRL](https://github.com/huggingface/trl) | 🔜 In Progress | Hugging Face's Transformer Reinforcement Learning library |
+
+### Environment Libraries
+
+NeMo Gym integrates with environment libraries for diverse training scenarios:
+
+| Library | Status | Description |
+|---------|--------|-------------|
+| [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see the `reasoning_gym` resource server) |
+| [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see the `aviary` resource server) |
+| [PRIME Intellect](https://github.com/PrimeIntellect-ai) | 🔜 In Progress | Distributed AI training environments |
+| [BrowserGym](https://github.com/ServiceNow/BrowserGym) | 🔜 In Progress | Web browsing and automation environments |
+
+> 💡 **Want to add an integration?** We welcome contributions! See our [Contributing Guide](https://docs.nvidia.com/nemo/gym/latest/contribute/index.html) or [open an issue](https://github.com/NVIDIA-NeMo/Gym/issues) to discuss.
+
 ## 📋 Requirements
 
 ### Hardware Requirements
@@ -138,6 +166,7 @@ Purpose: Demonstrate NeMo Gym patterns and concepts.
 | Name | Demonstrates | Config | README |
 | ------------------ | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------- |
 | Multi Step | Multi-step tool calling | example_multi_step.yaml | README |
+| Reasoning Gym | External environment library integration | reasoning_gym.yaml | README |
 | Session State Mgmt | Session state management (in-memory) | example_session_state_mgmt.yaml | README |
 | Single Tool Call | Basic single-step tool calling | example_single_tool_call.yaml | README |
 
@@ -152,15 +181,20 @@ Purpose: Training-ready environments with curated datasets.
 | Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License |
 | -------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | ----- | ---------- | ----------------------------------------------------------- |
+| Aviary (GSM8K) | agent | GSM8K | Grade school math with calculator tool via Aviary integration | Improve math reasoning with tool use | config | ✓ | - | MIT |
+| Aviary (HotPotQA) | agent | HotPotQA | Multi-hop question answering via Aviary integration | Improve multi-hop reasoning capabilities | config | ✓ | - | Apache 2.0 |
+| Aviary (BixBench) | agent | BixBench | Scientific computational tasks with Jupyter notebook execution | Improve scientific reasoning capabilities | config | ✓ | - | Apache 2.0 |
 | Calendar | agent | Nemotron-RL-agent-calendar_scheduling | - | - | config | ✓ | ✓ | Apache 2.0 |
 | Google Search | agent | Nemotron-RL-knowledge-web_search-mcqa | Multi-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | config | ✓ | - | Apache 2.0 |
 | Math Advanced Calculations | agent | Nemotron-RL-math-advanced_calculations | An instruction following math environment with counter-intuitive calculators | Improve instruction following capabilities in specific math environments | config | ✓ | - | Apache 2.0 |
 | Workplace Assistant | agent | Nemotron-RL-agent-workplace_assistant | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | config | ✓ | ✓ | Apache 2.0 |
+| Xlam Function Calling | agent | xlam-function-calling-60k | Function calling training using the Salesforce xLAM dataset | Improve function calling capabilities | config | ✓ | ✓ | Apache 2.0 |
 | Code Gen | coding | nemotron-RL-coding-competitive_coding | - | - | config | ✓ | ✓ | Apache 2.0 |
 | Mini Swe Agent | coding | SWE-Gym | A software development with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | config | ✓ | ✓ | MIT |
 | Instruction Following | instruction_following | Nemotron-RL-instruction_following | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities | Improve IFEval and IFBench | config | ✓ | - | Apache 2.0 |
 | Structured Outputs | instruction_following | Nemotron-RL-instruction_following-structured_outputs | Check if responses are following structured output requirements in prompts | Improve instruction following capabilities | config | ✓ | ✓ | Apache 2.0 |
 | Mcqa | knowledge | Nemotron-RL-knowledge-mcqa | Multi-choice question answering problems | Improve benchmarks like MMLU / GPQA / HLE | config | ✓ | - | Apache 2.0 |
+| Math With Code | math | Nemotron-RL-math-OpenMathReasoning | Math problems with Python code execution (numpy, scipy, pandas) | Improve math capabilities with code-assisted reasoning | config | ✓ | - | Apache 2.0 |
 | Math With Judge | math | Nemotron-RL-math-OpenMathReasoning | Math dataset with math-verify and LLM-as-a-judge | Improve math capabilities including AIME 24 / 25 | config | ✓ | ✓ | Creative Commons Attribution 4.0 International |
 | Math With Judge | math | Nemotron-RL-math-stack_overflow | - | - | config | ✓ | ✓ | Creative Commons Attribution-ShareAlike 4.0 International |

diff --git a/docs/about/ecosystem.md b/docs/about/ecosystem.md
index 6abf9dd3a..56e260ad2 100644
--- a/docs/about/ecosystem.md
+++ b/docs/about/ecosystem.md
@@ -1,27 +1,145 @@
 (about-ecosystem)=
-# NeMo Gym in the NVIDIA Ecosystem
+# NeMo Gym in the Ecosystem
 
-NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA's GPU-accelerated platform for building and training generative AI models.
+NeMo Gym provides scalable {term}`rollout ` collection infrastructure that works with multiple RL training frameworks and environment libraries. This page describes how NeMo Gym fits into both the NVIDIA NeMo Framework and the wider RL community.
 
 :::{tip}
-For details on NeMo Gym capabilities, refer to the
-{ref}`Overview `.
+**New to NeMo Gym?** Refer to the {ref}`Overview ` for capabilities and the {ref}`Key Terminology ` glossary for definitions of rollout, multi-turn, and other terms.
+:::
+
+---
+
+## Training Framework Integrations
+
+NeMo Gym decouples environment development from training by outputting standardized JSONL rollout data. Training frameworks consume this data through their own integration code.
+
+### The Interface Contract
+
+NeMo Gym outputs JSONL records with an OpenAI-compatible message format:
+
+```json
+{
+  "reward": 1.0,
+  "output": [
+    {"role": "user", "content": "What is 2 + 2?"},
+    {"role": "assistant", "content": "The answer is 4."}
+  ]
+}
+```
+
+Any framework that can read this format can use NeMo Gym rollouts; no native integration is required. The frameworks below have documented integration patterns.
+
+### Supported Frameworks
+
+```{list-table}
+:header-rows: 1
+:widths: 20 15 65
+
+* - Framework
+  - Integration
+  - Description
+* - [NeMo RL](https://github.com/NVIDIA-NeMo/RL)
+  - ✅ Native
+  - NVIDIA's scalable post-training library. NeMo RL includes built-in NeMo Gym support for {term}`multi-turn` rollout collection with GRPO, DPO, and SFT algorithms. Refer to the {doc}`NeMo RL tutorial <../tutorials/nemo-rl-grpo/index>`.
+* - [Unsloth](https://github.com/unslothai/unsloth)
+  - ✅ Compatible
+  - Fast fine-tuning framework with 2-5x speedup and 80% memory reduction. Consumes NeMo Gym JSONL output for LoRA and QLoRA training. Refer to the {doc}`Unsloth tutorial <../tutorials/unsloth-training>`.
+```
+
+:::{note}
+**Integration model**: NeMo Gym produces rollout data; training frameworks consume it. NeMo RL includes native integration code. Other frameworks use NeMo Gym's JSONL output format.
+:::
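+
+As a concrete illustration of this contract, the sketch below reads NeMo Gym's JSONL output with nothing but the Python standard library and keeps high-reward rollouts for SFT-style fine-tuning. The file name `rollouts.jsonl`, the reward threshold, and the `load_rollouts` helper are illustrative assumptions, not part of NeMo Gym's API.
+
+```python
+import json
+
+def load_rollouts(path: str, min_reward: float = 1.0):
+    """Yield (reward, messages) pairs from a rollout JSONL file."""
+    with open(path) as f:
+        for line in f:
+            record = json.loads(line)
+            # Each record carries a scalar reward and an OpenAI-style message list.
+            if record["reward"] >= min_reward:
+                yield record["reward"], record["output"]
+
+# Keep only fully rewarded rollouts and hand the message lists to any trainer
+# that accepts chat-format data.
+dataset = [messages for _, messages in load_rollouts("rollouts.jsonl")]
+```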
+
+### In-Progress Integrations
+
+Integrations with the following frameworks are planned:
+
+| Framework | Description | Status |
+|-----------|-------------|--------|
+| [veRL](https://github.com/volcengine/verl) | Volcano Engine's scalable RL library with hybrid parallelism | Planned |
+| [TRL](https://github.com/huggingface/trl) | Hugging Face's Transformer Reinforcement Learning library | Planned |
+
+:::{tip}
+**Integrate your framework**: Refer to the {doc}`Training Framework Integration Guide <../contribute/rl-framework-integration/index>` or [open an issue](https://github.com/NVIDIA-NeMo/Gym/issues) to discuss requirements.
+:::
+
+---
+
+## Environment Library Integrations
+
+NeMo Gym integrates with environment libraries to provide diverse training scenarios, from reasoning tasks to tool-using agents.
+
+### Supported Libraries
+
+```{list-table}
+:header-rows: 1
+:widths: 25 15 60
+
+* - Library
+  - Status
+  - Description
+* - [reasoning-gym](https://github.com/open-thought/reasoning-gym)
+  - ✅ Code
+  - Procedurally generated reasoning tasks. Integration: [`resources_servers/reasoning_gym/`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/reasoning_gym)
+* - [Aviary](https://github.com/Future-House/aviary)
+  - ✅ Code
+  - Multi-environment framework for tool-using agents (GSM8K, HotPotQA, BixBench). Integration: [`resources_servers/aviary/`](https://github.com/NVIDIA-NeMo/Gym/tree/main/resources_servers/aviary)
+```
+
+### In-Progress Integrations
+
+| Library | Description | Status |
+|---------|-------------|--------|
+| [PRIME Intellect](https://github.com/PrimeIntellect-ai) | Distributed AI training environments | 🔜 Planned |
+| [BrowserGym](https://github.com/ServiceNow/BrowserGym) | Web browsing and automation environments | 🔜 Planned |
+
+### Building Custom Environments
+
+Beyond external library integrations, NeMo Gym provides a native pattern for building LLM training environments: the {term}`Resource Server `. This pattern has four components:
+
+- **Tool definitions**: OpenAI function calling schema for model interactions
+- **Verification logic**: Computes reward scores (0.0-1.0) from rollout outcomes
+- **State management**: Tracks context across {term}`multi-step` and {term}`multi-turn` interactions
+- **Curated datasets**: Task prompts paired with expected outcomes
+
+This pattern supports LLM-specific capabilities like tool use, instruction following, and complex reasoning that traditional RL environments were not designed for.
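+
+To make these four components concrete, here is a minimal sketch in plain Python. It is conceptual only: the names (`calculator_tool`, `dataset`, `session_state`, and `verify`) are illustrative assumptions and do not reflect NeMo Gym's actual resource server API; the tool definition simply follows the OpenAI function calling schema mentioned above.
+
+```python
+# Conceptual sketch of a resource server's four components (hypothetical names,
+# not the NeMo Gym API).
+
+# 1. Tool definition: OpenAI function calling schema for model interactions.
+calculator_tool = {
+    "type": "function",
+    "function": {
+        "name": "calculator",
+        "description": "Evaluate a basic arithmetic expression.",
+        "parameters": {
+            "type": "object",
+            "properties": {"expression": {"type": "string"}},
+            "required": ["expression"],
+        },
+    },
+}
+
+# 2. Curated dataset: task prompts paired with expected outcomes.
+dataset = [{"prompt": "What is 17 * 23?", "expected_answer": "391"}]
+
+# 3. State management: track context across multi-step interactions (in memory here).
+session_state = {}
+
+# 4. Verification logic: map a rollout outcome to a reward in [0.0, 1.0].
+def verify(session_id: str, final_response: str, expected_answer: str) -> float:
+    session_state.setdefault(session_id, []).append(final_response)
+    return 1.0 if expected_answer in final_response else 0.0
+```
+
+In a real resource server, the verifier computes the reward from the rollout outcome, and that score is what ends up in the `reward` field of the JSONL records shown earlier.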
+
+:::{seealso}
+Refer to the {doc}`Creating a Resource Server <../tutorials/creating-resource-server>` tutorial to build custom environments.
 :::
 
 ---
 
 ## NeMo Gym Within the NeMo Framework
 
-NeMo Framework includes modular libraries for end-to-end model training:
+NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), a GPU-accelerated platform for building and training generative AI models.
+
+The NeMo Framework includes modular libraries for end-to-end model development:
+
+| Library | Purpose |
+|---------|---------|
+| [NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Pretraining and fine-tuning with Megatron-Core |
+| [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | PyTorch native training for Hugging Face models |
+| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Scalable post-training with GRPO, DPO, and SFT |
+| **[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)** | RL environment infrastructure and rollout collection *(this project)* |
+| [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator) | Data preprocessing and curation |
+| [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic data generation |
+| [NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Model evaluation and benchmarking |
+| [NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Programmable safety guardrails |
+
+**NeMo Gym's role**: Gym standardizes rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and resource servers with verification logic, enabling large-scale training data generation for NeMo RL and other frameworks.
+
+---
+
+## Community and Contributions
+
+NeMo Gym welcomes community contributions in the following areas:
-* **[NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)**: Pretraining and fine-tuning with Megatron-Core
-* **[NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)**: PyTorch native training for Hugging Face models
-* **[NeMo RL](https://github.com/NVIDIA-NeMo/RL)**: Scalable and efficient post-training
-* **[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)**: RL environment infrastructure and rollout collection (this project)
-* **[NeMo Curator](https://github.com/NVIDIA-NeMo/Curator)**: Data preprocessing and curation
-* **[NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner)**: Synthetic data generation from scratch or seed datasets
-* **[NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)**: Model evaluation and benchmarking
-* **[NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails)**: Programmable safety guardrails
-* And more...
+- **Training framework integration**: Connect NeMo Gym with additional RL training libraries
+- **Resource server contributions**: Share environments for domains like coding, math, tool use, or instruction following
+- **Documentation improvements**: Improve guides and examples for new users
+- **Issue reporting**: Report bugs and suggest features to shape the roadmap
-**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.
+:::{tip}
+Refer to the {doc}`Contributing Guide <../contribute/index>` or browse [open issues](https://github.com/NVIDIA-NeMo/Gym/issues) to get started.
+:::