ecosystem pg verbiage update #612
base: main
Changes from all commits
8047c37
71f0756
e4f9487
83f4f63
d8aa82b
289cc25
b45bd38
e02e9d9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ | |
|
|
||
| NeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework. | ||
|
|
||
| NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/), NVIDIA’s GPU-accelerated platform for building and training generative AI models. | ||
| NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/). For details on how NeMo Gym fits within the NeMo ecosystem and integrates with other RL frameworks, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) documentation. | ||
|
|
||
|
|
||
| ## 🏆 Why NeMo Gym? | ||
|
|
@@ -16,6 +16,34 @@ NeMo Gym is a component of the [NVIDIA NeMo Framework](https://docs.nvidia.com/n | |
| > [!IMPORTANT] | ||
| > NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback - for any changes, please open an issue first to kick off discussion! | ||
|
|
||
| ## 🔗 Ecosystem Integrations | ||
|
|
||
| NeMo Gym is designed to integrate seamlessly with the broader RL ecosystem. For detailed documentation, see the [Ecosystem](https://docs.nvidia.com/nemo/gym/latest/about/ecosystem.html) page. | ||
|
|
||
| ### Training Frameworks | ||
|
|
||
| NeMo Gym provides rollout collection infrastructure that integrates with various RL training frameworks: | ||
|
|
||
| | Framework | Status | Description | | ||
| |-----------|--------|-------------| | ||
| | [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | ✅ Supported | NVIDIA's scalable post-training library with GRPO, DPO, SFT | | ||
| | [Unsloth](https://github.com/unslothai/unsloth) | ✅ Supported | Fast fine-tuning framework with memory optimization | | ||
| | [veRL](https://github.com/volcengine/verl) | 🔜 In Progress | Volcano Engine's scalable RL framework | | ||
| | [TRL](https://github.com/huggingface/trl) | 🔜 In Progress | Hugging Face Transformer Reinforcement Learning | | ||
|
|
||
| ### Environment Libraries | ||
|
|
||
| NeMo Gym integrates with environment libraries for diverse training scenarios: | ||
|
|
||
| | Library | Status | Description | | ||
| |---------|--------|-------------| | ||
| | [reasoning-gym](https://github.com/open-thought/reasoning-gym) | ✅ Supported | Procedurally generated reasoning tasks (see `reasoning_gym` resource server) | | ||
| | [Aviary](https://github.com/Future-House/aviary) | ✅ Supported | Multi-environment framework for tool-using agents (see `aviary` resource server) | | ||
|
Contributor
May be worth saying it's OpenAI Gymnasium compatible (but we should double-check that statement). Prime Intellect: the library is named verifiers, or Environments Hub, not Prime Intellect itself, IMO. BrowserGym: not sure if anyone is working on this? @cwing-nvidia? |
||
| | [PRIME Intellect](https://github.com/PrimeIntellect-ai) | 🔜 In Progress | Distributed AI training environments | | ||
| | [BrowserGym](https://github.com/ServiceNow/BrowserGym) | 🔜 In Progress | Web browsing and automation environments | | ||
|
|
||
| > 💡 **Want to add an integration?** We welcome contributions! See our [Contributing Guide](https://docs.nvidia.com/nemo/gym/latest/contribute/index.html) or [open an issue](https://github.com/NVIDIA-NeMo/Gym/issues) to discuss. | ||
|
|
||
| ## 📋 Requirements | ||
|
|
||
| ### Hardware Requirements | ||
|
|
@@ -138,6 +166,7 @@ Purpose: Demonstrate NeMo Gym patterns and concepts. | |
| | Name | Demonstrates | Config | README | | ||
| | ------------------ | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- | | ||
| | Multi Step | Multi-step tool calling | <a href='resources_servers/example_multi_step/configs/example_multi_step.yaml'>example_multi_step.yaml</a> | <a href='resources_servers/example_multi_step/README.md'>README</a> | | ||
| | Reasoning Gym | External environment library integration | <a href='resources_servers/reasoning_gym/configs/reasoning_gym.yaml'>reasoning_gym.yaml</a> | <a href='resources_servers/reasoning_gym/README.md'>README</a> | | ||
|
Contributor
I thought these don't go in the README because they don't have an HF dataset link. I thought this README table was built automatically based on that somehow. |
||
| | Session State Mgmt | Session state management (in-memory) | <a href='resources_servers/example_session_state_mgmt/configs/example_session_state_mgmt.yaml'>example_session_state_mgmt.yaml</a> | <a href='resources_servers/example_session_state_mgmt/README.md'>README</a> | | ||
| | Single Tool Call | Basic single-step tool calling | <a href='resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml'>example_single_tool_call.yaml</a> | <a href='resources_servers/example_single_tool_call/README.md'>README</a> | | ||
| <!-- END_EXAMPLE_ONLY_SERVERS_TABLE --> | ||
|
|
@@ -152,15 +181,20 @@ Purpose: Training-ready environments with curated datasets. | |
| <!-- START_TRAINING_SERVERS_TABLE --> | ||
| | Resource Server | Domain | Dataset | Description | Value | Config | Train | Validation | License | | ||
| | -------------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- | ----- | ---------- | --------------------------------------------------------- | | ||
| | Aviary (GSM8K) | agent | <a href='https://arxiv.org/abs/2110.14168'>GSM8K</a> | Grade school math with calculator tool via Aviary integration | Improve math reasoning with tool use | <a href='resources_servers/aviary/configs/gsm8k_aviary.yaml'>config</a> | ✓ | - | MIT | | ||
| | Aviary (HotPotQA) | agent | <a href='https://aclanthology.org/D18-1259/'>HotPotQA</a> | Multi-hop question answering via Aviary integration | Improve multi-hop reasoning capabilities | <a href='resources_servers/aviary/configs/hotpotqa_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
|
Contributor
Are we starting to enumerate multiple datasets / env implementations in the README now too? Should we do the same for math, for example? @bxyu-nvidia |
||
| | Aviary (BixBench) | agent | <a href='https://arxiv.org/abs/2503.00096'>BixBench</a> | Scientific computational tasks with Jupyter notebook execution | Improve scientific reasoning capabilities | <a href='resources_servers/aviary/configs/bixbench_aviary.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Calendar | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-calendar_scheduling'>Nemotron-RL-agent-calendar_scheduling</a> | - | - | <a href='resources_servers/calendar/configs/calendar.yaml'>config</a> | ✓ | ✓ | Apache 2.0 | | ||
| | Google Search | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa</a> | Multi-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | <a href='resources_servers/google_search/configs/google_search.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Math Advanced Calculations | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations</a> | An instruction following math environment with counter-intuitive calculators | Improve instruction following capabilities in specific math environments | <a href='resources_servers/math_advanced_calculations/configs/math_advanced_calculations.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Workplace Assistant | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant</a> | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | <a href='resources_servers/workplace_assistant/configs/workplace_assistant.yaml'>config</a> | ✓ | ✓ | Apache 2.0 | | ||
| | Xlam Function Calling | agent | <a href='https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k'>xlam-function-calling-60k</a> | Function calling training using Salesforce dataset | Improve function calling capabilities | <a href='resources_servers/xlam_fc/configs/xlam_fc.yaml'>config</a> | ✓ | ✓ | Apache 2.0 | | ||
| | Code Gen | coding | <a href='https://huggingface.co/datasets/nvidia/nemotron-RL-coding-competitive_coding'>nemotron-RL-coding-competitive_coding</a> | - | - | <a href='resources_servers/code_gen/configs/code_gen.yaml'>config</a> | ✓ | ✓ | Apache 2.0 | | ||
| | Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/SWE-Gym/SWE-Gym'>SWE-Gym</a> | A software development with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT | | ||
| | Instruction Following | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following'>Nemotron-RL-instruction_following</a> | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities | Improve IFEval and IFBench | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Structured Outputs | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs</a> | Check if responses are following structured output requirements in prompts | Improve instruction following capabilities | <a href='resources_servers/structured_outputs/configs/structured_outputs_json.yaml'>config</a> | ✓ | ✓ | Apache 2.0 | | ||
| | Mcqa | knowledge | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-mcqa'>Nemotron-RL-knowledge-mcqa</a> | Multi-choice question answering problems | Improve benchmarks like MMLU / GPQA / HLE | <a href='resources_servers/mcqa/configs/mcqa.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Math With Code | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning</a> | Math problems with Python code execution (numpy, scipy, pandas) | Improve math capabilities with code-assisted reasoning | <a href='resources_servers/math_with_code/configs/math_with_code.yaml'>config</a> | ✓ | - | Apache 2.0 | | ||
| | Math With Judge | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning</a> | Math dataset with math-verify and LLM-as-a-judge | Improve math capabilities including AIME 24 / 25 | <a href='resources_servers/math_with_judge/configs/math_with_judge.yaml'>config</a> | ✓ | ✓ | Creative Commons Attribution 4.0 International | | ||
| | Math With Judge | math | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-stack_overflow'>Nemotron-RL-math-stack_overflow</a> | - | - | <a href='resources_servers/math_with_judge/configs/math_stack_overflow.yaml'>config</a> | ✓ | ✓ | Creative Commons Attribution-ShareAlike 4.0 International | | ||
| <!-- END_TRAINING_SERVERS_TABLE --> | ||
|
|
||
I don't think veRL is in progress, but maybe someone is working on it?
And I think we can change TRL to say supported now. We are just fixing a minor last-minute change and working on additional docs, e.g. sample reward/step, or a potential blog post.