fix: Fix OOM in validation during colocated training #1159
Conversation
Signed-off-by: Jarno Seppänen <[email protected]>
📝 Walkthrough

Adds guarded calls to policy.offload_after_refit() before policy_generation.prepare_for_generation() in grpo_train when colocated inference is enabled, in two code paths. No other logic, signatures, or error handling changed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant T as GRPO Trainer
    participant P as Policy
    participant G as PolicyGeneration
    rect rgba(230,240,255,0.5)
        note over T,P: Training step (colocated inference enabled)
        T->>P: refit/update (optimizer in memory)
        alt Colocated inference
            T->>P: offload_after_refit()
            note right of P: Unload optimizer to free memory
        end
        T->>G: prepare_for_generation()
        G-->>T: ready
        T->>G: generate()
        G-->>T: samples
    end
```
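In plain Python, the guarded step in the diagram amounts to the following minimal sketch. The stub classes and the example prompt are placeholders, not the real NeMo RL types; only the names colocated_inference, offload_after_refit, prepare_for_generation, and generate come from this PR.

```python
# Minimal sketch of the guard this PR adds (simplified; stub objects
# stand in for the real Policy / PolicyGeneration classes).

class Policy:
    def offload_after_refit(self) -> None:
        print("optimizer state moved off the GPU")

class PolicyGeneration:
    def prepare_for_generation(self) -> None:
        print("generation engine allocated GPU memory")

    def generate(self, prompts: list[str]) -> list[str]:
        return [p + " ..." for p in prompts]

policy = Policy()
policy_generation = PolicyGeneration()
colocated_inference = True  # taken from the run configuration in practice

# The fix: when training and generation share GPUs, free the optimizer's
# memory before the generation engine claims space.
if colocated_inference:
    policy.offload_after_refit()

policy_generation.prepare_for_generation()
samples = policy_generation.generate(["example prompt"])
```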
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
nemo_rl/algorithms/grpo.py (2)
621-623: Gate offload to vLLM backend; clarify comment

Unconditionally offloading on any colocated setup may cause unnecessary CPU↔GPU churn for Megatron-backed generation. Restrict to vLLM, where the OOM was observed, and tighten the comment.

```diff
-        if colocated_inference:
-            policy.offload_after_refit()  # unload optimizer to make space for generation
+        if colocated_inference and master_config["policy"]["generation"]["backend"] == "vllm":
+            policy.offload_after_refit()  # offload model/optimizer buffers to CPU to free GPU memory for vLLM generation
```
775-777: Same gating for validation path

Apply the same vLLM-only guard here to avoid device thrash on Megatron.

```diff
-        if colocated_inference:
-            policy.offload_after_refit()  # unload optimizer to make space for generation
+        if colocated_inference and master_config["policy"]["generation"]["backend"] == "vllm":
+            policy.offload_after_refit()  # offload model/optimizer buffers to CPU to free GPU memory for vLLM generation
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_rl/algorithms/grpo.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_rl/algorithms/grpo.py (4)
- nemo_rl/models/policy/lm_policy.py (1): offload_after_refit (581-584)
- nemo_rl/models/policy/megatron_policy_worker.py (1): offload_after_refit (1759-1780)
- nemo_rl/models/policy/dtensor_policy_worker.py (1): offload_after_refit (1437-1460)
- nemo_rl/models/policy/interfaces.py (1): offload_after_refit (119-120)
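For orientation: offload_after_refit is declared once on the policy interface and implemented by each worker listed above. A hedged sketch of what such an interface method plausibly looks like follows; the class name is hypothetical and the real declaration in nemo_rl/models/policy/interfaces.py may differ.

```python
# Hypothetical sketch of the shared interface method; the actual
# declaration in nemo_rl/models/policy/interfaces.py may differ.
from abc import ABC, abstractmethod

class PolicyInterface(ABC):  # hypothetical name, for illustration only
    @abstractmethod
    def offload_after_refit(self) -> None:
        """Move optimizer (and other refit-only) state off the GPU so a
        colocated generation backend can allocate the freed memory."""
```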
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
What does this PR do?
Fix OOM after validation when training with colocation enabled.
Currently, optimizer state is not offloaded from GPU memory when refit is not triggered (e.g. after validation), which causes vLLM to run out of memory in the next generation step after validation.
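The following hedged sketch contrasts the two behaviors; refit_needed and the refit step are hypothetical stand-ins (the offload may happen inside the refit routine in the real code), while the other names come from this PR.

```python
# Hedged illustration of the failure mode described above.

def generation_step_before_fix(policy, policy_generation, refit_needed: bool):
    # Offload was coupled to refit, so the post-validation path
    # (refit_needed=False) left optimizer state on the GPU and the
    # next vLLM generation could OOM.
    if refit_needed:
        policy.offload_after_refit()
    policy_generation.prepare_for_generation()

def generation_step_after_fix(policy, policy_generation, colocated_inference: bool):
    # Offload is keyed on colocation instead, so memory is always
    # freed before the colocated engine allocates.
    if colocated_inference:
        policy.offload_after_refit()
    policy_generation.prepare_for_generation()
```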
Summary by CodeRabbit
- Performance
- Chores