docs: async doc update for importance sampling correction #1222
base: main
… correction is required for async convergence Signed-off-by: Parth Chadha <[email protected]>
📝 Walkthrough: Adds a new subsection to docs/guides/async-grpo.md explaining the need for importance sampling correction in asynchronous GRPO, detailing the objective, the distribution mismatch introduced by generator policies, and how the objective adjusts when use_importance_sampling_correction is enabled. No code or API changes.
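The correction described in the walkthrough can be illustrated with a minimal sketch. It is not the code from this repository: the function names `importance_weights` and `corrected_loss` and the plain-list inputs are hypothetical, and only the flag name `use_importance_sampling_correction` comes from the PR. The sketch assumes the usual form of importance sampling, where each sampled token's loss is reweighted by the ratio of its probability under the policy the objective assumes to its probability under the generator policy that actually produced it.

```python
import math


def importance_weights(train_logprobs, gen_logprobs):
    """Per-token ratio pi_train(token) / pi_gen(token), computed from log-probs.

    Both arguments are hypothetical inputs: log-probabilities of the same
    sampled tokens under the training-side policy and the generator policy.
    """
    return [math.exp(t - g) for t, g in zip(train_logprobs, gen_logprobs)]


def corrected_loss(per_token_losses, train_logprobs, gen_logprobs,
                   use_importance_sampling_correction=True):
    """Mean per-token loss, optionally reweighted by importance weights."""
    if use_importance_sampling_correction:
        weights = importance_weights(train_logprobs, gen_logprobs)
    else:
        # Without the correction every token counts equally, which ignores
        # the mismatch between the generator and training distributions.
        weights = [1.0] * len(per_token_losses)
    total = sum(w * loss for w, loss in zip(weights, per_token_losses))
    return total / len(per_token_losses)


# Toy numbers: the first token was under-sampled by a stale generator.
losses = [1.0, 1.0]
train_lp = [math.log(0.5), math.log(0.2)]
gen_lp = [math.log(0.25), math.log(0.2)]
with_correction = corrected_loss(losses, train_lp, gen_lp, True)
without_correction = corrected_loss(losses, train_lp, gen_lp, False)
```

In this toy case the first token's weight is about 2, so the corrected loss gives it twice the influence of the second token, while the uncorrected loss treats both equally. Practical implementations often also clip or truncate the weights to limit variance; that is omitted here.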
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks: ✅ Passed checks (4 passed)
lgtm. @jgerh can you review?
Completed tech pubs review and provided a few comments.
3. **Resource Allocation**: Ensure sufficient GPU memory for both the training and generation clusters
## Why Importance Sampling Correction is required for Async