Skip to content

Conversation

parthchadha
Copy link
Contributor

@parthchadha parthchadha commented Sep 28, 2025

… correction is required for async convergence

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation
    • Added “Why Importance Sampling Correction is required for Async” to the Async GRPO guide.
    • Explains the GRPO objective, distribution mismatch from generator-driven trajectories, and the need for importance sampling correction.
    • Describes the correction term and how the objective changes when importance sampling correction is enabled.
    • Improves conceptual clarity for users implementing async training.
    • No code or public API changes.

… correction is required for async convergence

Signed-off-by: Parth Chadha <[email protected]>
@parthchadha parthchadha requested a review from a team as a code owner September 28, 2025 22:16
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 28, 2025
@parthchadha parthchadha changed the title Update async grpo doc to include a section on why importance sampling… doc: async doc update for importance sampling correction Sep 28, 2025
Copy link
Contributor

coderabbitai bot commented Sep 28, 2025

📝 Walkthrough

Walkthrough

Adds a new subsection to docs/guides/async-grpo.md explaining the need for importance sampling correction in asynchronous GRPO, detailing the objective, distribution mismatch from generator policies, and how the objective adjusts when use_importance_sampling_correction is enabled. No code or API changes.

Changes

Cohort / File(s) Summary
Docs: Async GRPO importance sampling
docs/guides/async-grpo.md
Added subsection describing GRPO objective, generator-policy-induced distribution mismatch, and the importance sampling correction term; includes how the objective changes when use_importance_sampling_correction is enabled.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • docs: update grpo.md #1106 — Updates GRPO documentation to explain importance-sampling/pi-ratio correction for distribution mismatch, overlapping in topic and scope.

Suggested labels

documentation

Suggested reviewers

  • terrykong

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Test Results For Major Changes ✅ Passed The PR only adds explanatory documentation to docs/guides/async-grpo.md without modifying code or public APIs, so it qualifies as a minor change. Under the custom check, minor documentation-only updates do not require test results to be provided in the PR description. Therefore, the absence of testing information does not violate the check’s criteria.
Title Check ✅ Passed The title “docs: async doc update for importance sampling correction” clearly and concisely summarizes the primary change of this PR, indicating that the documentation for the asynchronous GRPO implementation is being updated to include the importance sampling correction. It follows standard commit message conventions and is specific enough for readers to understand the PR’s focus.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch async-doc-update

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@terrykong terrykong requested a review from jgerh September 29, 2025 03:28
@terrykong terrykong changed the title doc: async doc update for importance sampling correction docs: async doc update for importance sampling correction Sep 29, 2025
Copy link
Contributor

@terrykong terrykong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jgerh can you review?

Copy link
Contributor

@jgerh jgerh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed tech pubs review and provided a few comments.


3. **Resource Allocation**: Ensure sufficient GPU memory for both the training and generation clusters

## Why Importance Sampling Correction is required for Async
Copy link
Contributor

@jgerh jgerh Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 125 To render the mermaid diagram correctly, use the MyST directive syntax ```{mermaid}

mermaid

Line 159

Suggested change
## Why Importance Sampling Correction is required for Async
## Why Importance Sampling Correction Is Required for Async

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation r0.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants