Conversation

@puneeshkhanna
Contributor

@puneeshkhanna puneeshkhanna commented Oct 7, 2025

What does this PR do?

Fixes the data-balance DP-token loss scaling when sequence parallelism (SP) is enabled. The per-rank valid-token count is all-reduced over all ranks, including SP ranks, so the same tokens are counted multiple times (twice for SP-2, since SP ranks hold the same tensor values). The loss must therefore be scaled by the full world size; otherwise, with data balance DP token set to True, the SP-2 loss plot is incorrectly half of the SP-1 loss plot.
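The double-counting can be seen in a minimal single-process sketch (plain Python, no torch.distributed; `balanced_loss`, `sp_size`, and the other names are illustrative, not the actual verl API):

```python
# Toy model: each DP group owns one sequence, replicated on its sp_size
# SP ranks, so an all-reduce of valid-token counts over ALL ranks
# counts every token sp_size times.

def balanced_loss(loss_sums, valid_tokens, sp_size, scale_by_world_size=True):
    """Mean of per-rank losses under data-balance-DP-token scaling.

    loss_sums[i] / valid_tokens[i]: loss sum / valid-token count of the
    sequence owned by DP group i.
    """
    dp_size = len(loss_sums)
    world_size = dp_size * sp_size
    # The all-reduce over all ranks double-counts tokens sp_size times.
    allreduced_tokens = sum(valid_tokens) * sp_size
    # Buggy path: scale by dp_size only, ignoring the SP duplication.
    scale = world_size if scale_by_world_size else dp_size
    rank_losses = []
    for loss_sum in loss_sums:
        per_rank = loss_sum / (allreduced_tokens / scale)
        rank_losses.extend([per_rank] * sp_size)  # identical on SP ranks
    return sum(rank_losses) / len(rank_losses)
```

With `loss_sums=[3.0, 5.0]` and `valid_tokens=[10, 30]`, scaling by the full world size gives 0.2 (the true token-weighted mean, 8/40) for both SP-1 and SP-2, while the buggy path gives 0.1 at SP-2, half of the SP-1 value, matching the plots described above.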

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Verified the loss plots of SP-1 vs SP-2 with data-balance DP tokens enabled; they match.
Without the fix, the SP-2 loss is half of the SP-1 loss.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@puneeshkhanna
Contributor Author

@vermouth1992 - please review.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug in the loss calculation when data_balance_dp_token and sequence parallelism are enabled. The original logic used an incorrect scaling factor for the loss, causing it to be smaller when sequence parallelism was active. The fix simplifies the code by consistently using the total world size for scaling, which ensures that the calculated loss and resulting gradients are correct and consistent regardless of the sequence parallelism size. My analysis confirms that with this change, both the gradient updates and the logged loss values will be consistent across different sequence parallelism configurations, resolving the issue described.

@puneeshkhanna
Contributor Author

@vermouth1992 - Did you get a chance to check this? I think it is very important for the loss plot to be correct with data-balance DP tokens when SP is enabled.

@puneeshkhanna
Contributor Author

@vermouth1992 @eric-haibin-lin - Did you get a chance to review this one?
