
Conversation

@Agoniii (Contributor) commented Dec 4, 2025

What does this PR do?

This PR introduces FP8 rollout with the SGLang inference backend in verl.

Experiments and Outcomes

Qwen3-8B-Base Dense Model

Configuration

  • DAPO recipe. AIME24 online validation.
  • SGLang + FSDP
    • Note that SPMD rollout has been deprecated, so we removed the FP8 SPMD rollout.
  • Prompt batch size 32, n=16.
  • Rollout batch size: 32*3*16
  • train_batch_size and ppo_mini_batch_size: 32
  • Max response length 20K
  • Token-level TIS (truncated importance sampling), C=2 (see the sketch after this list)
  • 8*H100
  • verlai/verl:sgl055.latest
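
Token-level TIS here refers to the per-token importance ratio between the BF16 training policy and the FP8 rollout policy, truncated at C before it weights the policy loss. A minimal sketch with illustrative tensor names, not the verl implementation:

import torch

def tis_weights(logp_train: torch.Tensor, logp_rollout: torch.Tensor, c: float = 2.0) -> torch.Tensor:
    # Per-token importance ratio pi_train / pi_rollout, computed from log-probs
    # and truncated at C so that drift of the low-precision rollout policy
    # cannot inflate the gradient.
    ratio = torch.exp(logp_train - logp_rollout)
    return ratio.clamp(max=c).detach()

# e.g. policy_loss = -(tis_weights(logp_train, logp_rollout) * advantages * logp_train).mean()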

Accuracy

With TIS, FP8 rollout accuracy aligns with BF16.
(figure: AIME24 validation accuracy curves, FP8 rollout vs. BF16)

Performance
(figures: rollout performance curves; purple: BF16, red: FP8 rollout)

Results and observations:

  • FP8 rollout yields a ~12% rollout speedup

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.
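
For reviewers unfamiliar with blockwise FP8, here is a minimal, self-contained sketch of what 128x128 blockwise quantization produces: an FP8 e4m3 weight plus a per-tile dequantization scale, i.e. the "<name>" / "<name>_scale_inv" pair this PR syncs to SGLang. This is a stand-in for illustration, not the PR's scaled_fp8_blockwise utility; the names FP8_MAX and quantize_fp8_blockwise are made up here.

import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def quantize_fp8_blockwise(w: torch.Tensor, block: int = 128):
    # Quantize a 2-D weight into FP8 e4m3 with one scale per (block x block) tile.
    rows, cols = w.shape
    pad_r, pad_c = (-rows) % block, (-cols) % block
    wp = torch.nn.functional.pad(w.float(), (0, pad_c, 0, pad_r))  # pad to tile multiples
    tiles = wp.view(wp.shape[0] // block, block, wp.shape[1] // block, block)
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax                                    # per-tile scale into FP8 range
    q = (tiles * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    q = q.view_as(wp)[:rows, :cols]
    scale_inv = (1.0 / scale).squeeze(-1).squeeze(1)          # dequant scale, one per tile
    return q, scale_inv

w = torch.randn(256, 512, dtype=torch.bfloat16)
q, s_inv = quantize_fp8_blockwise(w)
print(q.dtype, q.shape, s_inv.shape)  # torch.float8_e4m3fn (256, 512) (2, 4)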

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces FP8 rollout support for the sglang backend, which is a significant feature enhancement. The changes include adding a new utility file for FP8 quantization, updating the sglang server and rollout worker to handle FP8 configurations, and updating the documentation accordingly. The implementation appears solid. My review focuses on improving maintainability by addressing code duplication and removing unreachable code.

Comment on lines +163 to +176
if weight_block_size is not None:
    if torch.distributed.get_rank() == 0:
        logger.debug(f"  Quantizing to FP8 blockwise: {k}")
    param_lp, param_scale = scaled_fp8_blockwise(
        v.to(dtype),
        weight_block_size=weight_block_size,
    )
    param_scale = param_scale.squeeze(-1)
    weights_quantized.append([k, param_lp])
    weights_quantized.append([k + "_scale_inv", param_scale])
else:
    raise ValueError(
        "Only blockwise quantization is supported. Please set weight_block_size in quant_config"
    )
Severity: high

This else block is unreachable. weight_block_size is checked for None on line 152 before the loop begins, and an exception is raised if it is None. Consequently, the condition weight_block_size is not None on line 163 will always evaluate to true inside the loop, rendering the else branch dead code. Removing the conditional wrapper and the unreachable else block will improve code clarity and maintainability.

            if torch.distributed.get_rank() == 0:
                logger.debug(f"  Quantizing to FP8 blockwise: {k}")
            param_lp, param_scale = scaled_fp8_blockwise(
                v.to(dtype),
                weight_block_size=weight_block_size,
            )
            param_scale = param_scale.squeeze(-1)
            weights_quantized.append([k, param_lp])
            weights_quantized.append([k + "_scale_inv", param_scale])

Comment on lines +133 to +140
assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
FP8_BLOCK_QUANT_KWARGS = {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
}
fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)
Severity: high

The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/sglang_rollout.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.
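
A possible shape for that helper (sketch only, not code from this PR; it also swaps the lexicographic string comparison for packaging.version, since "0.10.0" < "0.5.5" when compared as strings):

# verl/utils/sglang/sglang_fp8_utils.py
import sglang
from packaging.version import Version

FP8_BLOCK_QUANT_KWARGS = {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
}

def get_fp8_block_quant_kwargs() -> dict:
    # Single place for the version gate and config creation, shared by
    # async_sglang_server.py and sglang_rollout.py.
    assert Version(sglang.__version__) >= Version("0.5.5"), (
        "sglang>=0.5.5 is required for FP8 quantization"
    )
    return dict(FP8_BLOCK_QUANT_KWARGS)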

Comment on lines +1561 to +1568
assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
FP8_BLOCK_QUANT_KWARGS = {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
}
fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)
Severity: high

The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/async_sglang_server.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.
