APRIL (Active Partial Rollouts) is a compute-efficient method to accelerate rollout generation in reinforcement learning (RL) training for Large Language Models (LLMs). By addressing the critical "long-tail" problem in RL training, where a few samples with exceptionally long responses stall the entire batch, APRIL delivers:
- 20-35% improvement in rollout throughput
- 2-5% higher final model accuracy
- Faster convergence during training
- Hardware agnostic - supports both NVIDIA and AMD GPUs
In on-policy RL training (RLHF/GRPO/DAPO), the rollout phase dominates runtime, typically accounting for over 90% of total training time. Because response lengths vary widely across samples, synchronous training suffers severe GPU underutilization: faster workers sit idle waiting for the longest-running generations to complete.
APRIL improves rollout efficiency through a simple scheduling mechanism (a minimal code sketch follows this list):
- Over-provisioning: Deliberately initiate more rollout requests than needed (N' > N)
- Active interruption: Once the target batch size is reached, actively stop remaining unfinished rollouts
- Intelligent recycling: Store partial results in a buffer and resume generation in the next iteration
- Seamless integration: Works with existing RL frameworks without modifying inference kernels
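The end-to-end flow can be made concrete with a small, self-contained Python simulation. Everything below (the `generate` stub, `april_step`, the thread-pool scheduling) is an illustrative sketch, not slime's actual implementation, which lives in slime/rollout/sglang_example.py and slime/ray/buffer.py.

```python
# Illustrative simulation of APRIL's scheduling (not slime's real code).
import random
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate(prompt, stop_event, max_tokens=512):
    """Stand-in for an inference call: emits tokens until finished or
    until the scheduler signals an active interruption."""
    tokens, length = [], random.randint(16, max_tokens)  # long-tail lengths
    for _ in range(length):
        if stop_event.is_set():
            return {"prompt": prompt, "tokens": tokens, "done": False}
        tokens.append("tok")
    return {"prompt": prompt, "tokens": tokens, "done": True}

def april_step(prompts, target_batch_size, resume_prompts, alpha=2.0):
    """One APRIL iteration: over-provision N' = alpha * N requests, keep the
    first N completions, interrupt the rest, and recycle them as partials."""
    stop_event = threading.Event()
    # Over-provisioning: resume buffered partials first, then fresh prompts.
    requests = (resume_prompts + prompts)[: int(alpha * target_batch_size)]
    finished, partials = [], []
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = [pool.submit(generate, p, stop_event) for p in requests]
        for fut in as_completed(futures):
            sample = fut.result()
            if sample["done"] and len(finished) < target_batch_size:
                finished.append(sample)
            else:
                partials.append(sample)  # interrupted (or surplus) rollout
            if len(finished) == target_batch_size:
                stop_event.set()  # active interruption of all stragglers
    # The real system also stores the partial token state; this toy example
    # only recycles the prompts to stay short.
    return finished, [p["prompt"] for p in partials]

# Toy usage: a rollout batch of 8 with 2x over-provisioning.
batch, leftover = april_step([f"q{i}" for i in range(16)], 8, [], alpha=2.0)
print(len(batch), "complete samples,", len(leftover), "partials recycled")
```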
- Plug-and-play: Enable with just two command-line flags (`--partial-rollout` and `--over-sampling-batch-size`)
- Algorithm-agnostic: Compatible with GRPO, DAPO, GSPO, and other popular RL algorithms
- Framework-ready: Already integrated into the slime framework
- System-level optimization: Operates at the scheduling layer, complementary to kernel-level optimizations
- Production-tested: Evaluated on multiple LLMs including DeepSeek-R1, Qwen3, and GLM-4
docker run --rm --gpus all --ipc=host --shm-size=16g \
--ulimit memlock=-1 --ulimit stack=67108864 \
-it rlsys/slime:slime_ubuntu22.04_rocm6.3.4-patch-numa-patch_sglang0.4.9_megatron-patch_ray2.47.1_apex_torch-memory-saver0.0.8-patch-vim /bin/bash
git clone https://github.com/RLsys-Foundation/APRIL.git
cd APRIL
pip install -e .
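A quick sanity check that the editable install is visible to Python (assuming the default environment inside the container):

```python
# Should import without error and point into the cloned APRIL/ directory.
import slime
print(slime.__file__)
```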
Run a training example with APRIL enabled:
# Example: Qwen3-4B with DAPO
bash scripts/partial_rollout/qwen/grpo/run-qwen3-4B-dapo-partial.sh
# Enable APRIL optimization
--partial-rollout
# Set over-sampling batch size (should be > rollout_batch_size)
--over-sampling-batch-size 64 # e.g., 2x the rollout_batch_size
# Standard rollout batch size
--rollout-batch-size 32
For detailed parameter explanations, see arguments.py.
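The only hard constraint among these flags is the one noted in the comment above: the over-sampling batch size must be strictly larger than the rollout batch size. A small validation sketch (illustrative, not the actual arguments.py) makes the relationship and the implied over-provisioning factor explicit:

```python
# Illustrative check of how the APRIL flags relate (not the real arguments.py).
def check_april_args(rollout_batch_size: int, over_sampling_batch_size: int) -> float:
    """Return the over-provisioning factor alpha = N' / N."""
    if over_sampling_batch_size <= rollout_batch_size:
        raise ValueError("--over-sampling-batch-size must exceed --rollout-batch-size")
    return over_sampling_batch_size / rollout_batch_size

print(check_april_args(rollout_batch_size=32, over_sampling_batch_size=64))  # 2.0
```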
| Dataset | Model | Algorithm | Throughput Gain | Accuracy Improvement |
|---|---|---|---|---|
| DAPO-Math-17k | Qwen3-4B | DAPO | +17% | +2.3% |
| DeepScaleR | Qwen3-4B | GRPO | +21% | +3.1% |
| DeepMath-103K | Qwen3-4B | GSPO | +35% | +4.7% |
| Agent Tasks | DeepSeek-1.5B | GRPO | +23% | +2.8% |
APRIL not only improves training efficiency but also achieves:
- Faster convergence: Reaches target accuracy 15-20% faster
- Higher final accuracy: 2-5% improvement in final model performance
- Stable training: No additional instability despite partial off-policy samples
graph TB
subgraph Pipeline["APRIL Training Pipeline"]
subgraph Rollout["Rollout Phase"]
R1[("Over-provision<br/>N' > N requests")]
R2[("SGLang<br/>Inference Engine")]
R3[("Active<br/>Interruption")]
R1 --> R2
R2 --> R3
end
subgraph Buffer["Buffer Management"]
B1[("Partial<br/>Rollouts")]
B2[("Resume<br/>Queue")]
B3[("Complete<br/>Samples")]
B1 --> B2
R3 --> B1
R3 --> B3
end
subgraph Training["Training Phase"]
T1[("Policy<br/>Update")]
T2[("Loss<br/>Computation")]
T3[("Megatron/<br/>FSDP Backend")]
B3 --> T2
T2 --> T1
T1 --> T3
end
B2 -.->|Next Iteration| R1
T3 -.->|Updated Model| R2
end
style Pipeline fill:#f9f9ff,stroke:#4a5568,stroke-width:2px
style Rollout fill:#e6f7ff,stroke:#1890ff,stroke-width:1px
style Buffer fill:#fff7e6,stroke:#fa8c16,stroke-width:1px
style Training fill:#f0f5ff,stroke:#597ef7,stroke-width:1px
style R1 fill:#e6f7ff,stroke:#40a9ff
style R2 fill:#e6f7ff,stroke:#40a9ff
style R3 fill:#e6f7ff,stroke:#40a9ff
style B1 fill:#fff7e6,stroke:#ffa940
style B2 fill:#fff7e6,stroke:#ffa940
style B3 fill:#fff7e6,stroke:#ffa940
style T1 fill:#f0f5ff,stroke:#85a5ff
style T2 fill:#f0f5ff,stroke:#85a5ff
style T3 fill:#f0f5ff,stroke:#85a5ff
| Component | Path | Description |
|---|---|---|
| Rollout Engine | slime/rollout/sglang_example.py | Manages generation with active interruption |
| Buffer System | slime/ray/buffer.py | Stores and prioritizes partial rollouts |
| Scheduler | slime/ray/rollout.py | Orchestrates over-sampling and batch management |
| Training Backend | slime/backends/ | Supports both Megatron and FSDP |
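As a rough mental model of how these pieces fit together (a sketch under assumptions; the real logic is in slime/ray/buffer.py and slime/ray/rollout.py), the buffer gives interrupted rollouts priority over fresh prompts when the next iteration's requests are assembled:

```python
# Toy model of APRIL's buffer (illustrative, not slime's code).
from collections import deque

class PartialRolloutBuffer:
    """Completed samples go to training; interrupted samples wait to be resumed."""
    def __init__(self):
        self.resume_queue = deque()   # partial rollouts, resumed first
        self.completed = []           # finished samples ready for training

    def add(self, sample):
        (self.completed if sample["done"] else self.resume_queue).append(sample)

    def next_requests(self, fresh_prompts, num_requests):
        """Prioritize resuming partials, then fill with fresh prompts."""
        resumed = [self.resume_queue.popleft()
                   for _ in range(min(len(self.resume_queue), num_requests))]
        return resumed + fresh_prompts[: num_requests - len(resumed)]
```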
While APRIL introduces ~40% off-policy tokens per iteration, extensive experiments show:
- No significant training instability
- Improved final model accuracy
- Consistent convergence patterns
Note: For extremely long sequences (e.g., multi-turn agent tasks), additional validation may be needed.
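In this context, "off-policy tokens" are the prefix tokens a resumed sample carries over from the policy that generated them before the interruption; only the continuation comes from the current policy. A hypothetical helper (the field names are assumptions, not slime's schema) shows how that per-batch fraction could be tracked:

```python
def off_policy_token_fraction(batch, current_version):
    """Fraction of tokens generated by an older policy version.
    Assumes each sample stores (policy_version, num_tokens) segments."""
    stale = sum(n for s in batch for v, n in s["segments"] if v < current_version)
    total = sum(n for s in batch for _, n in s["segments"])
    return stale / max(total, 1)

# Example: a resumed sample with 300 stale tokens and 200 fresh ones.
batch = [{"segments": [(3, 300), (4, 200)]}, {"segments": [(4, 500)]}]
print(off_policy_token_fraction(batch, current_version=4))  # 0.3
```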
APRIL operates at the system scheduling layer and is fully compatible with:
- Kernel optimizations (FlashAttention, continuous batching)
- Inference engines (vLLM, SGLang, TensorRT-LLM)
- Speculative decoding techniques
- Model parallelism strategies
APRIL is hardware-agnostic and tested on:
- NVIDIA GPUs: H100
- AMD GPUs: MI300X
APRIL/
├── imgs/                          # Documentation images
│   ├── APRIL.png                  # Project logo
│   └── partial_scheduling.png     # Architecture diagrams
├── scripts/
│   └── partial_rollout/           # Training scripts
│       ├── deepseek/              # DeepSeek model experiments
│       ├── qwen/                  # Qwen model experiments
│       └── README.md              # Script documentation
├── slime/                         # Core framework
│   ├── backends/                  # Training backends
│   │   ├── fsdp_utils/            # FSDP implementation
│   │   └── megatron_utils/        # Megatron-LM support
│   ├── rollout/
│   │   ├── sglang_example.py      # Core rollout implementation
│   │   └── rm_hub/                # Reward model integrations
│   ├── ray/                       # Distributed orchestration
│   │   ├── buffer.py              # Partial rollout buffer
│   │   └── rollout.py             # Rollout scheduling
│   └── utils/                     # Utilities and helpers
├── docs/                          # Documentation
│   ├── en/                        # English docs
│   └── zh/                        # Chinese docs
└── tools/                         # Model conversion utilities
- Over-provisioning Phase: Request N' = αN rollouts (α typically 1.5-2.0)
- Active Monitoring: Track completion status across all workers
- Intelligent Interruption: Send abort signal when N samples complete
- Buffer Management: Store partial results with generation state
- Seamless Resumption: Continue partial rollouts in the next iteration (see the sketch below)
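To illustrate the last two steps (a sketch under assumptions; the field names are hypothetical, not slime's schema): the buffer records the prompt, the tokens generated so far, and the policy version, and the next iteration continues decoding from that saved prefix with the remaining token budget.

```python
# Hypothetical generation state saved when a rollout is interrupted.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PartialRollout:
    prompt: str
    generated_tokens: List[int] = field(default_factory=list)
    policy_version: int = 0          # tokens so far are (partially) off-policy

    def to_resume_request(self, max_new_tokens: int) -> dict:
        """Build the next iteration's request: continue from the existing
        prefix with whatever token budget remains."""
        return {
            "prompt": self.prompt,
            "input_ids": self.generated_tokens,   # appended after the prompt
            "max_new_tokens": max_new_tokens - len(self.generated_tokens),
        }
```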
APRIL is designed as a drop-in enhancement for existing RL training pipelines:
- Minimal code changes: Enable with command-line flags
- Framework agnostic: Works with OpenRLHF, verl, Areal, slime
- Automatic optimization: Self-tuning based on workload characteristics
If you use APRIL in your research, please cite our paper:
@article{april2025,
title={APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation},
author={RLsys Foundation Team},
journal={arXiv preprint},
year={2025}
}
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
APRIL builds upon the excellent work of:
- slime - The base RL training framework
- SGLang - High-performance inference backend
- Megatron-LM - Distributed training backend
For questions and support:
- Open an issue on GitHub