

APRIL Logo

APRIL: Active Partial Rollouts in Reinforcement Learning

Accelerating LLM Training by Taming Long-tail Generation

License Python 3.8+ PyTorch DeepWiki

🚀 Overview

APRIL (Active Partial Rollouts) is a compute-efficient method for accelerating rollout generation in reinforcement learning (RL) training of Large Language Models (LLMs). By addressing the critical "long-tail" problem in RL training, where a few samples with exceptionally long responses stall the entire batch, APRIL delivers:

  • 20-35% improvement in rollout throughput
  • 2-5% higher final model accuracy
  • Faster convergence during training
  • Hardware agnostic - supports both NVIDIA and AMD GPUs

The Problem: Long-tail Generation Bottleneck

In on-policy RL training (RLHF/GRPO/DAPO), the rollout phase dominates runtime, typically accounting for over 90% of total training time. Because response lengths vary widely across samples, synchronous training paradigms suffer severe GPU underutilization: faster-generating workers sit idle while they wait for the longest-running requests to complete.
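
As a rough illustration (the numbers below are invented for this example, not taken from the paper, and each sample is treated as occupying its own worker slot), a single long-tail sample dictates the wall-clock time of a synchronous rollout step:

import random

random.seed(0)

# Invented, illustrative numbers: 256 samples, most decode quickly,
# a handful take roughly ten times longer (the "long tail").
decode_minutes = [random.uniform(1.0, 3.0) for _ in range(250)] + \
                 [random.uniform(20.0, 30.0) for _ in range(6)]

step_time = max(decode_minutes)             # a synchronous step waits for the slowest sample
useful_time = sum(decode_minutes)           # total time spent actually generating
capacity = step_time * len(decode_minutes)  # worker-minutes available during the step

print(f"step wall-clock time : {step_time:.1f} min (set by the longest sample)")
print(f"mean sample time     : {useful_time / len(decode_minutes):.1f} min")
print(f"worker utilization   : {useful_time / capacity:.0%}")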

Our Solution: Active Partial Rollouts

APRIL improves rollout efficiency with a simple scheduling mechanism (sketched in code below):

  1. Over-provisioning: Deliberately initiate more rollout requests than needed (N' > N)
  2. Active interruption: Once the target batch size is reached, actively stop remaining unfinished rollouts
  3. Intelligent recycling: Store partial results in a buffer and resume generation in the next iteration
  4. Seamless integration: Works with existing RL frameworks without modifying inference kernels

[Figure: partial rollout scheduling]
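
The sketch below is a minimal, framework-free rendering of this loop, serialized for readability; the function and field names (april_rollout_step, policy_generate, "tokens", "finished") are placeholders for illustration, not slime's actual API.

from collections import deque

def april_rollout_step(prompts, policy_generate, target_batch_size,
                       over_sampling_size, buffer):
    """One APRIL rollout step: over-provision, actively interrupt, recycle.

    `policy_generate(sample, budget)` stands in for the inference engine:
    it appends up to `budget` tokens to sample["tokens"] and sets
    sample["finished"] once generation hits EOS or the length limit.
    """
    # 1. Over-provision: resume buffered partial rollouts first, then start
    #    fresh prompts, up to N' = over_sampling_size > target_batch_size.
    active = deque()
    while buffer and len(active) < over_sampling_size:
        active.append(buffer.popleft())
    for prompt in prompts:
        if len(active) >= over_sampling_size:
            break
        active.append({"prompt": prompt, "tokens": [], "finished": False})

    completed = []
    # 2. Round-robin generation in small chunks so completion is checked often.
    while active and len(completed) < target_batch_size:
        sample = active.popleft()
        policy_generate(sample, budget=128)
        if sample["finished"]:
            completed.append(sample)
        else:
            active.append(sample)

    # 3. Active interruption: the target batch is full; everything still in
    #    flight is stashed and resumed from its partial prefix next iteration.
    buffer.extend(active)
    return completed

In the real system the requests run concurrently on the SGLang inference engine and the interruption is an abort signal sent to in-flight requests (see "How APRIL Works" below), so no inference kernels need to be modified.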

✨ Key Features

  • 🔥 Plug-and-play: Enable with just two command-line flags (--partial-rollout and --over-sampling-batch-size)
  • 🎯 Algorithm-agnostic: Compatible with GRPO, DAPO, GSPO, and other popular RL algorithms
  • 🏗️ Framework-ready: Already integrated into the slime framework
  • ⚡ System-level optimization: Operates at the scheduling layer, complementary to kernel-level optimizations
  • 🔧 Production-tested: Evaluated on multiple LLMs including DeepSeek-R1, Qwen3, and GLM-4

🛠️ Installation

Quick Start with Docker

For AMD GPUs:

docker run --rm --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -it rlsys/slime:slime_ubuntu22.04_rocm6.3.4-patch-numa-patch_sglang0.4.9_megatron-patch_ray2.47.1_apex_torch-memory-saver0.0.8-patch-vim /bin/bash

For NVIDIA GPUs:

See NVIDIA setup guide

Install APRIL

git clone https://github.com/RLsys-Foundation/APRIL.git
cd APRIL
pip install -e .

🚦 Quick Start

Basic Usage

Run a training example with APRIL enabled:

# Example: Qwen3-4B with DAPO
bash scripts/partial_rollout/qwen/grpo/run-qwen3-4B-dapo-partial.sh

Key Parameters

# Enable APRIL optimization
--partial-rollout

# Set over-sampling batch size (should be > rollout_batch_size)
--over-sampling-batch-size 64  # e.g., 2x the rollout_batch_size

# Standard rollout batch size
--rollout-batch-size 32

Advanced Configuration

For detailed parameter explanations, see arguments.py.
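
As a rough sketch of the shape of these options (hypothetical argparse declarations for illustration, not the verbatim contents of arguments.py):

import argparse

def add_april_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Hypothetical declarations; see arguments.py for the real definitions.
    parser.add_argument("--partial-rollout", action="store_true",
                        help="Interrupt unfinished rollouts once the target batch is full "
                             "and recycle them in the next iteration.")
    parser.add_argument("--over-sampling-batch-size", type=int, default=None,
                        help="Number of rollout requests launched per step; "
                             "should exceed --rollout-batch-size (e.g., 2x).")
    parser.add_argument("--rollout-batch-size", type=int, default=32,
                        help="Number of completed samples required per training step.")
    return parser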

📊 Performance Results

Throughput Improvements

Dataset          Model           Algorithm   Throughput Gain   Accuracy Improvement
DAPO-Math-17k    Qwen3-4B        DAPO        +17%              +2.3%
DeepScaleR       Qwen3-4B        GRPO        +21%              +3.1%
DeepMath-103K    Qwen3-4B        GSPO        +35%              +4.7%
Agent Tasks      DeepSeek-1.5B   GRPO        +23%              +2.8%

Convergence Analysis

[Figure: convergence and accuracy evaluation]

APRIL not only improves training efficiency but also achieves:

  • Faster convergence: Reaches target accuracy 15-20% faster
  • Higher final accuracy: 2-5% improvement in final model performance
  • Stable training: No additional instability despite the partially off-policy samples introduced by resumed rollouts

πŸ—οΈ Architecture

System Design

graph TB
    subgraph Pipeline["🎯 APRIL Training Pipeline"]
        subgraph Rollout["📊 Rollout Phase"]
            R1[("🎲 Over-provision<br/>N' > N requests")]
            R2[("⚡ SGLang<br/>Inference Engine")]
            R3[("🛑 Active<br/>Interruption")]
            R1 --> R2
            R2 --> R3
        end

        subgraph Buffer["💾 Buffer Management"]
            B1[("📦 Partial<br/>Rollouts")]
            B2[("♻️ Resume<br/>Queue")]
            B3[("✅ Complete<br/>Samples")]
            B1 --> B2
            R3 --> B1
            R3 --> B3
        end

        subgraph Training["🧠 Training Phase"]
            T1[("🔄 Policy<br/>Update")]
            T2[("📈 Loss<br/>Computation")]
            T3[("⚙️ Megatron/<br/>FSDP Backend")]
            B3 --> T2
            T2 --> T1
            T1 --> T3
        end

        B2 -.->|Next Iteration| R1
        T3 -.->|Updated Model| R2
    end

    style Pipeline fill:#f9f9ff,stroke:#4a5568,stroke-width:2px
    style Rollout fill:#e6f7ff,stroke:#1890ff,stroke-width:1px
    style Buffer fill:#fff7e6,stroke:#fa8c16,stroke-width:1px
    style Training fill:#f0f5ff,stroke:#597ef7,stroke-width:1px

    style R1 fill:#e6f7ff,stroke:#40a9ff
    style R2 fill:#e6f7ff,stroke:#40a9ff
    style R3 fill:#e6f7ff,stroke:#40a9ff
    style B1 fill:#fff7e6,stroke:#ffa940
    style B2 fill:#fff7e6,stroke:#ffa940
    style B3 fill:#fff7e6,stroke:#ffa940
    style T1 fill:#f0f5ff,stroke:#85a5ff
    style T2 fill:#f0f5ff,stroke:#85a5ff
    style T3 fill:#f0f5ff,stroke:#85a5ff

Core Components

Component          Path                              Description
Rollout Engine     slime/rollout/sglang_example.py   Manages generation with active interruption
Buffer System      slime/ray/buffer.py               Stores and prioritizes partial rollouts
Scheduler          slime/ray/rollout.py              Orchestrates over-sampling and batch management
Training Backend   slime/backends/                   Supports both Megatron and FSDP
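
As a toy mental model of the buffer component (the class, field names, and the longest-first priority rule are assumptions for illustration, not the slime/ray/buffer.py API):

import heapq
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class PartialRollout:
    priority: int                                   # negative generated length: longest partials pop first
    prompt: str = field(compare=False)
    tokens: List[int] = field(compare=False, default_factory=list)

class PartialRolloutBuffer:
    """Toy stand-in for the partial-rollout buffer."""

    def __init__(self):
        self._heap = []

    def put(self, prompt: str, tokens: List[int]) -> None:
        # Store an interrupted sample, keyed by how much has already been generated.
        heapq.heappush(self._heap, PartialRollout(-len(tokens), prompt, list(tokens)))

    def get(self) -> PartialRollout:
        # Hand back the longest-pending partial rollout first so long-tail
        # samples are not postponed indefinitely.
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)

Resuming the longest partial generations first is just one plausible prioritization; the actual policy is defined in buffer.py.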

❓ FAQ

Q: Does APRIL affect training stability?

While APRIL introduces roughly 40% off-policy tokens per iteration (tokens generated under an earlier policy version and kept when a partial rollout is resumed), extensive experiments show:

  • No significant training instability
  • Improved final model accuracy
  • Consistent convergence patterns

Note: For extremely long sequences (e.g., multi-turn agent tasks), additional validation may be needed.

Q: Is APRIL compatible with other optimizations?

Yes! APRIL operates at the system scheduling layer and is fully compatible with:

  • Kernel optimizations (FlashAttention, continuous batching)
  • Inference engines (vLLM, SGLang, TensorRT-LLM)
  • Speculative decoding techniques
  • Model parallelism strategies

Q: What hardware is supported?

APRIL is hardware-agnostic and tested on:

  • NVIDIA GPUs: H100
  • AMD GPUs: MI300X

📁 Repository Structure

APRIL/
├── imgs/                          # Documentation images
│   ├── APRIL.png                  # Project logo
│   └── partial_scheduling.png     # Architecture diagrams
├── scripts/
│   └── partial_rollout/           # Training scripts
│       ├── deepseek/              # DeepSeek model experiments
│       ├── qwen/                  # Qwen model experiments
│       └── README.md              # Script documentation
├── slime/                         # Core framework
│   ├── backends/                  # Training backends
│   │   ├── fsdp_utils/            # FSDP implementation
│   │   └── megatron_utils/        # Megatron-LM support
│   ├── rollout/
│   │   ├── sglang_example.py      # Core rollout implementation
│   │   └── rm_hub/                # Reward model integrations
│   ├── ray/                       # Distributed orchestration
│   │   ├── buffer.py              # Partial rollout buffer
│   │   └── rollout.py             # Rollout scheduling
│   └── utils/                     # Utilities and helpers
├── docs/                          # Documentation
│   ├── en/                        # English docs
│   └── zh/                        # Chinese docs
└── tools/                         # Model conversion utilities

🔬 Technical Details

How APRIL Works

  1. Over-provisioning Phase: Request N' = αN rollouts (α typically 1.5-2.0)
  2. Active Monitoring: Track completion status across all workers
  3. Intelligent Interruption: Send abort signal when N samples complete
  4. Buffer Management: Store partial results with generation state
  5. Seamless Resumption: Continue partial rollouts in the next iteration (see the sketch below)
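
A minimal sketch of steps 4-5, with field names assumed for illustration (the real buffer schema lives in slime/ray/buffer.py): an interrupted sample keeps its prompt and the tokens generated so far, and the next iteration simply continues decoding from that prefix under the updated policy. The continuation is what makes part of each resumed sample off-policy, as discussed in the FAQ above.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class GenerationState:
    """Minimal state that must survive an interruption for a rollout to resume."""
    prompt_ids: List[int]                                     # original prompt tokens
    generated_ids: List[int] = field(default_factory=list)    # tokens produced so far (possibly by older policies)
    finished: bool = False

def resume(sample: GenerationState,
           generate_fn: Callable[[List[int], int], Tuple[List[int], bool]],
           max_new_tokens: int) -> GenerationState:
    """Continue an interrupted rollout from its partial prefix under the current policy."""
    prefix = sample.prompt_ids + sample.generated_ids          # prompt + partial generation
    new_ids, finished = generate_fn(prefix, max_new_tokens)    # stand-in for the inference engine call
    sample.generated_ids.extend(new_ids)
    sample.finished = finished
    return sample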

Integration with Existing Frameworks

APRIL is designed as a drop-in enhancement for existing RL training pipelines:

  • Minimal code changes: Enable with command-line flags
  • Framework agnostic: Works with OpenRLHF, verl, Areal, slime
  • Automatic optimization: Self-tuning based on workload characteristics

📚 Citation

If you use APRIL in your research, please cite our paper:

@article{april2025,
  title={APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation},
  author={RLsys Foundation Team},
  journal={arXiv preprint},
  year={2025}
}

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

APRIL builds upon the excellent work of:

  • slime - The base RL training framework
  • SGLang - High-performance inference backend
  • Megatron-LM - Distributed training backend

📬 Contact

For questions and support:


Built with ❤️ by the RLsys Foundation Team
