
ChatLearn

A flexible and efficient reinforcement learning framework for large language models (LLMs).


English | 中文


Latest News 🔥

  • [2025/9] Support for Agentic RL tasks. documentation 🔥
  • [2025/9] Support for Vision-Language model RL tasks. documentation 🔥
  • [2025/8] We support GSPO on Mcore! 🔥
  • [2025/7] We provide a reinforcement learning training example for DeepSeek-V3-671B based on Mcore! 🔥
  • [2025/7] We provide reinforcement learning training examples for Qwen3-235B-A22B based on Mcore and FSDP2!
  • [2025/7] Training now supports the FSDP2 framework, with sequence packing, sequence parallelism, and group GEMM for efficient and user-friendly reinforcement learning training!
  • [2025/5] We support the Mcore framework for training! Using Mcore and vLLM, we provide a tutorial on end-to-end GRPO training for Qwen3!
  • [2025/5] We support the FSDP framework for training! Using FSDP and vLLM, we provide a tutorial on end-to-end GRPO training for Qwen3!
  • [2024/8] We officially released ChatLearn! Check out our documentation.

ChatLearn is a large-scale reinforcement learning training framework for LLMs developed by the Alibaba Cloud PAI platform.

RLHF Flow

ChatLearn has the following advantages:

  1. 🚀 User-friendly programming interface: users focus on programming individual models by wrapping a few functions, while the system takes care of resource scheduling, data and control flow, and distributed execution (see the sketch after this list).
  2. 🔧 Highly scalable training methodology: ChatLearn supports user-defined model execution flows, making customized training processes more flexible and convenient.
  3. 🔄 Diverse distributed acceleration engines: ChatLearn supports industry-leading training engines (FSDP2, Megatron) and inference engines (vLLM, SGLang), delivering high training throughput.
  4. 🎯 Flexible parallel strategies and resource allocation: ChatLearn supports a distinct parallel strategy for each model, tailored to its computational, memory, and communication characteristics. Its resource scheduling mechanism lets models use resources exclusively or share them, and its scheduling policies enable efficient serial/parallel execution and GPU memory sharing across models.
  5. High performance: compared to current SOTA systems, ChatLearn achieves a 52% throughput improvement at the 7B+7B (Policy+Reward) scale and a 137% improvement at the 70B+70B scale, and it supports reinforcement learning training at scales exceeding 600B parameters.
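
As a rough illustration of the programming interface in item 1, the sketch below shows the general shape of this style: the user writes small per-model step functions and a simple driver wires them into a training flow. All function names here are hypothetical placeholders, not ChatLearn's actual API; in ChatLearn the framework additionally handles resource scheduling and distributed execution (see the documentation for the real interfaces).

```python
# Hypothetical illustration of the "wrap a few functions per model" idea.
# These names are placeholders, not ChatLearn's actual API.

def policy_generate(prompts):
    # User-defined: run the policy model to produce a response per prompt.
    return [p + " ... generated response" for p in prompts]

def reward_forward(prompts, responses):
    # User-defined: score each (prompt, response) pair.
    return [float(len(r)) for r in responses]

def policy_train(prompts, responses, rewards):
    # User-defined: one optimization step on the policy given the rewards.
    print(f"train step on {len(prompts)} samples, mean reward {sum(rewards) / len(rewards):.2f}")

def run_episode(prompts):
    # Driver: the execution flow a framework like ChatLearn would schedule
    # and distribute across devices; here it simply runs sequentially.
    responses = policy_generate(prompts)
    rewards = reward_forward(prompts, responses)
    policy_train(prompts, responses, rewards)

run_episode(["What is reinforcement learning?", "Explain GRPO in one sentence."])
```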

Quick Start

Please refer to the documentation for a quick start.

  1. Environment and Code Setup
  2. End-to-End GRPO Training Pipeline for Qwen3 Model Using FSDP + vLLM
  3. End-to-End GRPO Training Pipeline for Qwen3 Model Using Megatron + vLLM

Feature List

  • Supports training engines such as Megatron and FSDP
  • Supports inference engines including vLLM and SGLang, selected via the runtime_args.rollout_engine parameter
  • Supports reinforcement learning algorithms such as GRPO and GSPO (a GRPO sketch follows this list)
  • Supports experiment monitoring with wandb and TensorBoard
  • Supports training acceleration techniques such as sequence packing, Ulysses sequence parallelism, and group GEMM (a packing sketch follows this list)
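
For context on GRPO: its core idea is to sample a group of responses per prompt and normalize each response's reward against the others in its group, which removes the need for a separate value model. The snippet below is a minimal, self-contained sketch of that group-relative advantage computation (assuming PyTorch); it is illustrative only and not taken from ChatLearn's implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for GRPO.

    rewards: shape (num_prompts, group_size), one scalar reward per sampled response.
    Each response's advantage is its reward standardized within its prompt group.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.25],
                        [0.9, 0.9, 0.1, 0.1]])
print(grpo_advantages(rewards))
```

Sequence packing, listed above, concatenates several variable-length samples into one fixed-length row so less compute is wasted on padding. The sketch below shows a simple greedy first-fit packer over sample lengths; it is a generic illustration of the idea, not ChatLearn's implementation (a real kernel also needs per-sample attention boundaries).

```python
from typing import List

def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedy first-fit packing: group sample indices so each pack's total
    length stays within max_len, reducing padding versus one sample per row."""
    packs: List[List[int]] = []
    pack_lens: List[int] = []
    for idx, n in enumerate(lengths):
        for p, used in enumerate(pack_lens):
            if used + n <= max_len:
                packs[p].append(idx)
                pack_lens[p] += n
                break
        else:
            packs.append([idx])
            pack_lens.append(n)
    return packs

# Example: pack 5 samples of varying length into rows of at most 8 tokens.
lengths = [3, 5, 2, 6, 2]
print(pack_sequences(lengths, max_len=8))  # -> [[0, 1], [2, 3], [4]]
```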

Performance

We compared the RLHF training throughput of models at different parameter scales, using an N+N configuration in which the Policy model and the Reward model have the same number of parameters. We benchmarked against DeepSpeed-Chat and OpenRLHF on 7B and 70B model configurations. On 8 GPUs at the 7B+7B scale we achieved a 115% speedup, and on 32 GPUs at the 70B+70B scale a 208% speedup; the larger the scale, the more pronounced the acceleration. ChatLearn also supports even larger-scale reinforcement learning, such as models at the 600B-parameter scale.

Performance comparison

Note: The performance of DeepSpeed-Chat and OpenRLHF has already been optimized.

Roadmap

The upcoming features for ChatLearn include:

  • Simplify configuration settings
  • Tutorials for RL training of MoE (Mixture of Experts) models
  • Support for more models
  • Performance optimization
  • Support for more RL algorithms

We are actively hiring; feel free to contact us or send your resume by email.
