This repository contains the official implementation for the paper:
PIPEMESH: Achieving Memory-Efficient Computation-Communication Overlap for Training Large Language Models
This implementation is based on Megatron-LM (commit 20574f7553e66dbb3e8de72ca6c26c9faa2e1b18).
PipeMesh introduces two new arguments to control the elastic pipeline schedule:
--forward-groups <group_1> <group_2> ...
--backward-groups <group_1> <group_2> ...

Each argument defines the enqueue (forward) and dequeue (backward) sequences used for micro-batch scheduling within each pipeline group.
For example, for 12 micro-batches per pipeline group, you can use:
--forward-groups 8 4 \
--backward-groups 6 6

This configuration first enqueues 8 micro-batches for the forward pass, then dequeues 6 micro-batches at a time during the backward pass, achieving fine-grained overlap between computation and communication. Note that both lists sum to the 12 micro-batches per pipeline group.
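To make the interleaving concrete, here is a minimal Python sketch of how such group lists could be turned into an alternating enqueue/dequeue schedule. This is an illustration only, not the actual PipeMesh or Megatron-LM implementation; the function name and the rule that a micro-batch's backward pass is dequeued only after its forward pass has been enqueued are assumptions for the example.

```python
def elastic_phases(forward_groups, backward_groups):
    """Hypothetical sketch: expand forward/backward group sizes into an
    ordered list of ('F', ids) enqueue and ('B', ids) dequeue phases."""
    assert sum(forward_groups) == sum(backward_groups), (
        "both lists must cover the same number of micro-batches")
    phases = []
    done_fwd = 0  # micro-batches enqueued (forward) so far
    done_bwd = 0  # micro-batches dequeued (backward) so far
    fi = bi = 0
    while fi < len(forward_groups) or bi < len(backward_groups):
        # Enqueue the next forward group, if any remain.
        if fi < len(forward_groups):
            size = forward_groups[fi]
            fi += 1
            phases.append(("F", list(range(done_fwd, done_fwd + size))))
            done_fwd += size
        # Dequeue every backward group whose micro-batches have all
        # completed their forward enqueue.
        while bi < len(backward_groups) and done_bwd + backward_groups[bi] <= done_fwd:
            size = backward_groups[bi]
            bi += 1
            phases.append(("B", list(range(done_bwd, done_bwd + size))))
            done_bwd += size
    return phases

# For the example above: --forward-groups 8 4, --backward-groups 6 6
print(elastic_phases([8, 4], [6, 6]))
```

Under these assumptions, the 12 micro-batches unfold as F(0-7), B(0-5), F(8-11), B(6-11), showing how the smaller backward groups interleave dequeues between the two forward enqueue bursts.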