fairscale is a PyTorch extension library for high-performance and large-scale training.
fairscale supports:
- pipeline parallelism (fairscale.nn.Pipe)
- tensor parallelism (fairscale.nn.model_parallel; see the sketch below)
- optimizer state sharding (fairscale.optim.oss; see the sketch below)
For example, Pipe can run a 4-layer model on 2 GPUs, with the first two layers on cuda:0 and the next two layers on cuda:1:
```python
import torch
import fairscale

# Placeholder layers; any sequence of nn.Module instances works here.
a, b, c, d = [torch.nn.Linear(10, 10) for _ in range(4)]

model = torch.nn.Sequential(a, b, c, d)
# balance=[2, 2] puts two layers on each of devices [0, 1];
# chunks=8 splits each mini-batch into 8 micro-batches for pipelining.
model = fairscale.nn.Pipe(model, balance=[2, 2], devices=[0, 1], chunks=8)
```
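For tensor parallelism, fairscale.nn.model_parallel provides Megatron-style parallel layers. The following is a minimal sketch, not a complete training script: it assumes one process per GPU launched with torch.distributed (env:// rendezvous), and the 2-way parallel size and layer sizes are placeholders.

```python
import torch.distributed as dist
from fairscale.nn.model_parallel import initialize_model_parallel
from fairscale.nn.model_parallel.layers import ColumnParallelLinear, RowParallelLinear

# One process per GPU; torch.distributed must be initialized before
# model-parallel groups can be created.
dist.init_process_group(backend="nccl", init_method="env://")
initialize_model_parallel(2)  # 2-way tensor (model) parallelism

# ColumnParallelLinear splits its weight matrix column-wise across the
# model-parallel group; RowParallelLinear splits row-wise. Chained with
# gather_output=False / input_is_parallel=True, the intermediate
# activations stay sharded between the two layers.
up = ColumnParallelLinear(512, 2048, gather_output=False)
down = RowParallelLinear(2048, 512, input_is_parallel=True)
```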
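Optimizer state sharding is exposed through fairscale.optim.oss.OSS, which wraps a regular torch.optim optimizer class and partitions its state across the ranks of a process group. A minimal sketch, assuming torch.distributed is already initialized; the model and hyperparameters are placeholders:

```python
import torch
from fairscale.optim.oss import OSS

# Assumes torch.distributed is already initialized (one process per GPU).
model = torch.nn.Linear(10, 10)  # placeholder model

# OSS takes the wrapped optimizer *class* plus its usual keyword
# arguments, and shards the resulting optimizer state across ranks.
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=0.01)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()  # each rank updates its shard, then syncs parameters
```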
Requirements:
- PyTorch >= 1.4

Normal installation:

```bash
pip install .
```

Development mode:

```bash
pip install -e .
```

See the CONTRIBUTING file for how to help out.
fairscale is licensed under the BSD-3-Clause License.
fairscale.nn.pipe is forked from torchgpipe, Copyright 2019, Kakao Brain, licensed under the Apache License.
fairscale.nn.model_parallel is forked from Megatron-LM, Copyright 2020, NVIDIA CORPORATION, licensed under the Apache License.
Here is a list of the authors of the research papers this work is based on:
- torchgpipe: Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim. [Paper] [Code]
- ZeRO: Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. [Paper] [Code]
- Megatron-LM: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. [Paper] [Code]