ParCIS/Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

Chimera, a bidirectional pipeline parallelism scheme, was published at SC'21, where it was a Best Paper Finalist. See the paper and the video talk for more details.
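As a rough illustration of the bidirectional idea, the sketch below assumes the stage placement described in the SC'21 paper. With D stages on D workers, a "down" pipeline places stage i on worker i and an "up" pipeline places stage i on worker D-1-i, so every worker hosts one stage of each pipeline. The function `chimera_stage_map` is hypothetical and not part of this repository; it only shows the placement, not the micro-batch schedule.

```python
from __future__ import annotations


def chimera_stage_map(num_stages: int) -> dict[int, list[tuple[str, int]]]:
    """Map each worker to the (pipeline, stage) pairs it hosts.

    Assumption: num_stages workers, a 'down' pipeline mapping stage i to
    worker i, and an 'up' pipeline mapping stage i to worker num_stages-1-i.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    return {
        worker: [("down", worker), ("up", num_stages - 1 - worker)]
        for worker in range(num_stages)
    }


if __name__ == "__main__":
    # Print the placement for a small 4-stage example.
    for worker, stages in chimera_stage_map(4).items():
        print(worker, stages)
```

For 4 stages, worker 0 hosts stage 0 of the down pipeline and stage 3 of the up pipeline, which is why the two pipelines can fill each other's bubbles.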

Data preparation

Follow the data preparation instructions from AzureML-BERT: https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md

Please place the resulting wikipedia.segmented.nltk.txt file in the bert_data/ directory.
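A quick sanity check before submitting a job can save a wasted allocation. The sketch below assumes it is run from the repository root; `check_bert_data` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumed location of the prepared corpus, relative to the repository root.
DATA_FILE = Path("bert_data") / "wikipedia.segmented.nltk.txt"


def check_bert_data(path: Path = DATA_FILE) -> Path:
    """Return the path if the prepared corpus exists and is non-empty."""
    if not path.is_file():
        raise FileNotFoundError(f"expected prepared corpus at {path}")
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty")
    return path
```

Calling `check_bert_data()` from the repository root raises an error if the file is missing or empty.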

Installation

pip install -r requirements.txt

For training, we use apex.optimizers.FusedLAMB from NVIDIA's Apex library. Please follow the Apex installation instructions.

For profiling, we use NVIDIA Nsight Systems. Please make sure the nsys command can be executed.

Our scripts are intended to be run through the SLURM workload manager on a GPU cluster with one GPU per node.
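The requirements above can be checked from Python before submitting jobs. This is a sketch under assumptions: it only tests that `apex.optimizers.FusedLAMB` is importable (not that Apex's CUDA extensions were built) and that the `nsys` and `sbatch` commands are on `PATH`. The helper names are hypothetical.

```python
import importlib
import shutil


def fused_lamb_available() -> bool:
    """True if apex.optimizers exposes FusedLAMB; does not test CUDA extensions."""
    try:
        module = importlib.import_module("apex.optimizers")
    except ImportError:
        return False
    return hasattr(module, "FusedLAMB")


def missing_tools(tools=("nsys", "sbatch")) -> list:
    """Return the command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("FusedLAMB importable:", fused_lamb_available())
    print("missing commands:", missing_tools() or "none")
```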

Profiling Chimera with 8 stages for BERT-Large on 8 GPUs

sbatch scripts/prof_steps.sh

sh scripts/plot_cuda_timeline.sh

Output: bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf
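The output file name appears to encode the run configuration. The sketch below parses it under an assumed reading of the fields: model, pipeline scheme, number of stages, number of GPUs, micro-batch size (`microbs`), and an `acc` value assumed to be a gradient-accumulation count. This interpretation and the `parse_profile_name` helper are hypothetical and not defined by the repository.

```python
import re
from pathlib import Path

# Hypothetical pattern for names like
# bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf
PROFILE_NAME = re.compile(
    r"(?P<model>.+?)_(?P<scheme>[a-z]+)_(?P<stages>\d+)stages_(?P<gpus>\d+)gpus"
    r"_microbs(?P<micro_batch>\d+)_acc(?P<acc>\d+)$"
)


def parse_profile_name(path: str) -> dict:
    """Split a profile file name into its (assumed) configuration fields."""
    match = PROFILE_NAME.match(Path(path).stem)
    if match is None:
        raise ValueError(f"unrecognized profile name: {path}")
    fields = match.groupdict()
    for key in ("stages", "gpus", "micro_batch", "acc"):
        fields[key] = int(fields[key])
    return fields
```

Applied to the output path above, this yields model `bert-large`, scheme `chimera`, 8 stages, 8 GPUs, micro-batch 32, and `acc` 1.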

Publication

To cite our work:

@inproceedings{li143,
  author = {Li, Shigang and Hoefler, Torsten},
  title = {Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines},
  year = {2021},
  isbn = {9781450384421},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3458817.3476145},
  doi = {10.1145/3458817.3476145},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  articleno = {27},
  numpages = {14},
  location = {St. Louis, Missouri},
  series = {SC '21}
}

License

See LICENSE.
