
Commit

add multiple machine results in README.md
feifeibear committed Dec 21, 2021
1 parent 6493e51 commit a5a8e6a
Showing 4 changed files with 11 additions and 1 deletion.
10 changes: 10 additions & 0 deletions README.md
@@ -25,6 +25,16 @@ We also evaluated PatrickStar v0.4.3 on a single node of A100 SuperPod. It is ab

Detail benchmark results on WeChat AI data center as well as NVIDIA SuperPod are posted on this [Google Doc](https://docs.google.com/spreadsheets/d/136CWc_jA_2zC4h1r-6dzD4PrOvp6aw6uCDchEyQv6sE/edit?usp=sharing).


Scale PatrickStar to multiple machines (nodes) on SuperPod.
We succeeded in training a GPT3-175B model on 32 GPUs. As far as we know, this is the first work
to run GPT3 on such a small GPU cluster.
Microsoft used 10,000 V100 GPUs to pretrain GPT3.
Now you can finetune it, or even pretrain your own model, on 32 A100 GPUs. Amazing!

![alt perf](./doc/m_node_superpod.png "performance testing result on multiple nodes of SuperPod")
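
As a rough illustration of what a 32-GPU (4-node) run involves, the sketch below uses PyTorch's standard multi-node launcher. The entry-point script name `train.py`, the master address, and any PatrickStar-specific arguments are placeholders, not the repository's actual invocation.

```bash
# Hypothetical 4-node x 8-GPU launch sketch. Run the same command on every
# node, changing --node_rank (0..3). "train.py" stands in for the real
# training entry point provided by the examples in this repository.
export MASTER_ADDR=10.0.0.1   # placeholder: address of the rank-0 node
export MASTER_PORT=29500

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --nnodes=4 \
    --node_rank=0 \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    train.py
```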


We've also trained the [CLUE-GPT2](https://huggingface.co/uer/gpt2-chinese-cluecorpussmall) model with PatrickStar; the loss and accuracy curves are shown below:

![CLUE-GPT2](./doc/clue-gpt2-loss-n-acc.png)
Binary file added doc/m_node_superpod.png
Binary file modified doc/one_node_perf_a100.png
2 changes: 1 addition & 1 deletion examples/run_transformers.sh
@@ -28,7 +28,7 @@ export MEM_PROF=${MEM_PROF:-0}
# asyn memory monitor for mem sampler
export AMM=${AMM:-1}
# mem saving comm
-export MSC=${MSC:-0}
+export MSC=${MSC:-1}
# mem caching comm
export CACHE=${CACHE:-1}
# async move
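
Since these switches are plain environment variables with `${VAR:-default}` fallbacks, a single run can override them at launch time without editing the script. A minimal sketch (assuming the script can be invoked directly as in the examples; any positional arguments it expects are omitted here):

```bash
# Turn memory-saving communication off and memory profiling on for one run;
# anything left unset falls back to the defaults inside run_transformers.sh.
MSC=0 MEM_PROF=1 bash examples/run_transformers.sh
```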
