huggingface_bert.py is a Huggingface fine-tuning example with PatrickStar. You can compare it with the official Huggingface example to see how to apply PatrickStar to existing projects.
Before running the example, you need to prepare the data:
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar -xf aclImdb_v1.tar.gz
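The archive extracts to an aclImdb/ folder with train/ and test/ splits, each containing pos/ and neg/ subdirectories of plain-text reviews. A minimal sketch of reading that layout (the actual logic inside get_dataset() in huggingface_bert.py may differ):

```python
from pathlib import Path

def read_imdb_split(split_dir):
    """Read texts and labels from aclImdb/{train,test}/{pos,neg}/*.txt."""
    texts, labels = [], []
    for label_dir in ("pos", "neg"):
        for txt_file in (Path(split_dir) / label_dir).glob("*.txt"):
            texts.append(txt_file.read_text(encoding="utf-8"))
            labels.append(1 if label_dir == "pos" else 0)
    return texts, labels

train_texts, train_labels = read_imdb_split("aclImdb/train")
test_texts, test_labels = read_imdb_split("aclImdb/test")
```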
Then change the data directory used in get_dataset() to point at the extracted aclImdb folder. After these steps, you are ready to go:
python huggingface_bert.py
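Compared with a vanilla Huggingface fine-tuning script, the main changes are that the model and optimizer are created through PatrickStar's engine and that the backward pass goes through the engine instead of loss.backward(). Below is a minimal sketch assuming the initialize_engine API and config layout shown in the PatrickStar README; the exact settings and training loop in huggingface_bert.py may differ:

```python
from transformers import BertForSequenceClassification
from patrickstar.runtime import initialize_engine  # assumed PatrickStar entry point

# PatrickStar constructs the model inside the engine so that parameters can be
# placed into chunks as they are created; it therefore takes a model-building
# function rather than a ready-made model instance.
def model_func():
    return BertForSequenceClassification.from_pretrained("bert-base-uncased")

config = {
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 5e-5, "betas": (0.9, 0.999), "eps": 1e-6, "weight_decay": 0},
    },
    "fp16": {"enabled": True, "loss_scale": 0},
    "default_chunk_size": 32 * 1024 * 1024,  # elements per chunk; see the chunk size search below
}

model, optimizer = initialize_engine(model_func=model_func, local_rank=0, config=config)

for batch in train_loader:  # train_loader: your usual DataLoader over the IMDB data
    optimizer.zero_grad()
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    model.backward(outputs.loss)  # engine-managed backward instead of outputs.loss.backward()
    optimizer.step()
```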
run_transformers.sh and pretrain_demo.py form an example of training large PTMs (pretrained models) with PatrickStar. You can run models of different sizes by adding configurations to run_transformers.sh.
The following command will run a model with 4B params:
env MODEL_NAME=GPT2_4B RES_CHECK=0 DIST_PLAN="patrickstar" bash run_transformers.sh
For the available MODEL_NAME values, please check pretrain_demo.py.
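If you want a rough sense of how a model configuration maps to a parameter count, a standard back-of-the-envelope estimate for a GPT-2-style model is about 12·L·H² parameters in the transformer blocks plus the embeddings. The configuration below is a hypothetical illustration of a roughly 4B-parameter model, not necessarily the one pretrain_demo.py uses for GPT2_4B:

```python
def approx_gpt2_params(num_layers, hidden_dim, vocab_size=50257, seq_len=1024):
    """Rough parameter count for a GPT-2-style model.

    Each transformer block holds about 12 * hidden_dim**2 parameters
    (attention projections + MLP); embeddings add (vocab + positions) * hidden_dim.
    Biases and layer norms are ignored.
    """
    blocks = 12 * num_layers * hidden_dim ** 2
    embeddings = (vocab_size + seq_len) * hidden_dim
    return blocks + embeddings

# A hypothetical 4B-scale configuration: 48 layers, hidden size 2560.
print(f"{approx_gpt2_params(48, 2560) / 1e9:.2f}B params")  # ~3.9B
```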
To check the accuracy of PatrickStar with BERT, run:
env RES_CHECK=1 bash run_transformers.sh
PatrickStar also supports training MoE models. In the examples/moe directory, run:
python -m torch.distributed.launch --nproc_per_node=4 huggingface_bert_moe.py
Note that you need to install FastMoE before running this example.
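The MoE example relies on FastMoE to turn the dense feed-forward blocks into expert-parallel ones. A minimal sketch of such a replacement layer, assuming FastMoE's FMoETransformerMLP module (the actual integration in huggingface_bert_moe.py may differ):

```python
import torch
from fmoe.transformer import FMoETransformerMLP  # assumed FastMoE import path

class MoEFeedForward(torch.nn.Module):
    """Hypothetical drop-in replacement for a BERT feed-forward block."""

    def __init__(self, d_model=768, d_hidden=3072, num_expert=4):
        super().__init__()
        # Each worker hosts `num_expert` local experts; tokens are routed
        # among them (and across workers when launched with torch.distributed).
        self.moe = FMoETransformerMLP(num_expert=num_expert,
                                      d_model=d_model,
                                      d_hidden=d_hidden)

    def forward(self, hidden_states):
        return self.moe(hidden_states)
```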
Chunk size (CS) is an important hyperparameter for PatrickStar. Although you can set a CS value empirically by running your training task several times, we provide a systematic way to find a CS with a smaller memory footprint. Use the following command to search for the chunk size:
env CS_SEARCH=1 bash run_transformers.sh