Welcome to the art and science of optimizing neural networks at scale! In this workshop you'll get hands-on experience working with our high-performance distributed training libraries to achieve the best performance on AWS.
Today you'll walk through two hands-on labs. The first one focuses on data parallelism, and the second one is about model parallelism.
This lab is self-contained. All of the content you need is produced by the notebooks themselves or included in the directory. However, if you are in an AWS-led workshop you will most likely use the Event Engine to manage your AWS account.
If not, please make sure you have an AWS account with a SageMaker Studio domain created. In this account, please request a service limit increase for the ml.g4dn.12xlarge instance type within SageMaker training.
If you're interested in learning more about distributed training on Amazon SageMaker, here are some helpful links in your journey.
- Preparing data for distributed training. This blog post introduces different modes of working with data on SageMaker training.
- Distributing tabular data. This example notebook uses a built-in algorithm, TabTransformer, to provide state-of-the-art transformer neural networks for tabular data. TabTransformer runs on multiple CPU-based instances.
- SageMaker Training Compiler. This feature enables faster training on smaller cluster sizes, decreasing the overall job time by as much as 50%. Find example notebooks for Hugging Face and TensorFlow models here, including GPT2, BERT, and VisionTransformer. Training compiler is also common in hyperparameter tuning, and can be helpful in finding the right batch size.
- Hyperparameter tuning. You can use SageMaker hyperparameter tuning, including our Syne Tune project, to find the right hyperparameters for your model, including learning rate, number of epochs, overall model size, batch size, and anything else you like. Syne Tune offers multi-objective search.
- Hosting distributed models with DeepSpeed on SageMaker. In this example notebook we demonstrate using SageMaker hosting to deploy a GPT-J model using DeepSpeed.
- Shell scripts as SageMaker entrypoint. Want to bring a shell script so you can add any extra modifications or non-pip installable packages? Or use a wheel? No problem. This link shows you how to use a bash script to run your program on SageMaker Training.
Some relevant papers for your reference:
- SageMaker Data Parallel, aka Herring. In this paper we introduce a custom high-performance computing configuration for distributed gradient descent on AWS, available within Amazon SageMaker Training.
- SageMaker Model Parallel. In this paper we propose a model parallelism framework available within Amazon SageMaker Training to reduce memory errors and enable training GPT-3 sized models and more! See our case study achieving 32 samples / second with 175B parameters on SageMaker over 140 p4d nodes.
- Amazon Search speeds up training by 7.3x on SageMaker. In this blog post we introduce two new features on Amazon SageMaker: support for native PyTorch DDP and PyTorch Lightning integration with SM DDP. We also discuss how Amazon Search sped up their overall training time by 7.3x by moving to distributed training.
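
If you want some intuition for the data parallelism covered in the first lab before opening the notebooks, here is a minimal sketch in plain Python. It is a toy illustration, not the SageMaker data parallel library: the one-parameter model, the `shard` and `data_parallel_step` helpers, and the dataset are all hypothetical, and averaging a Python list stands in for the all-reduce step that a real cluster would perform across GPUs.

```python
# Toy data-parallel gradient descent (illustrative only, not the SageMaker API).
# Model: y = w * x, trained with mean squared error.

def grad(w, batch):
    """Gradient of the mean squared error for y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(data, num_workers):
    """Split data into equal-sized contiguous shards, one per hypothetical worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, data, num_workers, lr):
    """Each worker computes a local gradient on its shard; the average of
    those gradients (a stand-in for all-reduce) drives one shared update."""
    local_grads = [grad(w, s) for s in shard(data, num_workers)]
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad

# Hypothetical dataset generated from the true weight w = 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)

print(f"learned w = {w:.4f}")
```

Because the shards are the same size, averaging the per-worker gradients gives the same update as computing the gradient over the full batch on one worker. That equivalence is what lets data parallelism split the work across workers without changing the optimization.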