Optimize Training Performance in PyTorch

This tutorial shows how to optimize training performance in PyTorch.

MFU (Model FLOPs Utilization) measures how efficiently training uses the available compute: it is the ratio of the floating-point throughput the model actually achieves to the theoretical peak throughput of the hardware. A higher MFU means the hardware spends more of its time on useful computation, which shortens training time.

$$ \mathrm{MFU} = \frac{\mathrm{FLOPs}_{\mathrm{actual}}}{\mathrm{FLOPs}_{\mathrm{max}}} $$

As a rule of thumb, MFU > 0.5 is considered good, while MFU < 0.3 indicates significant room for optimization.
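The ratio above can be sketched in a few lines of plain Python. This is a minimal illustration, not a measurement tool: the function name `model_flops_utilization` is hypothetical, and the per-sample FLOP count, throughput, and peak figure below are made-up numbers chosen only to show the arithmetic. In practice, the FLOPs per sample would come from a profiler or an analytic count, and the peak would come from the accelerator's datasheet.

```python
def model_flops_utilization(flops_per_sample, samples_per_second, peak_flops_per_second):
    """Return MFU: achieved FLOPs per second divided by peak FLOPs per second."""
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    # Achieved throughput = work per training sample * samples processed per second.
    achieved_flops_per_second = flops_per_sample * samples_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical values for illustration only (not measured on any real system).
flops_per_sample = 10.8e9        # assumed FLOPs for one forward + backward pass
samples_per_second = 2000.0      # assumed measured training throughput
peak_flops_per_second = 100e12   # assumed hardware peak (100 TFLOP/s)

mfu = model_flops_utilization(flops_per_sample, samples_per_second, peak_flops_per_second)
print(f"MFU = {mfu:.3f}")
```

With these assumed inputs the achieved throughput is 21.6 TFLOP/s against a 100 TFLOP/s peak, which falls below the 0.3 threshold mentioned above.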

The following examples use a ResNet-18 model.
