diff --git a/docs/examples/use_cases/pytorch/efficientnet/readme.rst b/docs/examples/use_cases/pytorch/efficientnet/readme.rst
index 57493b47f1..77c4f00474 100644
--- a/docs/examples/use_cases/pytorch/efficientnet/readme.rst
+++ b/docs/examples/use_cases/pytorch/efficientnet/readme.rst
@@ -89,11 +89,27 @@ You may need to adjust ``--batch-size`` parameter for your machine.
 
 You can change the data loader and automatic augmentation scheme that are used by adding:
 
-* ``--data-backend``: ``dali`` | ``pytorch`` | ``synthetic``,
+* ``--data-backend``: ``dali`` | ``dali_proxy`` | ``pytorch`` | ``synthetic``,
 * ``--automatic-augmentation``: ``disabled`` | ``autoaugment`` | ``trivialaugment`` (the last one only for DALI),
 * ``--dali-device``: ``cpu`` | ``gpu`` (only for DALI).
 
-By default DALI GPU-variant with AutoAugment is used.
+By default DALI GPU-variant with AutoAugment is used (``dali`` and ``dali_proxy`` backends).
+
+Data Backends
+-------------
+
+- **dali**:
+  Leverages a DALI pipeline along with DALI's PyTorch iterator for data loading, preprocessing, and augmentation.
+
+- **dali_proxy**:
+  Uses a DALI pipeline for preprocessing and augmentation while relying on PyTorch's data loader. DALI Proxy facilitates the transfer of data to DALI for processing.
+  See :ref:`pytorch_dali_proxy`.
+
+- **pytorch**:
+  Employs the native PyTorch data loader for data preprocessing and augmentation.
+
+- **synthetic**:
+  Creates synthetic data on the fly, which is useful for testing and benchmarking purposes. This backend eliminates the need for actual datasets, providing a convenient way to simulate data loading.
 
 For example to run the EfficientNet with AMP on a batch size of 128 with DALI using TrivialAugment you need to invoke:
 
@@ -161,6 +177,20 @@ To run training benchmarks with different data loaders and automatic augmentatio
      --workspace $RESULT_WORKSPACE
      --report-file bench_report_dali_ta.json $PATH_TO_IMAGENET
 
+   # DALI proxy with AutoAugment
+   python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128
+     --batch-size 128 --epochs 4 --no-checkpoints --training-only
+     --data-backend dali_proxy --automatic-augmentation autoaugment
+     --workspace $RESULT_WORKSPACE
+     --report-file bench_report_dali_proxy_aa.json $PATH_TO_IMAGENET
+
+   # DALI proxy with TrivialAugment
+   python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128
+     --batch-size 128 --epochs 4 --no-checkpoints --training-only
+     --data-backend dali_proxy --automatic-augmentation trivialaugment
+     --workspace $RESULT_WORKSPACE
+     --report-file bench_report_dali_proxy_ta.json $PATH_TO_IMAGENET
+
    # PyTorch without automatic augmentations
    python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128
      --batch-size 128 --epochs 4 --no-checkpoints --training-only
diff --git a/docs/examples/use_cases/pytorch/resnet50/main.py b/docs/examples/use_cases/pytorch/resnet50/main.py
index 34837c9de7..8046abf928 100644
--- a/docs/examples/use_cases/pytorch/resnet50/main.py
+++ b/docs/examples/use_cases/pytorch/resnet50/main.py
@@ -93,12 +93,14 @@ def parse():
                         '"dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy preprocessing.')
     parser.add_argument('--prof', default=-1, type=int,
                         help='Only run 10 iterations for profiling.')
-    parser.add_argument('--deterministic', action='store_true')
-
+    parser.add_argument('--deterministic', action='store_true',
+                        help='Enable deterministic behavior for reproducibility')
     parser.add_argument('--fp16-mode', default=False, action='store_true',
                         help='Enable half precision mode.')
-    parser.add_argument('--loss-scale', type=float, default=1)
-    parser.add_argument('--channels-last', type=bool, default=False)
+    parser.add_argument('--loss-scale', type=float, default=1,
+                        help='Scaling factor for loss to prevent underflow in FP16 mode.')
+    parser.add_argument('--channels-last', type=bool, default=False,
+                        help='Use channels last memory format for tensors.')
     parser.add_argument('-t', '--test', action='store_true',
                         help='Launch test mode with preset arguments')
     args = parser.parse_args()
diff --git a/docs/examples/use_cases/pytorch/resnet50/pytorch-resnet50.rst b/docs/examples/use_cases/pytorch/resnet50/pytorch-resnet50.rst
index 720a0d2d33..79ebf5e858 100644
--- a/docs/examples/use_cases/pytorch/resnet50/pytorch-resnet50.rst
+++ b/docs/examples/use_cases/pytorch/resnet50/pytorch-resnet50.rst
@@ -44,39 +44,69 @@ The default learning rate schedule starts at 0.1 and decays by a factor of 10 ev
 
     python main.py -a alexnet --lr 0.01 [imagenet-folder with train and val folders]
 
+Data loaders
+------------
+
+- **dali**:
+  Leverages a DALI pipeline along with DALI's PyTorch iterator for data loading, preprocessing, and augmentation.
+
+- **dali_proxy**:
+  Uses a DALI pipeline for preprocessing and augmentation while relying on PyTorch's data loader. DALI Proxy facilitates the transfer of data to DALI for processing.
+  See :ref:`pytorch_dali_proxy`.
+
+- **pytorch**:
+  Employs the native PyTorch data loader for data preprocessing and augmentation.
+
 Usage
 -----
 
 .. code-block:: bash
 
-    main.py [-h] [--arch ARCH] [-j N] [--epochs N] [--start-epoch N] [-b N] [--lr LR] [--momentum M] [--weight-decay W] [--print-freq N] [--resume PATH] [-e] [--pretrained] [--opt-level] DIR
-
-    PyTorch ImageNet Training
-
-    positional arguments:
-      DIR                   path(s) to dataset (if one path is provided, it is assumed to have subdirectories named "train" and "val"; alternatively, train and val paths can be specified directly by providing both paths as arguments)
-
-    optional arguments (for the full list please check `Apex ImageNet example
-    `_)
-      -h, --help            show this help message and exit
-      --arch ARCH, -a ARCH  model architecture: alexnet | resnet | resnet101
-                            | resnet152 | resnet18 | resnet34 | resnet50 | vgg
-                            | vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16
-                            | vgg16_bn | vgg19 | vgg19_bn (default: resnet18)
-      -j N, --workers N     number of data loading workers (default: 4)
-      --epochs N            number of total epochs to run
-      --start-epoch N       manual epoch number (useful on restarts)
-      -b N, --batch-size N  mini-batch size (default: 256)
-      --lr LR, --learning-rate LR initial learning rate
-      --momentum M          momentum
-      --weight-decay W, --wd W weight decay (default: 1e-4)
-      --print-freq N, -p N  print frequency (default: 10)
-      --resume PATH         path to latest checkpoint (default: none)
-      -e, --evaluate        evaluate model on validation set
-      --pretrained          use pre-trained model
-      --dali_cpu            use CPU based pipeline for DALI, for heavy GPU
-                            networks it may work better, for IO bottlenecked
-                            one like RN18 GPU default should be faster
-      --data_loader         Select data loader: "pytorch" for native PyTorch data loader,
-                            "dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy preprocessing.
-      --fp16-mode           enables mixed precision mode
+    main.py [-h] [--arch ARCH] [-j N] [--epochs N] [--start-epoch N] [-b N] [--lr LR] [--momentum M] [--weight-decay W] [--print-freq N] [--resume PATH]
+            [-e] [--pretrained] [--dali_cpu] [--data_loader {pytorch,dali,dali_proxy}] [--prof PROF] [--deterministic] [--fp16-mode]
+            [--loss-scale LOSS_SCALE] [--channels-last CHANNELS_LAST] [-t]
+            [DIR ...]
+
+    PyTorch ImageNet Training
+
+    positional arguments:
+      DIR                   path(s) to dataset (if one path is provided, it is assumed to have subdirectories named "train" and "val"; alternatively, train and val paths can
+                            be specified directly by providing both paths as arguments)
+
+    options:
+      -h, --help            show this help message and exit
+      --arch ARCH, -a ARCH  model architecture: alexnet | convnext_base | convnext_large | convnext_small | convnext_tiny | densenet121 | densenet161 | densenet169 |
+                            densenet201 | efficientnet_b0 | efficientnet_b1 | efficientnet_b2 | efficientnet_b3 | efficientnet_b4 | efficientnet_b5 | efficientnet_b6 |
+                            efficientnet_b7 | efficientnet_v2_l | efficientnet_v2_m | efficientnet_v2_s | get_model | get_model_builder | get_model_weights | get_weight |
+                            googlenet | inception_v3 | list_models | maxvit_t | mnasnet0_5 | mnasnet0_75 | mnasnet1_0 | mnasnet1_3 | mobilenet_v2 | mobilenet_v3_large |
+                            mobilenet_v3_small | regnet_x_16gf | regnet_x_1_6gf | regnet_x_32gf | regnet_x_3_2gf | regnet_x_400mf | regnet_x_800mf | regnet_x_8gf |
+                            regnet_y_128gf | regnet_y_16gf | regnet_y_1_6gf | regnet_y_32gf | regnet_y_3_2gf | regnet_y_400mf | regnet_y_800mf | regnet_y_8gf | resnet101 |
+                            resnet152 | resnet18 | resnet34 | resnet50 | resnext101_32x8d | resnext101_64x4d | resnext50_32x4d | shufflenet_v2_x0_5 | shufflenet_v2_x1_0 |
+                            shufflenet_v2_x1_5 | shufflenet_v2_x2_0 | squeezenet1_0 | squeezenet1_1 | swin_b | swin_s | swin_t | swin_v2_b | swin_v2_s | swin_v2_t | vgg11 |
+                            vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19 | vgg19_bn | vit_b_16 | vit_b_32 | vit_h_14 | vit_l_16 | vit_l_32 | wide_resnet101_2 |
+                            wide_resnet50_2 (default: resnet18)
+      -j N, --workers N     number of data loading workers (default: 4)
+      --epochs N            number of total epochs to run
+      --start-epoch N       manual epoch number (useful on restarts)
+      -b N, --batch-size N  mini-batch size per process (default: 256)
+      --lr LR, --learning-rate LR
+                            Initial learning rate. Will be scaled by /256: args.lr = args.lr*float(args.batch_size*args.world_size)/256. A warmup schedule
+                            will also be applied over the first 5 epochs.
+      --momentum M          momentum
+      --weight-decay W, --wd W
+                            weight decay (default: 1e-4)
+      --print-freq N, -p N  print frequency (default: 10)
+      --resume PATH         path to latest checkpoint (default: none)
+      -e, --evaluate        evaluate model on validation set
+      --pretrained          use pre-trained model
+      --dali_cpu            Runs CPU based version of DALI pipeline.
+      --data_loader {pytorch,dali,dali_proxy}
+                            Select data loader: "pytorch" for native PyTorch data loader, "dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy
+                            preprocessing.
+      --prof PROF           Only run 10 iterations for profiling.
+      --deterministic       Enable deterministic behavior for reproducibility
+      --fp16-mode           Enable half precision mode.
+      --loss-scale LOSS_SCALE
+                            Scaling factor for loss to prevent underflow in FP16 mode.
+      --channels-last CHANNELS_LAST
+                            Use channels last memory format for tensors.
+      -t, --test            Launch test mode with preset arguments
diff --git a/docs/plugins/pytorch_dali_proxy.rst b/docs/plugins/pytorch_dali_proxy.rst
index bbe9a6c169..1240878c90 100644
--- a/docs/plugins/pytorch_dali_proxy.rst
+++ b/docs/plugins/pytorch_dali_proxy.rst
@@ -1,3 +1,4 @@
+.. _pytorch_dali_proxy:
 PyTorch DALI Proxy
 ==================
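The `main.py` hunk in `resnet50` only attaches help strings to options that already existed. Parsing behavior therefore stays the same. This sketch uses the standard `argparse` module to rebuild only the options visible in that hunk. The name `build_parser` is hypothetical. The `choices` for `--data_loader` are inferred from the `{pytorch,dali,dali_proxy}` usage line, and because the diff does not show that option's real default, none is set here. The sketch also shows one effect of keeping `type=bool` on `--channels-last`: argparse passes the raw string to `bool()`, so any non-empty value turns the option on, including `False`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical rebuild of the resnet50/main.py options touched by the diff.

    Assumption: --data_loader uses choices=[...] (inferred from the usage line);
    its real default is not shown in the diff, so none is set here.
    """
    parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
    parser.add_argument('--data_loader', choices=['pytorch', 'dali', 'dali_proxy'],
                        help='Select data loader.')
    parser.add_argument('--prof', default=-1, type=int,
                        help='Only run 10 iterations for profiling.')
    parser.add_argument('--deterministic', action='store_true',
                        help='Enable deterministic behavior for reproducibility')
    parser.add_argument('--fp16-mode', default=False, action='store_true',
                        help='Enable half precision mode.')
    parser.add_argument('--loss-scale', type=float, default=1,
                        help='Scaling factor for loss to prevent underflow in FP16 mode.')
    # type=bool applies bool() to the raw command-line string: bool("False") is True.
    parser.add_argument('--channels-last', type=bool, default=False,
                        help='Use channels last memory format for tensors.')
    parser.add_argument('-t', '--test', action='store_true',
                        help='Launch test mode with preset arguments')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    defaults = parser.parse_args([])
    print(defaults.channels_last, defaults.loss_scale, defaults.prof)  # prints: False 1 -1
    args = parser.parse_args(['--data_loader', 'dali_proxy', '--channels-last', 'False'])
    print(args.data_loader, args.channels_last)  # prints: dali_proxy True
```

To keep channels-last disabled, leave out `--channels-last` entirely instead of passing `False`.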
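The new `--lr` help text gives a linear scaling rule: `args.lr*float(args.batch_size*args.world_size)/256`. The sketch below implements only that arithmetic. It assumes `world_size` is the number of training processes, for example one per GPU. `scaled_lr` is a hypothetical helper, and the sketch does not model the 5-epoch warmup that the help text also mentions.

```python
def scaled_lr(base_lr: float, batch_size: int, world_size: int) -> float:
    """Hypothetical helper: lr * (batch_size * world_size) / 256.

    batch_size is the per-process mini-batch (-b); world_size is assumed to be
    the number of processes taking part in training. Warmup is not modeled.
    """
    return base_lr * float(batch_size * world_size) / 256


if __name__ == '__main__':
    # Default per-process batch of 256 on 8 processes gives a global batch of 2048.
    print(scaled_lr(0.1, 256, 8))  # prints: 0.8
    # One process with the default batch size keeps the base learning rate.
    print(scaled_lr(0.1, 256, 1))  # prints: 0.1
```

The divisor 256 is the reference batch size. The scaled rate therefore equals the base rate only when the global batch (`batch_size * world_size`) is exactly 256.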
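The EfficientNet readme gains two DALI proxy benchmark entries. They differ only in the `--automatic-augmentation` value and the `--report-file` name. The sketch below assembles those argument lists from the flags shown in the readme. `dali_proxy_bench_cmd` is a hypothetical helper. `$RESULT_WORKSPACE` and `$PATH_TO_IMAGENET` are kept as literal placeholders for the shell to expand. For that reason the tokens are joined with plain spaces rather than `shlex.join`, which would quote the placeholders and prevent expansion.

```python
from __future__ import annotations

# Report-file suffixes used by the two DALI proxy entries in the readme.
REPORT_SUFFIX = {'autoaugment': 'aa', 'trivialaugment': 'ta'}


def dali_proxy_bench_cmd(augmentation: str) -> list[str]:
    """Hypothetical helper: rebuild one DALI proxy benchmark command as argv."""
    suffix = REPORT_SUFFIX[augmentation]
    return [
        'python', 'multiproc.py', '--nproc_per_node', '8', './main.py',
        '--amp', '--static-loss-scale', '128',
        '--batch-size', '128', '--epochs', '4', '--no-checkpoints', '--training-only',
        '--data-backend', 'dali_proxy', '--automatic-augmentation', augmentation,
        '--workspace', '$RESULT_WORKSPACE',
        '--report-file', f'bench_report_dali_proxy_{suffix}.json', '$PATH_TO_IMAGENET',
    ]


if __name__ == '__main__':
    for aug in REPORT_SUFFIX:
        # Plain join keeps $VARS unquoted so the shell can expand them.
        print(' '.join(dali_proxy_bench_cmd(aug)))
```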