Developer Tips: Optimizing Your Vulcan Code

Caitrin Armstrong edited this page Apr 29, 2019 · 4 revisions

The DataLoader

The DataLoader, part of PyTorch's utils.data module, is an incredibly powerful tool for sampling your data. Rather than providing a dataset directly to Vulcan's .fit and other functions, providing a DataLoader lets you specify batch sizes, samplers, and more. Several of its parameters let you tune loading to your dataset, batch size, and available compute resources. We do not cover the sampling-related parameters here, although sampling also affects training speed: more complex samplers take more time, though the overhead is usually trivial.

  1. batch_size: how many samples to load per batch. As the default is 1, you almost certainly want to set this.
  2. num_workers: how many subprocesses to use for data loading. The default is 0, meaning the data is loaded in the main process. The best value is found through experimentation, although some on the PyTorch forums suggest 4 * the number of GPUs. These subprocesses run on the CPU and prepare batches ahead of their use by the model on the GPU. Watch your CPU resources to gauge what may be appropriate. Setting this value above 0 won't matter for small datasets or for large batch sizes (little to prepare), and may in fact result in a slowdown.
  3. pin_memory: if set to True (default False), the DataLoader copies tensors into pinned (page-locked) memory on the CPU (think of this as a staging area) before they are copied to the GPU. This may cause a slowdown with small batch sizes. The CPU-to-GPU copy can then be performed asynchronously when .to is called on the data in _train_epoch, since non_blocking = True is enabled there.
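The three parameters above can be sketched as follows. This is a minimal, illustrative example using a toy TensorDataset; the specific values are starting points to tune, not recommendations, and the device-transfer line simply mirrors the .to pattern described above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset for illustration: 256 samples of 10 features each.
features = torch.randn(256, 10)
labels = torch.randint(0, 2, (256,))
dataset = TensorDataset(features, labels)

loader = DataLoader(
    dataset,
    batch_size=32,    # default is 1, so set this explicitly
    num_workers=0,    # >0 spawns CPU subprocesses to prepare batches ahead of the GPU
    pin_memory=True,  # stage batches in page-locked CPU memory for faster GPU copies
)

for x, y in loader:
    # With pin_memory=True, non_blocking=True lets this copy overlap with compute.
    if torch.cuda.is_available():
        x = x.to("cuda", non_blocking=True)
    break
```

Rather than iterating the loader yourself as in the last lines, you would normally hand it to Vulcan's .fit, which performs the device transfer internally.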

Other tips and tricks

  1. If you have a consistent input size (i.e. not variable sentence lengths or image dimensions) AND no need for determinism, then you can set

torch.backends.cudnn.benchmark = True

This allows the inbuilt cudnn auto-tuner to go into benchmark mode and find the best algorithm for processing your data on your machine.
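In practice you set the flag once, before training starts, as a sketch:

```python
import torch

# Enable only with fixed input shapes: the first iterations run slower while
# cuDNN benchmarks candidate convolution algorithms, then it reuses the fastest.
torch.backends.cudnn.benchmark = True

# Conversely, if you need reproducible runs, leave benchmark off and request
# deterministic algorithms instead (this is the trade-off noted above):
# torch.backends.cudnn.deterministic = True
```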

  2. If you have data in numpy that you will put on the GPU as a tensor, use from_numpy, as explained here: the two variables will then share memory, so changes in one will be reflected in the other.
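The memory sharing can be seen in a small sketch (the array contents here are arbitrary):

```python
import numpy as np
import torch

arr = np.ones(3, dtype=np.float32)
t = torch.from_numpy(arr)  # zero-copy: the tensor shares arr's memory

arr[0] = 5.0  # mutate the numpy array...
# ...and the tensor sees the change, since no copy was ever made.
# Note that moving t to the GPU with .to("cuda") does copy, breaking the link.
```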