This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Horovod

Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. Its goal is to make distributed deep learning fast and easy to use.

See the official Horovod GitHub page.

This Horovod recipe shows how to run a Horovod distributed training job for TensorFlow on a GPU cluster with Batch AI.

This Horovod recipe shows how to run a Horovod distributed training job for PyTorch on a GPU cluster with Batch AI.

This Horovod-Infiniband-Benchmark recipe shows how to reproduce Horovod distributed training benchmarks with InfiniBand support using Batch AI.

Help or Feedback


If you have any problems or questions, you can reach the Batch AI team at [email protected] or create an issue on GitHub.

We also welcome your contributions of additional sample notebooks, scripts, or other examples of working with Batch AI.