
Releases: lsds/KungFu

KungFu 0.2.1 Release

11 Jan 21:31

This release contains the following updates:

  • We have enabled different logging levels in KungFu. See PR #239 for details.
  • We have improved the scalability of KungFu in cloud environments where network bandwidth is often limited. This improvement comes from building multiple aggregation/broadcast trees to utilise the bandwidth on all available network paths. See PR #242 for details and a performance comparison with Horovod NCCL.

KungFu 0.2.0 Release

10 Nov 04:44
cbe2770

Release notes

The KungFu team has received much valuable feedback from the SOSP audience and early industry users. We have done our best to integrate this feedback to improve the usability of KungFu, which is the focus of the 0.2.0 release. The following are the main new features of this release:

New framework support

KungFu supports TensorFlow 1/2, TensorLayer 1/2, and Keras. This covers most models trained with TensorFlow. We have released examples showing how to use KungFu in various TensorFlow programs. Check here.
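
For orientation, below is a minimal sketch of the integration pattern for a TensorFlow 1.x session program. The module paths (kungfu.tensorflow.optimizers, kungfu.tensorflow.initializer) and the BroadcastGlobalVariablesOp name are taken from the repository README and are assumptions; consult the linked examples for the exact API of this release.

    import numpy as np
    import tensorflow as tf

    # Module paths below follow the repository README and may differ slightly
    # between KungFu releases.
    from kungfu.tensorflow.initializer import BroadcastGlobalVariablesOp
    from kungfu.tensorflow.optimizers import SynchronousSGDOptimizer

    # A toy TensorFlow 1.x model: least-squares regression on random data.
    x = tf.placeholder(tf.float32, shape=[None, 10])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

    # Wrap the local optimiser so gradients are aggregated across workers.
    opt = tf.train.GradientDescentOptimizer(0.01)
    opt = SynchronousSGDOptimizer(opt)
    train_op = opt.minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Ensure every worker starts from the same model state.
        sess.run(BroadcastGlobalVariablesOp())
        for _ in range(10):
            xs = np.random.rand(32, 10).astype(np.float32)
            ys = np.random.rand(32, 1).astype(np.float32)
            sess.run(train_op, feed_dict={x: xs, y: ys})

A multi-worker run would then be started with the KungFu launcher, e.g. kungfu-run -np 4 python train.py (launcher name as in the current README; train.py is a placeholder for your training script).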

New advanced examples

KungFu provides many advanced examples that show how to enable KungFu in complex AI models, including:

  1. Google BERT
  2. Generative Adversarial Learning (CycleGAN)
  3. Reinforcement learning (Alpha Zero)
  4. ResNet and many useful DNNs for ImageNet
  5. Pose estimation network (OpenPose)

New distributed optimiser

We have released a new distributed optimiser, SynchronousAveragingOptimizer. This optimiser aims to preserve the properties of small-batch training when using many parallel workers, making it a useful option for AI models that must be trained with small batch sizes. Check here for more details.
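
As an illustration only, the sketch below shows the new optimiser being dropped into a TensorFlow 1.x program in place of synchronous SGD; the module path is assumed from the repository README.

    import tensorflow as tf

    # Assumed module path (see the repository README).
    from kungfu.tensorflow.optimizers import SynchronousAveragingOptimizer

    # Toy scalar objective so the sketch is self-contained.
    w = tf.Variable(1.0)
    loss = tf.square(w - 3.0)

    opt = tf.train.AdamOptimizer(0.01)
    # As the name suggests, workers synchronously average model parameters
    # rather than summing gradients, which helps retain small-batch training
    # behaviour as the number of workers grows.
    opt = SynchronousAveragingOptimizer(opt)
    train_op = opt.minimize(loss)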

Better performance

We have greatly improved the performance of asynchronous training.

KungFu 0.1.0 pre-release

26 Oct 17:02
09f6d7e

This is the first release of KungFu.

This release contains two features:

  • SynchronousSGDOptimizer: This optimiser implements the classic synchronous SGD (S-SGD) algorithm for distributed training.
  • PairAveragingOptimizer: This optimiser implements communication-efficient asynchronous training while reaching the same evaluation accuracy as S-SGD (see the sketch after this list).
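
A minimal sketch of how these wrappers are applied in a TensorFlow 1.x program is shown below, using PairAveragingOptimizer; the module path is assumed from the current repository README and may have differed in this early release.

    import tensorflow as tf

    # Assumed module path (see the repository README).
    from kungfu.tensorflow.optimizers import PairAveragingOptimizer

    # Toy scalar objective so the sketch is self-contained.
    w = tf.Variable(1.0)
    loss = tf.square(w - 3.0)

    opt = tf.train.GradientDescentOptimizer(0.1)
    # Instead of a full all-reduce every step, each worker averages its model
    # with a selected peer, reducing the amount of data exchanged.
    opt = PairAveragingOptimizer(opt)
    train_op = opt.minimize(loss)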

We have tested and deployed these optimisers in a cloud testbed and a production cluster. Check out their performance in the Benchmark section of the README.