
Releases: databricks/megablocks

v0.8.0

14 Mar 17:57

Breaking Changes

As a consequence of the torch 2.6.0 upgrade, sparse support is disabled in megablocks, meaning that only grouped support is available.

For additional context: torch 2.6.0 depends on triton 3.2.0, which changed how dtype promotion is handled when the two operands of a binary op have different dtypes. As a result, we hit an int16 overflow in megablocks' stk dependency, which leads to an illegal memory access (IMA). Once this issue is resolved, we will release a new version of megablocks. See #168 for additional details.
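
In practice, this means MoE layers should be configured to use the grouped MLP implementation. A minimal sketch, assuming the mlp_impl field of megablocks.layers.arguments.Arguments (which selects between the sparse and grouped code paths); the sizes are illustrative, and field names should be checked against your installed version:

```python
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Sketch only; with sparse support disabled, select grouped GEMM explicitly.
args = Arguments(
    hidden_size=1024,        # illustrative model sizes
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_impl='grouped',      # 'sparse' is unavailable under torch 2.6.0
)
layer = dMoE(args)
```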

Full Changelog: v0.7.0...v0.8.0

v0.7.0

20 Nov 00:44

Full Changelog: v0.6.1...v0.7.0

v0.6.1

31 Aug 14:49

What's New

Patch release to remove dependencies specified via GitHub and instead use released versions from PyPI (specifically, stanford-stk and grouped-gemm). This allows megablocks itself to be released via PyPI.
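
With this change, megablocks and its dependencies install directly from PyPI (e.g. `pip install megablocks`). A quick way to confirm the released packages are in use, with only the standard library (distribution names are those given above; some Python versions require exact, unnormalized names):

```python
from importlib.metadata import version

# Print the installed version of each PyPI-released distribution.
for pkg in ("megablocks", "stanford-stk", "grouped-gemm"):
    print(pkg, version(pkg))
```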

What's Changed

  • Remove direct dependencies, allowing for megablocks pypi release by @snarayan21 in #149

Full Changelog: v0.6.0...v0.6.1

v0.6.0

30 Aug 18:55

What's New

1. Torch 2.4 Compatibility (#145)

MegaBlocks now supports Torch 2.4!

2. New CI/CD

MegaBlocks has new GitHub Actions workflows for better CI/CD! On every PR, MegaBlocks now automatically lints and formats code (#131) and runs tests on a GPU (#127).

3. Remove Weight Parallelism (#137)

Weight parallelism was unused, so it has been removed.

4. Shared Experts (#109)

Implements shared experts, based on the DeepSeekMoE paper.
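
The shared-expert idea: every token always passes through a small dense expert whose output is added to the routed-expert output. A minimal sketch of the concept; the class and argument names below are illustrative, not MegaBlocks' API:

```python
import torch
import torch.nn as nn

class SharedExpertMoE(nn.Module):
    """Sketch of DeepSeekMoE-style shared experts: a dense, always-on
    expert runs alongside the routed experts, and their outputs are summed."""

    def __init__(self, moe: nn.Module, hidden_size: int, shared_ffn_size: int):
        super().__init__()
        self.moe = moe  # any routed MoE layer returning a (tokens, hidden) tensor
        self.shared = nn.Sequential(
            nn.Linear(hidden_size, shared_ffn_size),
            nn.GELU(),
            nn.Linear(shared_ffn_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Routed output plus the always-on shared expert's output.
        return self.moe(x) + self.shared(x)
```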

Bug Fixes

  1. Better handle incompatible FFN sizes (#108)
  2. Fix AMP for memory-optimized options (#111)
  3. Don't save MoE load-balancing loss tensors (#119)

Full Changelog: v0.5.1...v0.6.0

v0.5.1

11 Jan 22:14

Full Changelog: v0.5.0...v0.5.1

v0.5.0

08 Dec 16:51

What's New

This release brings several improvements that avoid CPU <-> GPU device synchronizations, adds GLU support, and adds support for some new models 👀
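
For context on the synchronization fixes: host-side reads of GPU tensors (e.g. .item() or Python control flow on tensor values) block the CPU until the GPU catches up. An illustrative example of the pattern such changes avoid (not MegaBlocks code; assumes a CUDA device is available):

```python
import torch

tokens_per_expert = torch.tensor([5, 3, 8], device="cuda")

# Forces a CPU <-> GPU synchronization: the host blocks until the device
# finishes all queued work so the scalar value can be copied back.
n_host = int(tokens_per_expert.sum().item())

# Stays on device: no synchronization, so CPU and GPU work keep overlapping.
n_device = tokens_per_expert.sum()
```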

Full Changelog: v0.4.0...v0.5.0

v0.4.0

24 Oct 22:44

Full Changelog: v0.3.3...v0.4.0

v0.3.3

17 Oct 21:58

What's Changed

  • Enable running MegaBlocks MoE without bias by @vchiley in #31

Full Changelog: v0.3.2...v0.3.3

v0.3.2

10 Oct 22:32

What's Changed

  • Support for bfloat16
  • Optimizations for top_k > 1
  • Support for fully-sharded data parallelism
  • Support tensor model parallelism when expert_parallel_world_size > num_experts
  • Optimizations for activation memory
  • Support activation quantization (thanks @dblalock!)
  • Optimizations for SM90 (Hopper)
  • Lots of bug fixes, cleanup and small optimizations
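
To illustrate what top_k > 1 routing means (generic PyTorch, not MegaBlocks internals): each token is sent to its k highest-scoring experts, weighted by normalized router scores.

```python
import torch

scores = torch.randn(6, 8)                      # (tokens, experts) router logits
weights, expert_ids = scores.topk(k=2, dim=-1)  # top_k = 2: two experts per token
weights = torch.softmax(weights, dim=-1)        # per-token combination weights
```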

Full Changelog: v0.1...v0.3.2

Version 0.1

01 May 15:14
Pre-release

Initial release documenting repository state prior to MLSys'23 camera-ready publication.