Releases · ROCm/aotriton
AOTriton 0.6 Beta
What's Changed
- Resolve cmake conflicts when adding aotriton into TE via add_subdirectory by @wangye805 in #23
- [mGPU] Run hipModuleLoadDataEx for each GPU device by @xinyazhang in #24 (see the sketch after this list)
- Add mutex.h for TE PyTorch extension compilation by @wangye805 in #26
- Refactor the build system by @xinyazhang in #29
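Regarding the mGPU change in #24: a hipModule_t is bound to the device that is current at load time, so a multi-GPU process needs to load the same compiled code object once per device. A minimal sketch of that pattern (the function name and structure are illustrative, not AOTriton's actual loader):

```cpp
// Minimal sketch, NOT AOTriton's loader: a hipModule_t is tied to the
// device that is current when the image is loaded, so the same compiled
// code object must be loaded once per device in a multi-GPU process.
#include <hip/hip_runtime.h>
#include <vector>

std::vector<hipModule_t> load_on_all_devices(const void* code_object_image) {
  int device_count = 0;
  if (hipGetDeviceCount(&device_count) != hipSuccess)
    return {};
  std::vector<hipModule_t> modules(device_count, nullptr);
  for (int dev = 0; dev < device_count; ++dev) {
    if (hipSetDevice(dev) != hipSuccess)  // make this GPU current
      continue;
    // Load the same image into this device's context (no JIT options).
    hipModuleLoadDataEx(&modules[dev], code_object_image, 0, nullptr, nullptr);
  }
  return modules;
}
```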
New Contributors
- @wangye805 made their first contribution in #23
Full Changelog: 0.5b...0.6b
AOTriton 0.5 Beta
What's Changed
- Switch the tuning database to SQLite3 for incremental tuning (see the sketch after this list)
- Add matrix bias to forward/backward kernel
- Fix build failures due to missing
- Add new Triton kernel debug_fill_dropout_rng for debugging dropout
- Add FP32 support to fulfill the functionality required by torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION
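Moving the tuning database to SQLite3 makes tuning incremental: each new measurement can be upserted without rebuilding the whole table. A minimal sketch of that pattern using the SQLite C API; the schema, table name, and function here are hypothetical, not AOTriton's actual layout:

```cpp
// Hypothetical schema, for illustration only:
//   CREATE TABLE tuning (seqlen INTEGER, head_dim INTEGER,
//                        block_m INTEGER, time_us REAL,
//                        PRIMARY KEY (seqlen, head_dim));
#include <sqlite3.h>

// Record a measurement, keeping only the fastest config per problem size.
bool record_tuning(sqlite3* db, int seqlen, int head_dim,
                   int block_m, double time_us) {
  const char* sql =
      "INSERT INTO tuning (seqlen, head_dim, block_m, time_us) "
      "VALUES (?1, ?2, ?3, ?4) "
      "ON CONFLICT (seqlen, head_dim) DO UPDATE "
      "SET block_m = excluded.block_m, time_us = excluded.time_us "
      "WHERE excluded.time_us < tuning.time_us;";
  sqlite3_stmt* stmt = nullptr;
  if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK)
    return false;
  sqlite3_bind_int(stmt, 1, seqlen);
  sqlite3_bind_int(stmt, 2, head_dim);
  sqlite3_bind_int(stmt, 3, block_m);
  sqlite3_bind_double(stmt, 4, time_us);
  bool ok = (sqlite3_step(stmt) == SQLITE_DONE);
  sqlite3_finalize(stmt);
  return ok;
}
```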
Notes about binary delivery
Starting with 0.5 Beta, we are, for now, not delivering AOTriton in binary form alongside software releases, due to software supply chain security considerations.
Full Changelog: 0.4.1b...0.5b
AOTriton 0.4.1 Beta
Summary
This is an emergency fix for the build process. It delivers the same functionality and performance as the 0.4b release. AOTriton users who are not building the library from source can keep using the 0.4b binary release packages.
Changes
- Triton's setup.py downloads CUDA packages during the build, but the download does not always succeed. AOTriton does not need these packages for now, so the downloads were commented out.
AOTriton 0.4 Beta
Summary
This is the first release which is considered sufficiently stable for production.
Features
- Implement the Flash Attention v2 algorithm on MI200/MI300
- Implement most features required by PyTorch's mha_fwd and mha_bwd
- Missing features: window_size_left and window_size_right (their semantics are sketched after this list)
- The API can be found in include/aotriton/flash.h
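For context on the missing feature: window_size_left and window_size_right follow the upstream flash-attention convention of restricting each query to a band of keys. A minimal sketch of the masking rule they imply, assuming equal query and key lengths (illustrative only, not part of the AOTriton API):

```cpp
// Sliding-window attention mask in the flash-attention convention:
// query i may attend key j iff  i - window_size_left <= j <= i + window_size_right,
// with a negative window size meaning "unbounded" on that side.
// AOTriton 0.4b does not implement these parameters; this only shows their meaning.
#include <vector>

std::vector<std::vector<bool>> window_mask(int seqlen, int window_size_left,
                                           int window_size_right) {
  std::vector<std::vector<bool>> mask(seqlen, std::vector<bool>(seqlen, false));
  for (int i = 0; i < seqlen; ++i) {
    for (int j = 0; j < seqlen; ++j) {
      bool left_ok = window_size_left < 0 || j >= i - window_size_left;
      bool right_ok = window_size_right < 0 || j <= i + window_size_right;
      mask[i][j] = left_ok && right_ok;
    }
  }
  return mask;
}
```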
AOTriton GA Preview for Legal Scan
This release was created for legal review before releasing to the public.
Compiled on ROCm 6.0, Ubuntu 20.04, Python 3.9.