fast-ops is a personal-project library of efficient PyTorch operators, usually targeting (NVIDIA) GPUs.
Generally, we focus on operators that aren't already implemented in other high-performance operator libraries, unless we feel we can beat them on performance, features, or usability. Some other places you can go "shopping" for operators are:
- NVIDIA Apex
- Facebook xFormers
- ByteDance LightSeq
- FlashAttention - There are lots of other optimized operators in there besides FlashAttention itself.
- bitsandbytes - Various operations related to low precision (8-bit) training and inference.
Operators currently included:
- (Flash) Multi-Head Attention: Algorithm from FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Significantly faster than vanilla attention due to the fused implementation, and has $O(n)$ rather than $O(n^2)$ memory complexity (a naive reference for comparison is sketched after this list).
- (Fused) Lion Optimizer: Optimizer described in Symbolic Discovery of Optimization Algorithms. Claims improved convergence properties, and the optimizer state consists only of half-precision momentum, meaning large memory savings over commonly used optimizers like AdamW (see the update-rule sketch below).
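For a sense of what the fused attention kernel replaces, here is a naive reference implementation (a sketch for comparison, not this library's code). It materializes the full attention score matrix, which is exactly the $O(n^2)$ memory cost FlashAttention avoids by computing attention in tiles:

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    # Materializes a (batch, heads, seq_len, seq_len) score matrix:
    # this is the O(n^2) memory that the fused kernel never allocates.
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(F.softmax(scores, dim=-1), v)
```

The Lion update itself, per the paper, is a single sign-based step with decoupled weight decay. Below is an unfused, single-parameter sketch (the names lr, beta1, beta2, and wd are illustrative defaults, not this library's API); the fused operator presumably performs the same math in one kernel, with the momentum m kept in half precision:

```python
import torch

@torch.no_grad()
def lion_step(p, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Update direction is the sign of an interpolation between
    # the momentum and the current gradient.
    update = (beta1 * m + (1 - beta1) * grad).sign()
    # Decoupled weight decay, as in AdamW.
    p.mul_(1 - lr * wd).add_(update, alpha=-lr)
    # Momentum is the only optimizer state, so storing it in
    # fp16/bf16 roughly halves its footprint vs. one fp32 buffer.
    m.mul_(beta2).add_(grad, alpha=1 - beta2)
```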
This project's Python dependencies are managed with Poetry.
You can install dependencies (or subsets for development and testing) using:
> poetry install --no-root
This will create a new virtual environment with all dependencies installed, which can be activated using:
> source $(poetry env info --path)/bin/activate
Some test files support using pytest-xdist to parallelize tests across GPUs. After installing it (it comes with the poetry install above), you can run tests like:
> pytest -n 8
to utilize 8 devices. Sometimes you can get away with more workers than devices, but other times you'll get OOMs.
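How each worker picks a GPU is not spelled out here, but one common pattern (an assumption, not necessarily what this repo's tests do) is a conftest.py fixture that reads the PYTEST_XDIST_WORKER environment variable (gw0, gw1, ...) that pytest-xdist sets for each worker, and pins that worker to a device:

```python
# conftest.py -- hypothetical sketch of per-worker GPU pinning.
import os

import pytest
import torch

@pytest.fixture(autouse=True)
def pin_gpu_per_worker():
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ...;
    # the variable is unset when running without -n.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    if torch.cuda.is_available():
        index = int(worker.lstrip("gw")) % torch.cuda.device_count()
        torch.cuda.set_device(index)
    yield
```

With 8 workers on 8 GPUs each worker gets its own device; with more workers than devices, workers share GPUs, which is where the OOMs mentioned above tend to come from.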
We use Bear to generate the compile_commands.json file used by language servers. If you need to update this file, you can run:
> bear python setup.py develop
or
> bear pytest
Run scripts/fmt to reformat all project files using clang-format and black.