fixed fused_rope naming in JIT + added readme for amd support (#1224)
R0n12 committed May 21, 2024
1 parent 2746d43 commit 1d55708
Showing 3 changed files with 14 additions and 7 deletions.
11 changes: 10 additions & 1 deletion README.md
@@ -96,7 +96,6 @@ To install the remaining basic dependencies, run:
pip install -r requirements/requirements.txt
pip install -r requirements/requirements-wandb.txt # optional, if logging using WandB
pip install -r requirements/requirements-tensorboard.txt # optional, if logging via tensorboard
python ./megatron/fused_kernels/setup.py install # optional, if using fused kernels
```

from the repository root.
@@ -106,6 +105,16 @@ from the repository root.
</aside>

### Fused Kernels
We now support AMD GPUs (MI100, MI250X) through JIT compilation of the fused kernels. Fused kernels are built and loaded on demand; to avoid waiting at job launch, you can also pre-build them manually:

```python
# from a Python shell launched at the repository root:
from megatron.fused_kernels import load
load()
```
This automatically adapts the build process to the detected GPU vendor (AMD or NVIDIA) without platform-specific code changes. To test the fused kernels with `pytest`, run `pytest tests/model/test_fused_kernels.py`.
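
The vendor-agnostic behaviour comes from PyTorch's JIT extension builder, which invokes nvcc on NVIDIA and hipcc (hipifying the `.cu` sources) on ROCm builds. A minimal sketch of this pattern, with an assumed helper name and illustrative flags rather than the exact `megatron.fused_kernels` internals:

```python
# Illustrative sketch only: torch.utils.cpp_extension.load() picks the right
# toolchain (nvcc or hipcc) for the installed PyTorch build, so the caller
# needs no vendor-specific branches.
import torch
from torch.utils import cpp_extension

def jit_build_example(name, sources, build_dir):
    extra_cuda_flags = ["-O3"]
    if torch.version.hip is not None:
        # Hypothetical ROCm-specific flag, e.g. targeting MI250X (gfx90a).
        extra_cuda_flags.append("--offload-arch=gfx90a")
    return cpp_extension.load(
        name=name,
        sources=sources,
        extra_cuda_cflags=extra_cuda_flags,
        build_directory=build_dir,
        verbose=True,
    )
```

Since the compiled extension is cached in the build directory, later launches reuse the artifact instead of rebuilding it.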

### Flash Attention

To use [Flash-Attention](https://github.com/HazyResearch/flash-attention), install the additional dependencies in `./requirements/requirements-flashattention.txt` and set the attention type in your configuration accordingly (see [configs](./configs/)). This can provide significant speed-ups over regular attention on certain GPU architectures, including Ampere GPUs (such as A100s); see the repository for more details.
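
As a quick sanity check before enabling it, you can verify that the `flash_attn` package installed by that requirements file is importable; the config fragment in the trailing comment is only an assumed neox-style `attention_config` entry, so consult [configs](./configs/) for the exact key and accepted values.

```python
# Hypothetical pre-flight check, not part of the repository.
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    raise RuntimeError(
        "flash-attn not found; run "
        "`pip install -r requirements/requirements-flashattention.txt` first"
    )

# Assumed config fragment (check ./configs/ for the real syntax):
# "attention_config": [[["flash"], 24]]   # flash attention for all 24 layers
```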
6 changes: 3 additions & 3 deletions megatron/fused_kernels/__init__.py
@@ -135,8 +135,8 @@ def _cpp_extention_load_helper(
srcpath / "fused_rotary_positional_embedding.cpp",
srcpath / "fused_rotary_positional_embedding_cuda.cu",
]
fused_rotary_positional_embedding_cuda = _cpp_extention_load_helper(
"fused_rotary_positional_embedding_cuda",
fused_rotary_positional_embedding = _cpp_extention_load_helper(
"fused_rotary_positional_embedding",
sources,
extra_cuda_flags,
extra_include_paths,
@@ -174,7 +174,7 @@ def load_fused_kernels():
print(e)
print("=" * 100)
print(
f"ERROR: Fused kernels configured but not properly installed. Please run `pip install {str(srcpath)}` to install them"
f"ERROR: Fused kernels configured but not properly installed. Please run `from megatron.fused_kernels import load` and then `load()` to load them correctly"
)
print("=" * 100)
exit()
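
The rename above matters because the `name` passed to the JIT builder is the module name the compiled extension is imported under elsewhere in the code. A rough sketch of that coupling, with an assumed helper signature (only `torch.utils.cpp_extension.load` is a real API here):

```python
# Sketch only; the real helper lives in megatron/fused_kernels/__init__.py.
from torch.utils import cpp_extension

def _load_helper_sketch(name, sources, extra_cuda_flags, extra_include_paths, build_dir):
    # The extension becomes importable under `name`, so it must match what
    # the Python-side rotary-embedding wrapper imports.
    return cpp_extension.load(
        name=name,
        sources=sources,
        extra_cuda_cflags=["-O3"] + extra_cuda_flags,
        extra_include_paths=extra_include_paths,
        build_directory=build_dir,
        verbose=True,
    )

# Wrapper side (assumed): `import fused_rotary_positional_embedding` would fail
# if the module had been built as "fused_rotary_positional_embedding_cuda",
# which is the mismatch this commit fixes.
```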
4 changes: 1 addition & 3 deletions tests/model/test_fused_kernels.py
@@ -30,9 +30,7 @@
)


@pytest.mark.xfail(
reason="ModuleNotFoundError: No module named 'scaled_masked_softmax_cuda'"
)
@pytest.mark.xfail(reason="SystemExit: None")
def test_load_fused_kernels():
load()
try:
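
The updated `xfail` reason follows from the error path shown in the previous file: the failure branch calls `exit()`, which raises `SystemExit`, so pytest now records `SystemExit: None` instead of a `ModuleNotFoundError` for the old pre-compiled `scaled_masked_softmax_cuda` module. A small illustration of that expectation (test name is hypothetical, and it assumes an environment where the kernels are not built):

```python
import pytest

from megatron.fused_kernels import load

# The failure path in megatron.fused_kernels calls exit(), which raises
# SystemExit, hence the new xfail reason "SystemExit: None".
@pytest.mark.xfail(reason="SystemExit: None")
def test_load_fused_kernels_sketch():
    load()
```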
