Flash Attention for Neuron #939
base: main
Conversation
Maybe wait until this PR is checked in. From what I can tell, your PR also does not fix the remat bug. #942 (review)
def _mha_forward(query, key, value, bias, causal, softmax_scale, dropout_rate):
    # Get the batch size, sequence lengths, number of heads, and hidden dimension
Nit: end comments with . (here and everywhere)
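For context, a hedged sketch of the shape unpacking that the comment above describes; the [batch, num_heads, seq_len, per_head_dim] layout is an assumption for illustration, not necessarily the layout this kernel uses:

# Hypothetical layout: query [batch, num_heads, q_seq_len, per_head_dim],
# key/value [batch, num_heads, kv_seq_len, per_head_dim]; the PR's kernel
# may order these dimensions differently.
batch_size, num_heads, q_seq_len, per_head_dim = query.shape
_, _, kv_seq_len, _ = key.shape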
Force-pushed from 8a92182 to 73a2808
key: Tensor,
value: Tensor,
bias: Tensor,
causal: bool = False,
Can we support segment IDs? Or even better, a more general masking fn (with optimized handling).
If not, I am fine with leaving a TODO here, but it is a hard blocker for enabling it for our internal training.
Can we do segment IDs in a separate PR? That involves non-trivial work and needs some time.
Sure. In that regard, I may ask for more: let's do a general mask then, since we want things beyond causal.
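For context on what segment-ID support would entail, a hedged sketch of the mask it implies; the shape conventions here are assumptions for illustration, not from this PR:

import jax.numpy as jnp

def make_segment_id_mask(q_segment_ids, kv_segment_ids):
    # Allow attention only within the same segment; segment IDs are assumed to
    # have shape [batch, seq_len], and the mask is broadcast over the heads axis.
    mask = q_segment_ids[:, :, None] == kv_segment_ids[:, None, :]
    return mask[:, None, :, :]

A causal or sliding-window constraint would then be combined with such a mask, which is why a general masking fn subsumes the causal flag.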
Thanks for all the reviews @ruomingp @kelvin-zou. I resolved all the comments; please let me know if any more changes are needed.
seed = jnp.array([1])

# Call the NKI kernel, duplicate the kernel if we cannot shard on num_heads.
if (num_heads % 2) == 0 and (num_heads // 2 > 0):
# even num_heads except 0
if num_heads > 0 and num_heads % 2 == 0:
input_dtype: jnp.dtype,
attention_bias_type: bool,
):
softmax_scale = 1.0 / (per_head_dim**0.5)
Maybe just
per_head_dim**-0.5
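Both forms compute the standard attention scale 1 / sqrt(per_head_dim); the suggested form simply avoids the explicit division.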
Force-pushed from 73a2808 to c226d03
I rebased the PR to avoid merge conflicts; can I please get a new approval? Thank you!
from axlearn.common.flash_attention.utils import mha_reference

if jax.default_backend() != "neuron":
    pytestmark = pytest.mark.skip(reason="Incompatible hardware, AWS Neuron only test.")
Looks like a number of CI steps are failing -- I think we can either do something like

if jax.default_backend() != "neuron":
    pytest.skip(reason=..., allow_module_level=True)

or update run_tests.sh to exclude tests marked with neuron.
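For reference, a minimal self-contained sketch of the module-level skip suggested above (assuming pytest >= 7.0, where pytest.skip accepts the reason keyword):

import jax
import pytest

# Skip the whole module when the tests are not running on AWS Neuron hardware.
if jax.default_backend() != "neuron":
    pytest.skip(reason="Incompatible hardware, AWS Neuron only test.", allow_module_level=True)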
@apoorvtintin I see quite a few unit tests failed; can you take a look?
Force-pushed from c226d03 to 42720ad
Force-pushed from 42720ad to f7f06fd
This PR adds support for a flash attention kernel for Neuron, implemented through the Neuron Kernel Interface (NKI).
The flash attention kernel works on TRN1 and TRN2.
This PR is a newer version of #883 from a different fork; all comments from the previous PR are addressed in this one, and it adds dropout support.
Segment ID support in the flash attention kernel is in progress and will be available at a later date.
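For illustration, a hedged sketch of the kind of correctness check the PR's tests perform; neuron_flash_attention is a hypothetical name for the kernel entry point (not necessarily the one in this PR), and the plain-JAX reference is written out here rather than importing the repo's mha_reference so the sketch stays self-contained:

import jax
import jax.numpy as jnp

def reference_attention(q, k, v, softmax_scale):
    # Plain-JAX reference: softmax(q @ k^T * scale) @ v, assuming a
    # [batch, num_heads, seq_len, per_head_dim] layout for this sketch.
    logits = jnp.einsum("bhqd,bhkd->bhqk", q, k) * softmax_scale
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bhkd->bhqd", probs, v)

batch, num_heads, seq_len, per_head_dim = 1, 2, 128, 64
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(k1, (batch, num_heads, seq_len, per_head_dim))
k = jax.random.normal(k2, (batch, num_heads, seq_len, per_head_dim))
v = jax.random.normal(k3, (batch, num_heads, seq_len, per_head_dim))

ref = reference_attention(q, k, v, softmax_scale=per_head_dim**-0.5)
# Hypothetical kernel call, shown commented out since the entry point name is assumed:
# out = neuron_flash_attention(q, k, v, bias=None, causal=False,
#                              softmax_scale=per_head_dim**-0.5, dropout_rate=0.0)
# assert jnp.allclose(out, ref, atol=2e-2)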