Block scaled dot HLO op #22535
sergey-kozub started this conversation in General
Block scaled dot is an operation that accepts quantized inputs, dequantizes them, and then performs the dot op. NVIDIA Blackwell has an MMA unit that does this at the hardware level.
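To make the semantics concrete, here is a minimal reference sketch in plain Python. It is not XLA code and does not model any particular hardware format. It assumes that each operand comes with one scale per block of `block_size` consecutive elements along the contracting dimension, and that both operands are laid out with the contracting dimension last. The names `dequantize` and `block_scaled_dot` are hypothetical and chosen only for illustration.

```python
def dequantize(q, scales, block_size):
    """Multiply each quantized element by the scale of its block.

    q:      rows x K nested list of quantized values
    scales: rows x (K // block_size) nested list, one scale per block
    (Assumed layout: blocks run along the last, contracting dimension.)
    """
    return [
        [value * row_scales[k // block_size] for k, value in enumerate(row)]
        for row, row_scales in zip(q, scales)
    ]


def block_scaled_dot(lhs_q, lhs_scales, rhs_q, rhs_scales, block_size):
    """Reference semantics: dequantize both operands, then contract over K.

    lhs_q is M x K and rhs_q is N x K; the result is M x N.
    """
    lhs = dequantize(lhs_q, lhs_scales, block_size)
    rhs = dequantize(rhs_q, rhs_scales, block_size)
    return [
        [sum(a * b for a, b in zip(lhs_row, rhs_row)) for rhs_row in rhs]
        for lhs_row in lhs
    ]


# Tiny example: K = 4 with two blocks of size 2 per row.
lhs_q = [[1, 2, 3, 4]]
lhs_scales = [[0.5, 2.0]]
rhs_q = [[1, 1, 1, 1], [2, 0, 0, 1]]
rhs_scales = [[1.0, 1.0], [1.0, 0.5]]
result = block_scaled_dot(lhs_q, lhs_scales, rhs_q, rhs_scales, block_size=2)
```

A hardware implementation would fuse the scaling into the matrix multiply rather than materializing dequantized operands as this sketch does; the sketch only pins down what result is expected.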
There are a few options for how this could be implemented in XLA:
The question to discuss is whether it's worth introducing a new HLO op (block-scaled-dot), or whether we could continue using options (1) and/or (2) in the meantime.