Open
Labels
good first issue (Good for newcomers)
Description
Array reductions can be represented as a two-stage pipeline built on top of matrix-vector multiplications, where the vector is made of all ones.
Let's say our hardware supports fast 16 by 16 matrix multiplications with a single instruction. We can reshape the input array of length
In reality, we can't use Intel AMX with float32 inputs, but we can use Arm SME, and later apply similar techniques to SimSIMD.
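To make the two-stage idea concrete, here is a minimal sketch in plain Python. It only emulates the arithmetic; it does not use AMX, SME, or SimSIMD. The names `TILE`, `matvec_ones`, and `reduce_sum` are hypothetical, and the tiling layout (consecutive 256-element chunks viewed as 16 by 16 tiles, zero-padded at the tail) is an assumption, since the description does not spell out the reshape.

```python
TILE = 16  # assumed tile side, matching the 16 by 16 example in the description


def matvec_ones(tile):
    """Multiply a TILE x TILE tile by an all-ones vector, which yields its row sums."""
    ones = [1.0] * TILE
    return [sum(a * b for a, b in zip(row, ones)) for row in tile]


def reduce_sum(values):
    """Two-stage sum: per-tile mat-vec with ones, then a final sum of the partials."""
    block = TILE * TILE
    # Zero-pad so the length is a multiple of one tile; zeros do not change a sum.
    padded = list(values) + [0.0] * (-len(values) % block)
    partials = [0.0] * TILE
    for start in range(0, len(padded), block):
        # View the next 256 elements as a 16 x 16 tile (assumed row-major layout).
        tile = [padded[start + r * TILE:start + (r + 1) * TILE] for r in range(TILE)]
        # Stage 1: on real hardware this would be a single tile instruction.
        for r, row_sum in enumerate(matvec_ones(tile)):
            partials[r] += row_sum
    # Stage 2: collapse the 16 accumulated partial sums into one scalar.
    return sum(partials)
```

Calling `reduce_sum(range(1000))` pads the input to four tiles and returns the same value as the built-in `sum`. A hardware version would keep `partials` in a tile or vector register and only perform stage 2 once at the end.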
lin72h