Commit f30202b
Accelerate SVE128 SBGEMM/BGEMM
This accelerates SBGEMM/BGEMM by extending the existing 8x4 kernel to 8x8 (unrolling N by 8).
I'm not sure whether the previous 8x4 kernel should be deleted.
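To make the 8x4 to 8x8 change concrete, here is a minimal scalar sketch of an 8x8 register-blocked micro-kernel with N unrolled by 8. It is not the SVE kernel from this commit: the function names, the packing layout (8 values of A and 8 values of B per step of K), and the column-major C tile are all assumptions made for illustration, and a real SVE128 kernel would use vector instructions instead of scalar loops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 is the upper 16 bits of an IEEE-754 float, so widening is a shift. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar model: C[8x8] += A[8xK] * B[Kx8].
 * Assumed packing: a[p*8 + i] is A(i, p), b[p*8 + j] is B(p, j).
 * C is column-major with leading dimension ldc. */
static void microkernel_8x8_ref(int k, const uint16_t *a, const uint16_t *b,
                                float *c, int ldc) {
    assert(k >= 0 && ldc >= 8);
    float acc[8][8] = {{0.0f}};   /* 64 accumulators, twice as many as 8x4 */

    for (int p = 0; p < k; ++p) {
        for (int j = 0; j < 8; ++j) {          /* N unrolled by 8 */
            float bv = bf16_to_f32(b[p * 8 + j]);
            for (int i = 0; i < 8; ++i)        /* M block of 8 */
                acc[i][j] += bf16_to_f32(a[p * 8 + i]) * bv;
        }
    }

    /* Write the accumulated tile back into C. */
    for (int j = 0; j < 8; ++j)
        for (int i = 0; i < 8; ++i)
            c[i + j * ldc] += acc[i][j];
}
```

In this model, each step of K loads the same 8 values of A as an 8x4 tile would, but reuses them across 8 columns of B instead of 4, which is one plausible reason a wider tile helps.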
Here are the speedups on a single Neoverse-V2 core (SVE128) compared to the previous state:
Per-shape speedup
M=N=K=64: SBGEMM 1.164x (16.42%), BGEMM 1.133x (13.30%)
M=N=K=128: SBGEMM 1.220x (22.02%), BGEMM 1.186x (18.56%)
M=N=K=256: SBGEMM 1.241x (24.08%), BGEMM 1.235x (23.54%)
M=N=K=512: SBGEMM 1.240x (23.95%), BGEMM 1.227x (22.75%)
M=N=K=1024: SBGEMM 1.251x (25.11%), BGEMM 1.232x (23.23%)
M=N=K=2048: SBGEMM 1.235x (23.47%), BGEMM 1.246x (24.64%)
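The two figures on each line appear to be related by percent = (speedup - 1) x 100, with the speedup factor rounded to three decimals (for example, 1.164x against 16.42%). A minimal sketch of that arithmetic, assuming the speedup is old time divided by new time; the function names are hypothetical and no timings from the benchmark are reproduced here:

```c
#include <assert.h>

/* Speedup factor: how many times faster the new kernel is than the old one. */
static double speedup(double old_seconds, double new_seconds) {
    assert(old_seconds > 0.0 && new_seconds > 0.0);
    return old_seconds / new_seconds;
}

/* Percentage gain corresponding to a speedup factor, e.g. 1.25x gives 25%. */
static double percent_gain(double factor) {
    return (factor - 1.0) * 100.0;
}
```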
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
1 parent 18638c7 commit f30202b
File tree: kernel/arm64
5 files changed, +833 -7 lines changed