Kernel 13 implementation, even faster matmul with conflict-free registers -> smem storage without padding by Aladoro · Pull Request #15 · pranjalssh/fast.cu

Aladoro · 2026-02-15T16:59:46Z

Kernel 13 gets rid of the padding introduced in kernel 12 and instead applies swizzling to the C tile, so the register-to-shared-memory transfer remains free of bank conflicts.

In my tests with 8192-dimensional inputs, kernel 13 reaches 823.5 TFLOPS, up from 817.8 TFLOPS for kernel 12 and 809.1 TFLOPS for kernel 11.

Kudos to @gordicaleksa for posting one of the best explanations of swizzling out there and making me aware of your awesome repo ;)

On a side note, one of your comments stated:

"// We use 3d tiling to load from GMEM to SMEM. 2d tiling only works for tiles <= 64 columns."

This is a bit imprecise. In your previous implementation, I believe 2D tiling would have worked as well. The actual reason 3D tiling is used is to support swizzling with the CU_TENSOR_MAP_SWIZZLE_128B layout: otherwise, if the fastest-varying dimensions of a tile were not [64, columns], the swizzling pattern would be less effective at reducing bank conflicts when loading from and storing to shared memory.

Please do not hesitate to let me know if you have any questions, and thanks for sharing this repo!

Aladoro added 2 commits February 15, 2026 16:45

Kernel 13 implementation with conflict-free registers -> smem storage without padding (40b32b6)

Improve readability and add comments (112a548)

Aladoro wants to merge 2 commits into pranjalssh:main from Aladoro:padding-free-bank-conflict-free