[CK_TILE] B matrix 2D block scale gemm #3074

samremes · 2025-10-22T11:13:30Z

Proposed changes

Introduces 2D block-scale support for the B matrix, with quantization groups along both the N and K axes. The tile distribution used for the scale matrix is selected according to the group size.
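To illustrate what grouping on both N and K means for the scale lookup, here is a minimal host-side sketch in plain C++. It is not the CK_TILE implementation: `GroupShape`, `scale_index`, `reference_bquant_gemm`, and the scale storage order (K-group index varying fastest) are all assumptions made for this example, and the quantized B values are held as `float` for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical group shape: one scale per (n x k) block of B.
struct GroupShape
{
    std::size_t n; // group size along N (1 means a separate scale per column)
    std::size_t k; // group size along K
};

// Index of the scale that applies to element B(k, n).
// Assumption of this sketch: scales are stored with the K-group index varying fastest.
inline std::size_t scale_index(std::size_t k, std::size_t n, std::size_t K, GroupShape g)
{
    const std::size_t k_groups = K / g.k;
    return (n / g.n) * k_groups + (k / g.k);
}

// Naive reference: C = A * dequant(B).
// A is row-major MxK, B is column-major KxN, C is row-major MxN.
inline std::vector<float> reference_bquant_gemm(const std::vector<float>& a,
                                                const std::vector<float>& b_q,
                                                const std::vector<float>& b_scale,
                                                std::size_t M,
                                                std::size_t N,
                                                std::size_t K,
                                                GroupShape g)
{
    assert(g.n > 0 && g.k > 0 && N % g.n == 0 && K % g.k == 0);
    assert(a.size() == M * K && b_q.size() == K * N);
    assert(b_scale.size() == (N / g.n) * (K / g.k));

    std::vector<float> c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
            {
                // Dequantize B(k, n) with the scale of the block it falls into.
                const float b = b_q[n * K + k] * b_scale[scale_index(k, n, K, g)];
                acc += a[m * K + k] * b;
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

In this sketch, setting `g.n` to 1 gives every column of B its own scale per K group, which corresponds to grouping on K only; larger values of `g.n` let neighbouring columns share a scale.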

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation that helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

illsilin · 2025-10-28T17:18:35Z

Hi @samremes, could you please resolve the merge conflicts?

CongMa13 · 2025-10-24T18:04:01Z

test/ck_tile/gemm_block_scale/test_gemm_quant_typed.cpp

+    std::tuple<RowMajor, ColumnMajor, RowMajor, BF8, PkInt4, BF8,   Half, BQuantGrouped, GemmConfigBase, GroupSize>,
+
+    std::tuple<RowMajor, ColumnMajor, RowMajor, FP8, FP8,    float, Half, BQuantGrouped, GemmConfigBase, GroupSize64>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, BF8, BF8,    float, Half, BQuantGrouped, GemmConfigBase, GroupSize64>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, FP8, PkInt4, FP8,   Half, BQuantGrouped, GemmConfigBase, GroupSize64>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, BF8, PkInt4, BF8,   Half, BQuantGrouped, GemmConfigBase, GroupSize64>,
+
+    // 2d cases with grouping also on the n axis
+    std::tuple<RowMajor, ColumnMajor, RowMajor, FP8, FP8,    float, Half, BQuantGrouped, GemmConfigBase, GroupSize2D>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, BF8, BF8,    float, Half, BQuantGrouped, GemmConfigBase, GroupSize2D>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, FP8, PkInt4, FP8,   Half, BQuantGrouped, GemmConfigBase, GroupSize2D>,
+    std::tuple<RowMajor, ColumnMajor, RowMajor, BF8, PkInt4, BF8,   Half, BQuantGrouped, GemmConfigBase, GroupSize2D>


It is awesome to have these unit tests 👍

ThomasNing · 2025-10-29T03:40:44Z

example/ck_tile/38_block_scale_gemm/gemm_quant_basic.cpp


    std::string quant_mode = arg_parser.get_str("quant_mode");

+    using QuantGroupSize = ck_tile::QuantGroupShape<ck_tile::sequence<1, 1, 128>>;


Could we expose the quant group size through an interface? Currently, the quant dimension sizes have to be hardcoded.
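One possible shape for such an interface is a runtime-to-compile-time dispatch, sketched below in standalone C++. Everything here is hypothetical: `GroupShape`, `run_example`, `dispatch_group_size`, and the `"MxNxK"` string format are invented for illustration and are not the arg_parser or `QuantGroupShape` API. Only the 1x1x128 shape comes from the snippet above; the other shapes are placeholders.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a compile-time quant group shape (M, N, K).
template <std::size_t M, std::size_t N, std::size_t K>
struct GroupShape
{
    static constexpr std::size_t m = M;
    static constexpr std::size_t n = N;
    static constexpr std::size_t k = K;
};

// Placeholder for the templated example entry point; here it only
// reports which shape it was instantiated with.
template <typename Shape>
std::string run_example()
{
    return "M=" + std::to_string(Shape::m) + " N=" + std::to_string(Shape::n) +
           " K=" + std::to_string(Shape::k);
}

// Map a runtime string such as "1x1x128" to one of the precompiled shapes.
inline std::string dispatch_group_size(const std::string& group_size)
{
    if(group_size == "1x1x128")
        return run_example<GroupShape<1, 1, 128>>();
    if(group_size == "1x8x128") // placeholder 2D shape
        return run_example<GroupShape<1, 8, 128>>();
    if(group_size == "1x64x128") // placeholder 2D shape
        return run_example<GroupShape<1, 64, 128>>();
    throw std::invalid_argument("unsupported quant group size: " + group_size);
}
```

The trade-off in a design like this is that every supported shape is instantiated at build time, so the list of accepted strings has to stay in step with the kernels that are actually compiled.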

ThomasNing · 2025-10-29T03:42:37Z

@CongMa13 Please try the tile distribution solution we discussed today and see the perf difference.

samremes · 2025-10-29T17:36:37Z

@ThomasNing @CongMa13 Did you have some ideas for the tile distribution? I think the current versions require that it exactly splits with NWarps and/or NIterPerWarp.

CongMa13 · 2025-10-31T03:13:29Z

I updated the tile distribution and the calculation of the bq offset.

There are three kinds of distribution, depending on the N group size:

1. N group size < warp::N: one warp needs multiple bq.
2. N group size <= warp::N * NWarp: the warp group needs multiple bq.
3. Otherwise: multiple NIters share one bq.
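
The three cases above can be sketched as a small compile-time selector. This is only an illustration of the conditions as stated in this comment, not the code in the PR: the enum, the function name, and the example values `warp_n = 16` and `n_warps = 4` are assumptions.

```cpp
#include <cassert>

// Hypothetical labels for the three distribution kinds described above.
enum class BQuantDistribution
{
    MultipleScalesPerWarp,      // N group size < warp N
    MultipleScalesPerWarpGroup, // N group size <= warp N * NWarp
    ScaleSharedAcrossNIters     // otherwise
};

// Select a distribution kind from the N group size, the N extent covered by
// one warp, and the number of warps along N.
constexpr BQuantDistribution select_bquant_distribution(int n_group, int warp_n, int n_warps)
{
    assert(n_group > 0 && warp_n > 0 && n_warps > 0);
    if(n_group < warp_n)
        return BQuantDistribution::MultipleScalesPerWarp;
    if(n_group <= warp_n * n_warps)
        return BQuantDistribution::MultipleScalesPerWarpGroup;
    return BQuantDistribution::ScaleSharedAcrossNIters;
}

// Example with assumed values warp_n = 16 and n_warps = 4.
static_assert(select_bquant_distribution(8, 16, 4) ==
                  BQuantDistribution::MultipleScalesPerWarp,
              "8 < 16");
static_assert(select_bquant_distribution(64, 16, 4) ==
                  BQuantDistribution::MultipleScalesPerWarpGroup,
              "64 <= 16 * 4");
static_assert(select_bquant_distribution(128, 16, 4) ==
                  BQuantDistribution::ScaleSharedAcrossNIters,
              "128 > 16 * 4");
```

Under those assumed values, N group sizes 1 and 8 fall into the first case, 16 through 64 into the second, and 128 into the third.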

Tests with N group size {1, 8, 16, 32, 64, 128} passed.

In an earlier comment I wrongly stated that the N group size should be greater than Warp::N. Clearly, 1 and 8 are also legal values for the N group size.

samremes · 2025-10-31T20:15:52Z

Thanks a lot @CongMa13!

I've added a separate example for the 2D block scale, since the dispatching was getting a bit complex with non-preshuffle and other quants. We can maybe merge them again once every variant supports 2D blocks too.

ThomasNing · 2025-10-31T23:28:08Z

Reformatted the example. It should be good now. If needed, we can separate the example out again.

Review addressed

samremes added 14 commits October 13, 2025 14:05

Refactor quant group size to be configurable for M/N/K, not just K

8bb5255

add some asserts for configurations not implemented

98365f5

start setting of group size for N dimension

f6b07dc

enable 2d for reference quant gemm

22362f2

WIP: trying to figure out tile dstr and/or indexing for scale matrix

9988a46

WIP

36b88c6

Fix handling of n dim blocks in tile windows etc

bb52cd9

remove commented code and enable all tests again

f179a8a

fix formatting

d100ab6

Add more specialized tile distributions

37738e4

Enable NWarps replication for bquant tile dstr

98deefa

fix formatting

2d86cd0

Merge remote-tracking branch 'origin/develop' into samremes/bmatrix_2…

470d6e4

…d_blockscale

fix format

1f13003

samremes marked this pull request as ready for review October 27, 2025 15:24

samremes requested review from ThomasNing, afagaj, andriy-ca, aosewski, aska-0096, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz and tenpercent as code owners October 27, 2025 15:24

samremes added 4 commits October 28, 2025 17:49

Merge remote-tracking branch 'origin/develop' into samremes/bmatrix_2…

a449728

…d_blockscale

Fix some issues from the merge

e12ab56

fix formatting

7c93551

one more fix to tile dstr, and revert debug initialization

e1475d4

CongMa13 reviewed Oct 28, 2025

CongMa13 approved these changes Oct 29, 2025

CongMa13 previously requested changes Oct 29, 2025


ThomasNing requested changes Oct 29, 2025

samremes and others added 3 commits October 29, 2025 11:19

Remove commented code

5e0a356

Co-authored-by: Copilot <[email protected]>

simplify conditions that are needed for tile distributions

1290b1b

only enable the working group sizes in tests

306e25a

samremes and others added 2 commits October 30, 2025 08:48

fix formatting

68e41da

Update tile distribution for 2D bquant

bcccafe

add some documentation and 2d block scale example

fe92102

samremes and others added 2 commits October 31, 2025 20:16

fix formatting

6f90564

Add in Changlog and restructure the quant 2d example

89be44d

ThomasNing requested review from a team and ddembeckAMD as code owners October 31, 2025 23:22

solve the merge conflict

346ee26

ThomasNing previously approved these changes Oct 31, 2025

fix CMake

6b4b6fb

ThomasNing dismissed their stale review via 6b4b6fb November 2, 2025 01:49

support the change for blockscale 2d

c494b23
