Commit 57d8d66

Merge pull request #3078 from spolifroni-amd/spolifroni-amd/cherry-pick-changhelog-changes
updated the changelog for 7.1 and beyond
2 parents 833ae1d + 6490427 commit 57d8d66

1 file changed: +38 -20 lines changed
CHANGELOG.md

Lines changed: 38 additions & 20 deletions
@@ -2,6 +2,40 @@
 
 Documentation for Composable Kernel available at [https://rocm.docs.amd.com/projects/composable_kernel/en/latest/](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/).
 
+## (Unreleased) Composable Kernel for ROCm
+
+### Added
+
+* Added a compute async pipeline in the CK TILE universal GEMM on gfx950
+* Added support for B Tensor type pk_int4_t in the CK TILE weight preshuffle GEMM.
+* Added the new api to load different memory sizes to SGPR.
+* Added support for B Tensor Preshuffle in CK TILE Grouped GEMM.
+* Added a basic copy kernel example and supporting documentation for new CK Tile developers.
+* Added support for grouped_gemm kernels to perform multi_d elementwise operation.
+* Added support for Multiple ABD GEMM
+* Added benchmarking support for tile engine GEMM Multi D.
+* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either A or B operands.
+* Added the row-wise column-wise quantization for CK_TILE GEMM & CK_TILE Grouped GEMM.
+* Added support for f32 to FMHA (fwd/bwd).
+* Added tensor-wise quantization for CK_TILE GEMM.
+* Added support for batched contraction kernel.
+* Added pooling kernel in CK_TILE
+
+### Changed
+
+* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)
+
+## Composable Kernel 1.1.0 for ROCm 7.1.0
+
+### Added
+
+* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
+* Added support for elementwise kernel.
+
+### Upcoming changes
+
+* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.
+
 ## Composable Kernel 1.1.0 for ROCm 7.0.0
 
 ### Added
@@ -19,26 +53,18 @@ Documentation for Composable Kernel available at [https://rocm.docs.amd.com/proj
 * Added support for Split K for grouped convolution backward data.
 * Added logit soft-capping support for fMHA forward kernels.
 * Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
-* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
 * Added benchmarking support for tile engine GEMM.
 * Added Ping-pong scheduler support for GEMM operation along the K dimension.
 * Added rotating buffer feature for CK_Tile GEMM.
 * Added int8 support for CK_TILE GEMM.
-* Added support for elementwise kernel.
 
 ### Optimized
 
+* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout.
+* Added Vectorize Transpose optimization for CK Tile
+* Added the asynchronous copy for gfx950
 
-* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout. (#2166)
-* Added Vectorize Transpose optimization for CK Tile (#2131)
-* Added the asynchronous copy for gfx950 (#2425)
-
-
-### Fixes
-
-None
-
-### Changes
+### Changed
 
 * Removed support for gfx940 and gfx941 targets (#1944)
 * Replaced the raw buffer load/store intrinsics with Clang20 built-ins (#1876)
@@ -47,14 +73,6 @@ None
 * Number of instances in instance factory for grouped convolution backward weight NGCHW/GKYXC/NGKHW has been reduced.
 * Number of instances in instance factory for grouped convolution backward data NGCHW/GKYXC/NGKHW has been reduced.
 
-### Known issues
-
-None
-
-### Upcoming changes
-
-* Non-grouped convolutions are deprecated. All of their functionality is supported by grouped convolution.
-
 ## Composable Kernel 1.1.0 for ROCm 6.1.0
 
 ### Additions
