From 6490427c89ea75cbd9818da8f8e2f5184fed09af Mon Sep 17 00:00:00 2001
From: spolifroni-amd
Date: Fri, 17 Oct 2025 14:06:04 -0400
Subject: [PATCH] updated the changelog for 7.1 and beyond

---
 CHANGELOG.md | 58 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7a21634b7d..28bcaae5b6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,40 @@
 
 Documentation for Composable Kernel available at [https://rocm.docs.amd.com/projects/composable_kernel/en/latest/](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/).
 
+## (Unreleased) Composable Kernel for ROCm
+
+### Added
+
+* Added a compute async pipeline in the CK TILE universal GEMM on gfx950
+* Added support for B Tensor type pk_int4_t in the CK TILE weight preshuffle GEMM.
+* Added the new api to load different memory sizes to SGPR.
+* Added support for B Tensor Preshuffle in CK TILE Grouped GEMM.
+* Added a basic copy kernel example and supporting documentation for new CK Tile developers.
+* Added support for grouped_gemm kernels to perform multi_d elementwise operation.
+* Added support for Multiple ABD GEMM
+* Added benchmarking support for tile engine GEMM Multi D.
+* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either A or B operands.
+* Added the row-wise column-wise quantization for CK_TILE GEMM & CK_TILE Grouped GEMM.
+* Added support for f32 to FMHA (fwd/bwd).
+* Added tensor-wise quantization for CK_TILE GEMM.
+* Added support for batched contraction kernel.
+* Added pooling kernel in CK_TILE
+
+### Changed
+
+* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)
+
+## Composable Kernel 1.1.0 for ROCm 7.1.0
+
+### Added
+
+* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
+* Added support for elementwise kernel.
+
+### Upcoming changes
+
+* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.
+
 ## Composable Kernel 1.1.0 for ROCm 7.0.0
 
 ### Added
@@ -19,26 +53,18 @@ Documentation for Composable Kernel available at [https://rocm.docs.amd.com/proj
 * Added support for Split K for grouped convolution backward data.
 * Added logit soft-capping support for fMHA forward kernels.
 * Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
-* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
 * Added benchmarking support for tile engine GEMM.
 * Added Ping-pong scheduler support for GEMM operation along the K dimension.
 * Added rotating buffer feature for CK_Tile GEMM.
 * Added int8 support for CK_TILE GEMM.
-* Added support for elementwise kernel.
 
 
 ### Optimized
+* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout.
+* Added Vectorize Transpose optimization for CK Tile
+* Added the asynchronous copy for gfx950
 
-* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout. (#2166)
-* Added Vectorize Transpose optimization for CK Tile (#2131)
-* Added the asynchronous copy for gfx950 (#2425)
-
-
-### Fixes
-
-None
-
-### Changes
+### Changed
 
 * Removed support for gfx940 and gfx941 targets (#1944)
 * Replaced the raw buffer load/store intrinsics with Clang20 built-ins (#1876)
@@ -47,14 +73,6 @@ None
 * Number of instances in instance factory for grouped convolution backward weight NGCHW/GKYXC/NGKHW has been reduced.
 * Number of instances in instance factory for grouped convolution backward data NGCHW/GKYXC/NGKHW has been reduced.
 
-### Known issues
-
-None
-
-### Upcoming changes
-
-* Non-grouped convolutions are deprecated. All of their functionality is supported by grouped convolution.
-
 ## Composable Kernel 1.1.0 for ROCm 6.1.0
 
 ### Additions