Releases: fanshiqing/grouped_gemm
v1.1.4
Token drop support for permute & unpermute ops
Add stream synchronization to multi-stream cuBLAS.
Optimized permute/unpermute kernels for topk router.
Initial release
- MegaBlocks-based gmm;
- Multi-stream cuBLAS GEMM for sm90;
- permute/unpermute kernel;
- sinkhorn kernel.
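
For readers unfamiliar with the permute/unpermute ops named in these notes, the following is a minimal pure-Python sketch of the general idea as it is commonly used in mixture-of-experts routing: token rows are grouped by their assigned expert so that each expert sees a contiguous block, and the original order is restored afterwards. The function names `permute` and `unpermute` here are hypothetical reference helpers, not this library's API, and the sketch assumes top-1 routing with no token dropping.

```python
from typing import List, Tuple

Row = List[float]


def permute(tokens: List[Row], expert_ids: List[int]) -> Tuple[List[Row], List[int]]:
    """Group token rows by expert id (hypothetical reference helper).

    Returns the reordered rows and `order`, where order[i] is the index of
    the source row that ends up at position i.
    """
    # sorted() is stable, so rows routed to the same expert keep their relative order.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    permuted = [tokens[i] for i in order]
    return permuted, order


def unpermute(permuted: List[Row], order: List[int]) -> List[Row]:
    """Invert `permute`: scatter each row back to its original position."""
    restored: List[Row] = [[] for _ in order]
    for dst, src in enumerate(order):
        restored[src] = permuted[dst]
    return restored


# Four tokens, each routed to one of three experts (assumed top-1 routing).
tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
expert_ids = [2, 0, 1, 0]

permuted, order = permute(tokens, expert_ids)
restored = unpermute(permuted, order)
```

After `permute`, the rows for expert 0 come first, then expert 1, then expert 2, which is the layout a grouped GEMM over per-expert batches would typically consume. A top-k router would repeat each token once per selected expert before sorting, which this sketch leaves out.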