…ttps://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-18-1) and later releases.

- Moved away from using the v0.x API internally. The cuDNN backend API is now called directly.
- Reduced execution overhead by caching repeated graph queries.

New open-source kernel for Grouped GEMM and SwiGLU fusion:

- [Grouped GEMM + SwiGLU](gemm_fusions/grouped_gemm_swiglu.md)

- **New Features**: Supports dynamic shapes for fprop, which helps reduce graph building across different batch sizes and sequence lengths.
- **Support Surface**:
  - Deterministic bprop is now supported for SDPA.
  - Added bprop support for ragged tensors on A100.
- **More samples**:
  - Open-sourced our SDPA [test harness](test/python/test_mhas_v2.py), which showcases additional testing for determinism and fp8 sizes for MLA.
  - Added samples to showcase chunked prefill.

- **New API**: Added support for `moe_grouped_matmul`. See the [cpp sample](samples/cpp/moe_grouped_matmul/moe_grouped_matmul.cpp) and documentation for API reference.

- **More samples**: Open-sourced cuDNN's [fuzzy testing of matmuls](test/python/test_matmul_fuzzer.py).
- **More samples**: Open-sourced cuDNN's [fuzzy testing of convolutions](test/python/test_conv_fuzzer.py).

- Updated the benchmark results for the SDPA improvements added in cuDNN 9.18.1.
cuDNN Frontend v1.18.0 Release Notes
cuDNN Frontend v1.18.0 is the recommended version for cuDNN 9.18.1 and later releases.
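General Improvements 🚀
- Moved away from using the v0.x API internally. The cuDNN backend API is now called directly.
- Reduced execution overhead by caching repeated graph queries.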
Open-Source Kernels
New open-source kernel for Grouped GEMM and SwiGLU fusion:
- [Grouped GEMM + SwiGLU](gemm_fusions/grouped_gemm_swiglu.md)
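As a rough illustration of what a grouped GEMM followed by SwiGLU computes, here is a pure-Python reference sketch. It is not the open-sourced kernel and does not use the cuDNN Frontend API. The function names, the list-of-lists layout, and the convention that the first half of each output row is the gate and the second half is the linear branch are all assumptions made for this example; the real kernel may lay out its operands differently.

```python
import math


def silu(x):
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matmul(a, b):
    """Plain (m x k) @ (k x n) matrix product on nested lists."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def grouped_gemm_swiglu(xs, ws):
    """Reference for a grouped GEMM with a SwiGLU epilogue.

    xs[g] is an (m_g x k) activation block and ws[g] is a (k x 2n) weight
    block for group g. Assumed layout: columns [0, n) are the gate and
    columns [n, 2n) are the linear branch.
    """
    outputs = []
    for x, w in zip(xs, ws):
        y = matmul(x, w)                      # (m_g x 2n)
        n = len(w[0]) // 2
        outputs.append([[silu(row[j]) * row[n + j] for j in range(n)]
                        for row in y])        # (m_g x n)
    return outputs


# Two groups with different row counts, which is what makes the GEMM "grouped".
xs = [[[1.0], [0.0]], [[2.0]]]
ws = [[[1.0, 3.0]], [[0.0, 5.0]]]
result = grouped_gemm_swiglu(xs, ws)
```

Fusing the activation into the GEMM avoids writing the intermediate `(m_g x 2n)` result back to memory, which is the usual motivation for this kind of kernel.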
Enhancements ✨
Scaled Dot-Product Attention (SDPA)
New Features: Supports dynamic shapes for fprop, which helps reduce graph building across different batch sizes and sequence lengths.
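Support Surface:
- Deterministic bprop is now supported for SDPA.
- Added bprop support for ragged tensors on A100.
More samples:
- Open-sourced our SDPA [test harness](test/python/test_mhas_v2.py), which showcases additional testing for determinism and fp8 sizes for MLA.
- Added samples to showcase chunked prefill.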
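To see why dynamic shapes for fprop can reduce graph building, consider the toy cache below. It is a conceptual sketch only: `GraphCache`, its fields, and the string stand-in for a built graph are hypothetical and are not part of cuDNN Frontend. The assumption is that a static-shape graph must be rebuilt for every distinct (batch, sequence length) pair, while a dynamic-shape graph can be reused as long as the remaining dimensions match.

```python
class GraphCache:
    """Toy cache that counts how many times a graph has to be built."""

    def __init__(self, dynamic_shapes):
        self.dynamic_shapes = dynamic_shapes
        self.graphs = {}
        self.builds = 0

    def _key(self, batch, seq_len, heads, head_dim):
        # With dynamic shapes, batch and sequence length are left out of
        # the key, so they no longer force a rebuild (assumed behaviour).
        if self.dynamic_shapes:
            return (heads, head_dim)
        return (batch, seq_len, heads, head_dim)

    def get(self, batch, seq_len, heads, head_dim):
        key = self._key(batch, seq_len, heads, head_dim)
        if key not in self.graphs:
            self.builds += 1
            self.graphs[key] = f"graph{key}"  # stand-in for an expensive build
        return self.graphs[key]


# Five requests covering four distinct (batch, seq_len) pairs.
requests = [(1, 128), (2, 128), (2, 256), (4, 512), (1, 128)]

static_cache = GraphCache(dynamic_shapes=False)
dynamic_cache = GraphCache(dynamic_shapes=True)
for batch, seq_len in requests:
    static_cache.get(batch, seq_len, heads=8, head_dim=64)
    dynamic_cache.get(batch, seq_len, heads=8, head_dim=64)

print(static_cache.builds, dynamic_cache.builds)  # prints "4 1"
```

The same keyed-lookup idea applies to caching repeated graph queries in general: work is done once per distinct key, and later requests with the same key reuse the stored result.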
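Mixture of Experts (MoE)
New API: Added support for `moe_grouped_matmul`. See the [cpp sample](samples/cpp/moe_grouped_matmul/moe_grouped_matmul.cpp) and documentation for API reference.
Matmul
More samples: Open-sourced cuDNN's [fuzzy testing of matmuls](test/python/test_matmul_fuzzer.py).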
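Convolution
More samples: Open-sourced cuDNN's [fuzzy testing of convolutions](test/python/test_conv_fuzzer.py).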
Additional Improvements
Benchmarking 📊
Updated the benchmark results for the SDPA improvements added in cuDNN 9.18.1.