2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.23)

project(cudnn_frontend VERSION 1.22.1)
project(cudnn_frontend VERSION 1.23.0)

option(CUDNN_FRONTEND_SKIP_JSON_LIB "Defines whether FE should not include nlohmann/json.hpp." OFF)
option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
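The two options above can be toggled at configure time; an illustrative out-of-source build (directory name and job count are arbitrary, both options shown with their defaults):

```shell
# Configure an out-of-source build of cudnn_frontend.
cmake -B build \
      -DCUDNN_FRONTEND_SKIP_JSON_LIB=OFF \
      -DCUDNN_FRONTEND_BUILD_SAMPLES=ON

# Build with 16 parallel jobs.
cmake --build build -j16
```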
35 changes: 27 additions & 8 deletions README.md
@@ -1,7 +1,17 @@

# cuDNN FrontEnd(FE)
# cuDNN Frontend (FE)

**cuDNN FE** is the modern, open-source entry point to the NVIDIA cuDNN library and high performance open-source kernels. It provides a C++ header-only library and a Python interface to access the powerful cuDNN Graph API and open-source kernels.
[![PyPI version](https://img.shields.io/pypi/v/nvidia-cudnn-frontend.svg)](https://pypi.org/project/nvidia-cudnn-frontend/)
[![PyPI downloads](https://img.shields.io/pypi/dm/nvidia-cudnn-frontend.svg)](https://pypi.org/project/nvidia-cudnn-frontend/)
[![Python versions](https://img.shields.io/pypi/pyversions/nvidia-cudnn-frontend.svg)](https://pypi.org/project/nvidia-cudnn-frontend/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-nvidia.github.io-blue.svg)](https://nvidia.github.io/cudnn-frontend/)

**cuDNN Frontend** is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels — scaled dot-product attention (**SDPA / Flash Attention**), grouped GEMM fusions for **Mixture-of-Experts (MoE)** training, fused normalization + activation, and more.

It provides a **header-only C++ API** and a **Python interface** (with native PyTorch integration) to the cuDNN Graph API, targeting NVIDIA **Hopper** (H100/H200) and **Blackwell** (B200/GB200/GB300) GPUs across FP16, BF16, FP8, and **MXFP8** precision.

**Links:** [Documentation](https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/) · [Blog & Deep Dives](https://nvidia.github.io/cudnn-frontend/) · [PyPI](https://pypi.org/project/nvidia-cudnn-frontend/) · [Release Notes](https://github.com/NVIDIA/cudnn-frontend/releases) · [Samples](samples/)

## 🚀 Latest News

@@ -11,10 +21,15 @@ We are now shipping **OSS kernels**, allowing you to inspect, modify, and contri

* **[GEMM + Amax](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_amax):** Optimized FP8 matrix multiplication with absolute maximum calculation.
* **[GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_swiglu):** High-performance implementation of the SwiGLU activation fused with GEMM.
* **[GEMM + sReLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_srelu):** High-performance implementation of squared-ReLU fused with GEMM.
* **[GEMM + dsReLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_dsrelu):** High-performance implementation of the squared-ReLU backward (dsReLU) fused with GEMM.
* **[Grouped GEMM + GLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_glu):** Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts.
* **[Grouped GEMM + GLU + Hadamard](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_glu_hadamard):** Dense grouped GEMM GLU forward fusion with a fused Hadamard transform and per-expert AMAX reduction.
* **[Grouped GEMM + dGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_dglu):** Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts.
* **[Grouped GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_swiglu):** SwiGLU activation fused with Grouped GEMM.
* **[Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_dswiglu):** dSwiGLU (SwiGLU backward) fused with Grouped GEMM.
* **[Grouped GEMM + sReLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_srelu):** Contiguous grouped squared-ReLU GEMM for MoE workloads.
* **[Grouped GEMM + dsReLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_dsrelu):** Contiguous grouped squared-ReLU backward (dsReLU) GEMM for MoE workloads.
* **[Discrete Grouped GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_swiglu):** Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
* **[Discrete Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_dswiglu):** Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing.
* **[Grouped GEMM + Quant](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_quant):** Legacy dense-only grouped GEMM quant API for MoE FC2/dFC1 workloads.
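For orientation, the activations these kernels fuse with GEMM are simple elementwise functions. A plain NumPy sketch of the math (a reference for clarity, not the optimized kernels; the `gate`/`up` naming is illustrative):

```python
import numpy as np

def silu(x):
    # SiLU / swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu(gate, up):
    # SwiGLU: SiLU of the gate projection, multiplied elementwise
    # with the "up" projection.
    return silu(gate) * up

def srelu(x):
    # Squared-ReLU: relu(x) ** 2.
    return np.maximum(x, 0.0) ** 2

def dsrelu(x, grad):
    # Backward of squared-ReLU: d/dx relu(x)**2 = 2 * relu(x),
    # applied to the incoming gradient.
    return grad * 2.0 * np.maximum(x, 0.0)
```

The fused kernels apply these inside the GEMM epilogue (in FP8/BF16 with per-expert scaling for the grouped variants), which is what the bullets above refer to.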
Expand All @@ -30,13 +45,13 @@ We are now shipping **OSS kernels**, allowing you to inspect, modify, and contri

#### Llama 3.1 style Forward and Bprop with causal masking (GB300)
<p align="center">
<img src="benchmark/sdpa_benchmark_training/results/gb300_919_only_cudnn/llama3.1_top_left.png" alt="Llama 3.1 SDPA Benchmark on GB300 (only cuDNN)" width="600"/>
<img src="https://raw.githubusercontent.com/NVIDIA/cudnn-frontend/main/benchmark/sdpa_benchmark_training/results/gb300_919_only_cudnn/llama3.1_top_left.png" alt="Llama 3.1 SDPA Benchmark on GB300 (only cuDNN)" width="600"/>
</p>

#### DeepSeek-V3 style Forward and Bprop with causal masking (GB300)

<p align="center">
<img src="benchmark/sdpa_benchmark_training/results/gb300_919_only_cudnn/dsv3_top_left.png" alt="DSv3 SDPA Benchmark on GB300 (only cuDNN)" width="600"/>
<img src="https://raw.githubusercontent.com/NVIDIA/cudnn-frontend/main/benchmark/sdpa_benchmark_training/results/gb300_919_only_cudnn/dsv3_top_left.png" alt="DSv3 SDPA Benchmark on GB300 (only cuDNN)" width="600"/>
</p>
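The operation benchmarked above is scaled dot-product attention with a top-left causal mask. A single-head NumPy reference of the computation (a sketch of the math, not the fused Flash Attention kernel; shapes are illustrative):

```python
import numpy as np

def sdpa_causal(q, k, v):
    """Reference scaled dot-product attention with top-left causal masking.

    q: (s_q, d), k: (s_kv, d), v: (s_kv, d_v), all for a single head.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (s_q, s_kv)
    s_q, s_kv = scores.shape
    # Top-left causal mask: query i may attend to keys j <= i.
    mask = np.tril(np.ones((s_q, s_kv), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

The cuDNN kernels compute the same result tiled and fused (Flash Attention style), in BF16/FP8/MXFP8, without materializing the full score matrix.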

## Key Features
Expand All @@ -56,8 +71,9 @@ pip install nvidia-cudnn-frontend
```

**Requirements:**
* Python 3.8+
* Python 3.9+
* NVIDIA driver and CUDA Toolkit
* NVIDIA cuDNN (minimum 8.5.0)
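A quick way to check the wheel installed correctly is to import the module and print its versions; this sketch assumes `cudnn.backend_version()` as used in the frontend's Python samples, and degrades gracefully when the package is absent:

```python
# Smoke test for the Python wheel; run after `pip install nvidia-cudnn-frontend`.
import importlib.util

if importlib.util.find_spec("cudnn") is None:
    print("cudnn frontend not installed")
else:
    import cudnn
    # Frontend version and the cuDNN backend it was built against.
    print("frontend:", cudnn.__version__)
    print("backend: ", cudnn.backend_version())
```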

### ⚙️ C++ (Header Only)

@@ -93,9 +109,12 @@ cmake --build . -j16

## Documentation & Examples

* **Developer Guide:** [Official NVIDIA Documentation](https://docs.nvidia.com/deeplearning/cudnn/frontend/v1.9.0/developer/overview.html)
* **C++ Samples:** See `samples/cpp` for comprehensive usage examples.
* **Python Samples:** See `samples/python` for pythonic implementations.
* **Developer Guide:** [Official NVIDIA Documentation (latest)](https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/)
* **Blog & Deep Dives:** [nvidia.github.io/cudnn-frontend](https://nvidia.github.io/cudnn-frontend/) — release notes, installation guides, and technical deep-dives (MXFP8 attention, FP8 scale layouts, etc.)
* **C++ Samples:** See [`samples/cpp`](samples/cpp) for end-to-end examples covering convolution, matmul, SDPA / Flash Attention, normalization, and more.
* **Python Samples:** See [`samples/python`](samples/python) for Jupyter notebooks and PyTorch integration patterns.
* **OSS Kernels:** See [`python/cudnn/`](python/cudnn/) for the source of the SDPA, grouped GEMM + SwiGLU/GLU, RMSNorm + SiLU, Native Sparse Attention, and other open-source kernels.
* **PyTorch Custom Ops:** See [`python/cudnn/experimental/ops`](python/cudnn/experimental/ops) for `torch.compile`-compatible wrappers around cuDNN kernels.

## 🤝 Contributing

@@ -0,0 +1,91 @@
config_name,model_name,backend,data_type,attn_mask,batch_size,q_seqlen,kv_seqlen,num_q_heads,num_kv_heads,head_dim_qk,head_dim_vo,profile_pass,deterministic_bwd,time_ms,tflops,max_diff,num_iterations,sliding_window_size,success,error_message,gpu_name,cudnn_version,cudnn_backend_version
dsv3,dsv3,cudnn,bfloat16,top_left,2,32768,32768,128,128,192,128,fwd,False,50.456,1743.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,32768,32768,128,128,192,128,bwd,False,212.546,1076.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,32768,32768,128,128,192,128,bwd,True,210.687,1086.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,32768,32768,128,128,192,128,fwd,False,111.031,1584.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,32768,32768,128,128,192,128,bwd,False,421.194,1086.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,32768,32768,128,128,192,128,bwd,True,411.727,1111.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,32768,32768,128,128,192,128,fwd,False,35.942,2447.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,32768,32768,128,128,192,128,bwd,False,123.181,1857.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,32768,32768,128,128,192,128,bwd,True,122.033,1874.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,32768,32768,128,128,192,128,fwd,False,75.166,2340.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,32768,32768,128,128,192,128,bwd,False,236.705,1932.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,32768,32768,128,128,192,128,bwd,True,234.246,1953.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,32768,32768,128,128,192,128,fwd,False,37.773,2329.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,32768,32768,128,128,192,128,bwd,False,149.238,1532.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,32768,32768,128,128,192,128,bwd,True,152.079,1504.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,32768,32768,128,128,192,128,fwd,False,81.379,2162.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,32768,32768,128,128,192,128,bwd,False,293.095,1561.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,32768,32768,128,128,192,128,bwd,True,289.395,1581.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,16384,16384,128,128,192,128,fwd,False,13.103,1678.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,16384,16384,128,128,192,128,bwd,False,50.424,1134.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,16384,16384,128,128,192,128,bwd,True,50.136,1140.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,16384,16384,128,128,192,128,fwd,False,24.270,1812.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,16384,16384,128,128,192,128,bwd,False,105.115,1088.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,16384,16384,128,128,192,128,bwd,True,102.798,1112.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,16384,16384,128,128,192,128,fwd,False,8.845,2486.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,16384,16384,128,128,192,128,bwd,False,29.050,1968.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,16384,16384,128,128,192,128,bwd,True,29.760,1921.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,16384,16384,128,128,192,128,fwd,False,16.984,2590.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,16384,16384,128,128,192,128,bwd,False,57.481,1989.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,16384,16384,128,128,192,128,bwd,True,57.615,1985.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,16384,16384,128,128,192,128,fwd,False,9.337,2355.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,16384,16384,128,128,192,128,bwd,False,36.835,1552.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,16384,16384,128,128,192,128,bwd,True,36.960,1547.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,16384,16384,128,128,192,128,fwd,False,17.648,2492.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,16384,16384,128,128,192,128,bwd,False,69.820,1638.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,16384,16384,128,128,192,128,bwd,True,71.538,1598.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,8192,8192,128,128,192,128,fwd,False,3.470,1585.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,8192,8192,128,128,192,128,bwd,False,11.984,1193.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,8192,8192,128,128,192,128,bwd,True,13.023,1098.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,8192,8192,128,128,192,128,fwd,False,6.249,1759.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,8192,8192,128,128,192,128,bwd,False,25.403,1125.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,8192,8192,128,128,192,128,bwd,True,25.571,1118.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,8192,8192,128,128,192,128,fwd,False,2.294,2397.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,8192,8192,128,128,192,128,bwd,False,7.296,1959.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,8192,8192,128,128,192,128,bwd,True,7.530,1898.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,8192,8192,128,128,192,128,fwd,False,4.258,2582.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,8192,8192,128,128,192,128,bwd,False,13.472,2122.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,8192,8192,128,128,192,128,bwd,True,13.256,2157.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,8192,8192,128,128,192,128,fwd,False,2.478,2219.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,8192,8192,128,128,192,128,bwd,False,9.567,1494.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,8192,8192,128,128,192,128,bwd,True,9.205,1553.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,8192,8192,128,128,192,128,fwd,False,4.567,2408.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,8192,8192,128,128,192,128,bwd,False,17.014,1680.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,8192,8192,128,128,192,128,bwd,True,16.251,1759.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,4096,4096,128,128,192,128,fwd,False,0.977,1406.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,4096,4096,128,128,192,128,bwd,False,3.383,1057.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,4096,4096,128,128,192,128,bwd,True,3.301,1083.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,4096,4096,128,128,192,128,fwd,False,1.650,1666.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,4096,4096,128,128,192,128,bwd,False,6.085,1175.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,4096,4096,128,128,192,128,bwd,True,6.043,1183.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,4096,4096,128,128,192,128,fwd,False,0.656,2095.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,4096,4096,128,128,192,128,bwd,False,2.112,1692.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,4096,4096,128,128,192,128,bwd,True,2.108,1696.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,4096,4096,128,128,192,128,fwd,False,1.092,2518.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,4096,4096,128,128,192,128,bwd,False,3.450,2071.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,4096,4096,128,128,192,128,bwd,True,3.449,2072.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,4096,4096,128,128,192,128,fwd,False,0.714,1925.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,4096,4096,128,128,192,128,bwd,False,2.620,1364.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,4096,4096,128,128,192,128,bwd,True,2.621,1364.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,4096,4096,128,128,192,128,fwd,False,1.187,2316.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,4096,4096,128,128,192,128,bwd,False,4.267,1675.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,4096,4096,128,128,192,128,bwd,True,4.272,1673.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,2048,2048,128,128,192,128,fwd,False,0.321,1072.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,2048,2048,128,128,192,128,bwd,False,1.056,846.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,top_left,2,2048,2048,128,128,192,128,bwd,True,1.015,881.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,2048,2048,128,128,192,128,fwd,False,0.461,1492.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,2048,2048,128,128,192,128,bwd,False,1.706,1048.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,bfloat16,no_mask,2,2048,2048,128,128,192,128,bwd,True,1.633,1094.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,2048,2048,128,128,192,128,fwd,False,0.207,1659.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,2048,2048,128,128,192,128,bwd,False,0.681,1312.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,top_left,2,2048,2048,128,128,192,128,bwd,True,0.682,1311.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,2048,2048,128,128,192,128,fwd,False,0.316,2175.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,2048,2048,128,128,192,128,bwd,False,0.989,1807.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,fp8,no_mask,2,2048,2048,128,128,192,128,bwd,True,0.988,1808.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,2048,2048,128,128,192,128,fwd,False,0.224,1535.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,2048,2048,128,128,192,128,bwd,False,0.854,1047.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,top_left,2,2048,2048,128,128,192,128,bwd,True,0.854,1046.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,2048,2048,128,128,192,128,fwd,False,0.341,2018.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,2048,2048,128,128,192,128,bwd,False,1.188,1504.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
dsv3,dsv3,cudnn,mxfp8,no_mask,2,2048,2048,128,128,192,128,bwd,True,1.206,1482.000,0.000,10,,True,,NVIDIA GB200,1.21.1,92200.000
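For anyone post-processing this CSV, a minimal sketch of computing the fp8-over-bf16 forward speedup for one configuration; the two rows are copied from the table above and inlined (with a reduced column subset) so the snippet is self-contained:

```python
import csv
import io

# Two matching rows copied from the table above (fwd, top_left, s=32768).
raw = """backend,data_type,profile_pass,q_seqlen,time_ms,tflops
cudnn,bfloat16,fwd,32768,50.456,1743.000
cudnn,fp8,fwd,32768,35.942,2447.000
"""

rows = list(csv.DictReader(io.StringIO(raw)))
by_dtype = {r["data_type"]: r for r in rows}

# Speedup of fp8 over bf16 at the same shape, mask, and pass.
speedup = float(by_dtype["bfloat16"]["time_ms"]) / float(by_dtype["fp8"]["time_ms"])
print(f"fp8 speedup over bf16: {speedup:.2f}x")
```

The same pattern extends to grouping the full file by `(q_seqlen, attn_mask, profile_pass)` and comparing `data_type` columns.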