perf(glm52): MLA decode arena + CUDA graph capture on top of #535 (−36%/layer) + kernel bench #533

Closed
n-WN wants to merge 8 commits into openinfer-project:main from n-WN:feat/glm52-kernel-bench