perf(glm52): MLA decode arena + CUDA graph capture on top of #535 (−36%/layer) + kernel bench#533
Open
n-WN wants to merge 8 commits into
Open
perf(glm52): MLA decode arena + CUDA graph capture on top of #535 (−36%/layer) + kernel bench#533n-WN wants to merge 8 commits into
n-WN wants to merge 8 commits into
background
wait
wait-all
cancel
parallel
Loading