ztxdcyy/LLM_d2l

Todo List

Todo order: Llama Transformer -> FlashAttention -> llama.cpp -> quantization

Benchmark the optimized GEMM against the PyTorch implementation, then integrate it into the framework and run end-to-end experiments against the naive PyTorch baseline.
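The GEMM comparison above needs a timing harness. Below is a minimal pure-Python sketch (all function names are mine, not from this repo) that contrasts the naive ijk loop order with the cache-friendlier ikj order, the same memory-locality idea the shared-memory CUDA versions exploit; the real experiment would swap these stand-ins for the CUDA kernel and `torch.matmul`.

```python
import random
import time

def gemm_ijk(A, B):
    """Naive triple loop in ijk order: the inner loop strides down B's columns."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def gemm_ikj(A, B):
    """ikj order: the inner loop walks a row of B contiguously (better locality)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Ci = C[i]
        for k in range(n):
            a = A[i][k]
            Bk = B[k]
            for j in range(n):
                Ci[j] += a * Bk[j]
    return C

def bench(fn, *args, repeats=3):
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    n = 64
    rnd = random.Random(0)
    A = [[rnd.random() for _ in range(n)] for _ in range(n)]
    B = [[rnd.random() for _ in range(n)] for _ in range(n)]
    print(f"ijk: {bench(gemm_ijk, A, B):.4f}s")
    print(f"ikj: {bench(gemm_ikj, A, B):.4f}s")
```

Always verify the two implementations agree (within floating-point tolerance, since summation order differs) before trusting the timings.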

  • Llama

  • FlashAttention

    • Python
    • C++
    • CUDA!!
  • GEMM

    • C++
    • CUDA [0409]
    • CUDA optimization: warp-level tuning vs. shared-memory version
    • TensorRT, with Nsight Systems profiling
      • Does it really schedule grid/block dimensions automatically? Set up a deliberately unreasonable configuration and compare against TensorRT's analysis
      • Compare timings before and after quantization
    • Triton
  • Quant

    • AutoAWQ
    • LLM.int8
    • GPTQ
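As a primer for the Quant items, here is a minimal sketch (my own function names, not from AutoAWQ/bitsandbytes/GPTQ) of per-row absmax int8 quantization, the basic scheme that LLM.int8-style vector-wise quantization builds on:

```python
def quantize_rowwise_int8(W):
    """Quantize each row of a weight matrix to int8 range [-127, 127]
    using that row's absolute maximum as the scale."""
    Wq, scales = [], []
    for row in W:
        s = max(abs(x) for x in row) / 127.0 or 1.0  # guard all-zero rows
        scales.append(s)
        Wq.append([max(-127, min(127, round(x / s))) for x in row])
    return Wq, scales

def dequantize(Wq, scales):
    """Recover approximate fp weights: W ≈ Wq * scale, row by row."""
    return [[q * s for q in row] for row, s in zip(Wq, scales)]

if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    Wq, s = quantize_rowwise_int8(W)
    print(Wq)               # int8 values in [-127, 127]
    print(dequantize(Wq, s))  # close to W, error bounded by scale/2
```

The rounding error per element is at most half the row scale, which is why outlier values (which inflate the absmax) hurt accuracy and why LLM.int8 handles outlier columns separately in fp16.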

References:

https://github.com/meta-pytorch/gpt-fast

https://github.com/zjhellofss/KuiperInfer

https://github.com/ifromeast/cuda_learning/blob/main/03_gemm/sgemm_v3.cu
