
llm.cu

  • GPT-2 inference written in CUDA
  • All kernels are written from scratch
  • Experiments were run on an RTX 2060

Performance

Time is the wall-clock time to generate 32 tokens, extending the context from 2016 to 2048 tokens.

  • Iteration 1: 48.5s
  • Iteration 2: ?

Future Work

  • Use a 3D grid for the MultiHeadAttention kernel
  • Fuse kernels where possible
  • Add tests
  • Use Flash Attention
  • Add a KV cache
  • Use Tensor Cores
  • Support batch sizes > 1
  • Support fp16/int8
  • Support multi-GPU
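
A minimal sketch of the first item, the 3D-grid attention launch: `blockIdx.z` indexes the head, so the per-head loop moves off the host and every head's score matrix is computed in one kernel launch. All names here (`N_HEAD`, `T`, `HEAD_DIM`, `attn_scores`) are illustrative assumptions, not identifiers from llm.cu.

```cuda
#include <cuda_runtime.h>

#define N_HEAD 12
#define T 1024      // sequence length (context size)
#define HEAD_DIM 64 // per-head embedding dimension

// Computes causal attention scores att[head][row][col] = (q_row . k_col) / sqrt(HEAD_DIM)
// for all heads in a single launch; blockIdx.z selects the head.
__global__ void attn_scores(const float *q, const float *k, float *att) {
    int head = blockIdx.z;
    int row  = blockIdx.y * blockDim.y + threadIdx.y;  // query position
    int col  = blockIdx.x * blockDim.x + threadIdx.x;  // key position
    if (row >= T || col >= T || col > row) return;     // causal mask

    const float *qh = q + head * T * HEAD_DIM;         // this head's Q slice
    const float *kh = k + head * T * HEAD_DIM;         // this head's K slice
    float s = 0.0f;
    for (int d = 0; d < HEAD_DIM; d++)
        s += qh[row * HEAD_DIM + d] * kh[col * HEAD_DIM + d];
    att[(head * T + row) * T + col] = s * rsqrtf((float)HEAD_DIM);
}

// Launch configuration: one 16x16 tile of (query, key) pairs per block,
// with the third grid dimension covering the heads.
// dim3 block(16, 16);
// dim3 grid((T + 15) / 16, (T + 15) / 16, N_HEAD);
// attn_scores<<<grid, block>>>(d_q, d_k, d_att);
```

The 3D grid removes a host-side loop over heads and gives the scheduler all heads' blocks at once, which helps occupancy on small heads.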

Small improvements

  • Free the CPU copies of the weights after uploading them to the GPU
