Pocket LLM

A low-scale LLM for autocompletion on an open text corpus (currently Tiny Shakespeare, ~1.1M tokens).
Change max_length in the generate function to set how many tokens are generated (the effective context length stays block_length regardless, as the sketch below shows), and toggle nb_iter if you want to train.
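
The reason max_length and block_length decouple is that a typical autoregressive sampling loop crops the running sequence to the last block_length tokens before every forward pass. A minimal sketch of such a loop, assuming a model that returns logits of shape (batch, time, vocab_size); the generate signature and the model placeholder are illustrative, not this repo's exact code:

```python
import torch

@torch.no_grad()
def generate(model, idx, max_length, block_length):
    # idx: (B, T) tensor of token ids used as the prompt.
    for _ in range(max_length):
        idx_cond = idx[:, -block_length:]                  # crop context to block_length
        logits = model(idx_cond)                           # (B, T, vocab_size); placeholder model
        probs = torch.softmax(logits[:, -1, :], dim=-1)    # distribution over the next token
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_token], dim=1)          # append and keep going
    return idx
```
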
  • Character-level tokenizer: training loss 1.33, validation loss 1.55
  • Byte-pair encoding (BPE) tokenizer: training loss 1.56, validation loss 3.34

To-Do

  • Add device support (CUDA: move the model parameters and data to the GPU)

  • Add weight decay

  • Add Multi-head Latent Attention (MLA)

  • Gradient accumulation (see the training-loop sketch after this list)

  • Gradient clipping

  • RoPE
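
For reference, a minimal sketch of how weight decay, gradient accumulation, and gradient clipping could slot into the training loop. The optimizer settings, the get_batch helper, and the model's (logits, loss) return value are assumptions for illustration, not this repo's actual code; nb_iter is the iteration count mentioned above.

```python
import torch

accum_steps = 4                                            # assumed accumulation factor
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=3e-4, weight_decay=0.1)   # weight decay via AdamW

for step in range(nb_iter):
    xb, yb = get_batch("train")                            # (B, block_length) inputs and targets
    _, loss = model(xb, yb)                                # assumed to return (logits, loss)
    (loss / accum_steps).backward()                        # accumulate scaled gradients

    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip global grad norm
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```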


Resources

Andrej Karpathy, the DeepSeek-V3 Technical Report, and the DeepSeek-V3 GitHub repository (and many, many YouTube videos & Medium articles).

About

not-so-Large Language Model implementation using PyTorch & small datasets that my setup can afford to train on
