Small-scale LLM for autocompletion on an open text corpus (currently Tiny Shakespeare, 1.1M tokens)
Change max_length in the generate function to set how many tokens to generate (the context fed to the model is always cropped to block_length regardless), and adjust nb_iter if you want to train.
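For reference, a minimal sketch of what such a generation loop typically looks like in PyTorch; the model returning a (logits, loss) pair, the argument names, and the plain multinomial sampling are assumptions, not taken from this repo:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_length, block_length):
    # idx is a (batch, time) tensor of token ids used as the prompt.
    for _ in range(max_length):
        # Crop the conditioning context to the last block_length tokens,
        # since the model only ever saw windows of that size during training.
        idx_cond = idx[:, -block_length:]
        logits, _ = model(idx_cond)            # assumed to return (logits, loss)
        logits = logits[:, -1, :]              # keep only the last time step
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)  # sample one token
        idx = torch.cat((idx, idx_next), dim=1)             # append and continue
    return idx
```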
Character-level tokenizer:
Training Loss: 1.33
Validation Loss: 1.55
Byte-Pair Encoding tokenizer:
Training Loss: 1.56
Validation Loss: 3.34
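As a point of comparison between the two tokenizers, a character-level tokenizer for Tiny Shakespeare can be as small as the sketch below (the file name input.txt and the variable names are illustrative assumptions, not the repo's):

```python
# Build the vocabulary from the unique characters in the corpus.
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

print(len(chars))                 # vocabulary size (65 unique characters for Tiny Shakespeare)
print(decode(encode("hello")))    # round-trips back to "hello"
```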
- Add device support (CUDA: move the model and all batches to the GPU); see the training-loop sketch after this list
- Add weight decay (covered in the same sketch)
- Add Multi-head Latent Attention (MLA); a simplified sketch follows the list
- Gradient accumulation (covered in the training-loop sketch)
- Gradient clipping (covered in the training-loop sketch)
- RoPE; a sketch follows the list
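A hedged sketch of how the device support, weight decay, gradient accumulation, and gradient clipping items above usually fit together in a PyTorch training loop; model, get_batch, nb_iter, and the hyperparameter values are assumptions about the surrounding code:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # device support: move all parameters to the GPU when available

# Weight decay: AdamW applies decoupled weight decay to the parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

accum_steps = 4        # gradient accumulation: emulate a 4x larger batch
max_grad_norm = 1.0    # gradient clipping threshold

for it in range(nb_iter):
    optimizer.zero_grad(set_to_none=True)
    for _ in range(accum_steps):
        xb, yb = get_batch('train')             # assumed helper that returns a batch
        xb, yb = xb.to(device), yb.to(device)   # move the batch to the same device
        logits, loss = model(xb, yb)
        (loss / accum_steps).backward()         # scale so gradients average over micro-batches
    # Clip the global gradient norm before the update to avoid exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
```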
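For the Multi-head Latent Attention item, here is a heavily simplified sketch of the core idea from the DeepSeek papers: keys and values are reconstructed from a shared low-rank latent (which is what a KV cache would store) instead of being projected directly from the hidden states. The class name, its dimensions, and the omission of the decoupled RoPE dimensions are all simplifications, not DeepSeek-V3's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedLatentAttention(nn.Module):
    def __init__(self, d_model, n_heads, latent_dim):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.kv_down = nn.Linear(d_model, latent_dim, bias=False)  # compress to the latent
        self.k_up = nn.Linear(latent_dim, d_model, bias=False)     # latent -> keys
        self.v_up = nn.Linear(latent_dim, d_model, bias=False)     # latent -> values
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        B, T, C = x.shape
        latent = self.kv_down(x)  # (B, T, latent_dim): the only thing a KV cache would keep
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal attention
        return self.out_proj(y.transpose(1, 2).contiguous().view(B, T, C))
```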
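And for the RoPE item, a sketch of rotary position embeddings applied to the query/key tensors inside attention; the tensor shapes, the base of 10000, and the function names are assumptions:

```python
import torch

def rope_angles(head_dim, seq_len, base=10000.0, device='cpu'):
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    positions = torch.arange(seq_len, device=device).float()
    angles = torch.outer(positions, inv_freq)          # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # x: (batch, n_heads, seq_len, head_dim); rotate each even/odd channel pair
    # by an angle that depends on the token position and the pair index.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack(
        (x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1
    )
    return rotated.flatten(-2)

# Typical use inside the attention block, before computing attention scores:
# cos, sin = rope_angles(head_dim, T, device=x.device)
# q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```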
References: Andrej Karpathy, the DeepSeek-V3 Technical Report, the DeepSeek-V3 GitHub repository (and many, many YouTube videos & Medium articles)