
LLM-Tokenizer

Introduction

My personal implementation of an LLM Tokenizer.

As of now I'm not entirely sure what the scope of this repo will be; I'll follow the lecture and decide where to take the project afterwards.

This project uses Python 3.11.6.
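
The repo doesn't document its internals yet, but Karpathy's tokenizer lecture builds a byte-level BPE (byte-pair encoding) tokenizer, so a training loop along these lines is the likely core. The sketch below is illustrative only, not the repo's actual code (function names like `train_bpe` are hypothetical): it learns merge rules over raw UTF-8 bytes by repeatedly replacing the most frequent adjacent pair of token ids with a new id.

```python
from collections import Counter


def get_pair_counts(ids: list[int]) -> Counter:
    """Count occurrences of each adjacent token-id pair."""
    return Counter(zip(ids, ids[1:]))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out: list[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, vocab_size: int) -> dict[tuple[int, int], int]:
    """Learn BPE merge rules on top of the 256 raw byte values."""
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:  # nothing left to merge
            break
        top_pair = counts.most_common(1)[0][0]
        ids = merge(ids, top_pair, new_id)
        merges[top_pair] = new_id
    return merges


if __name__ == "__main__":
    merges = train_bpe("low lower lowest", vocab_size=260)
    print(merges)  # e.g. {(108, 111): 256, (256, 119): 257, ...}
```

Encoding then replays the learned merges in order on the byte stream of new text; decoding maps token ids back to their byte sequences and UTF-8-decodes the result.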

Acknowledgements

This project was mainly inspired by Andrej Karpathy's excellent Neural Networks: Zero to Hero video lecture series.

Links and References

Bibliography

  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
  • Touvron, H., Martin, L., Stone, K., ... & Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint arXiv:2307.09288. https://doi.org/10.48550/arXiv.2307.09288
