jax GPT-2 Exercises

this repository contains two Jupyter notebooks that guide you through building a GPT-2 style language model using JAX. it's meant to be an educational resource for learning both the basics of JAX and transformer-based language models

click 'Open in Colab' to run the notebooks. you'll be working on a personal copy, and any changes you make won't affect the originals.

Open In Colab

overview

the notebooks cover the following topics (short sketches of a few of them follow the list):

  1. intro to GPT/language modeling
  2. jax basics like jit/vmap/grad/pytrees
  3. model training with sgd/adamw
  4. token/positional embeddings
  5. attention
  6. feed-forward networks
  7. layer norm
  8. residuals
  9. byte pair encoding
  10. sampling strategies like top-k
  11. loading and using GPT-2 weights/tokenizer
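to give a flavor of topic 2, here's a minimal sketch (illustrative only, not lifted from the notebooks) of jit, grad, and vmap on a toy mean-squared-error loss:

```python
import jax
import jax.numpy as jnp

# toy loss: mean squared error of a linear model
def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))  # grad differentiates w.r.t. the first argument; jit compiles it
per_example = jax.vmap(loss, in_axes=(None, 0, 0))  # map the loss over a batch, sharing w

w = jnp.zeros(3)
x = jnp.ones((8, 3))
y = jnp.ones(8)
print(grad_fn(w, x, y))      # gradient of the mean loss w.r.t. w, shape (3,)
print(per_example(w, x, y))  # one loss value per example, shape (8,)
```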
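and here's one common shape of the scaled dot-product attention from topic 5, written as a single-head causal version (a sketch following the usual GPT-2 conventions, not necessarily the notebooks' exact code):

```python
import jax
import jax.numpy as jnp

def causal_attention(q, k, v):
    # q, k, v: (seq_len, d_head)
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)            # (seq_len, seq_len) scaled dot products
    mask = jnp.tril(jnp.ones_like(scores))    # causal mask: each position sees only the past
    scores = jnp.where(mask == 1, scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1) # attention weights over positions
    return weights @ v                        # weighted sum of values
```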
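finally, top-k sampling from topic 10 fits in a few lines; another hedged sketch (the k=40 default is just illustrative):

```python
import jax
import jax.numpy as jnp

def sample_top_k(key, logits, k=40):
    # restrict sampling to the k highest-scoring tokens
    top_logits, top_indices = jax.lax.top_k(logits, k)
    # categorical sampling renormalizes the truncated logits via softmax
    choice = jax.random.categorical(key, top_logits)
    return top_indices[choice]

key = jax.random.PRNGKey(0)
logits = jnp.array([0.1, 2.0, -1.0, 3.0, 0.5])
token_id = sample_top_k(key, logits, k=2)  # samples from the two most likely tokens
```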

prerequisites

to get the most out of this notebook, you should have:

  • basic Python familiarity
  • interest in learning about language models and JAX

contributing

if you have any ideas on how to make these notebooks better (bug fixes, improved explanations, best practices, etc.), please feel free to open an issue or message me on twitter/X. i want to make them as useful as possible!

in the future, i hope to add new notebooks on the following topics:

  • inference performance improvements with a kv cache
  • distributed training with jax
  • rotary positional encoding
  • llama forward pass
  • sparse autoencoders
  • ...

acknowledgements
