GPT2 Hasktorch implementation

The goal of this project is to reproduce GPT-2, created by OpenAI, in the Haskell programming language using the Hasktorch library, drawing inspiration from Andrej Karpathy's implementation in PyTorch.

Haskell : https://www.haskell.org/

Hasktorch : http://hasktorch.org/

NanoGPT (Karpathy's implementation) : https://github.com/karpathy/nanoGPT

GPT2 Paper : https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf


GPT2 Parameters

Parameter      Value
nBlock         12
nHead          12
nEmbd          768
vocabSize      50,257
nbParameters   117M
seqLen         1024
activation     GELU
optimizer      Adam
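
As a rough sketch, these hyperparameters can be grouped into a plain Haskell record; the field names below are illustrative and may differ from the ones actually used in this repository.

-- Illustrative configuration record for the GPT-2 small model above.
data GPT2Config = GPT2Config
  { nBlock    :: Int  -- number of transformer blocks
  , nHead     :: Int  -- number of attention heads per block
  , nEmbd     :: Int  -- embedding / hidden dimension
  , vocabSize :: Int  -- size of the BPE vocabulary
  , seqLen    :: Int  -- maximum context length
  } deriving (Show)

-- The configuration from the table above.
gpt2Small :: GPT2Config
gpt2Small = GPT2Config
  { nBlock    = 12
  , nHead     = 12
  , nEmbd     = 768
  , vocabSize = 50257
  , seqLen    = 1024
  }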

Features

  • All modules of GPT2 ✅
  • Forward pass ✅
  • Backward pass ✅
  • LazyDataloader to handle large text files ✅
  • Variable learning rate (see the schedule sketch after this list) ✅
  • Complete training loop ✅
  • Gradient accumulation ✅
  • Save the training state ✅
  • Efficient training tracker ✅
  • Plot metrics in real time ✅
  • Load and use the real GPT2 tokenizer ✅
  • CUDA support ✅
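
A variable learning rate of this kind typically means linear warmup followed by cosine decay, as in Karpathy's nanoGPT. The sketch below is a minimal, framework-free version of such a schedule; all constants are illustrative and not necessarily the ones used in this repository.

-- Linear warmup followed by cosine decay, in the style of nanoGPT.
-- The constants below are placeholders, not the repository's settings.
lrSchedule :: Int -> Double
lrSchedule step
  | step < warmup     = maxLr * fromIntegral (step + 1) / fromIntegral warmup
  | step > decaySteps = minLr
  | otherwise =
      let progress = fromIntegral (step - warmup)
                   / fromIntegral (decaySteps - warmup)
          coeff    = 0.5 * (1 + cos (pi * progress))
      in minLr + coeff * (maxLr - minLr)
  where
    maxLr      = 6e-4
    minLr      = 6e-5
    warmup     = 10
    decaySteps = 1000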

TODOs

  • Variable batch size ❌
  • Weight sharing between the input token embedding layer (wte) and the output language modeling head (lm_head) ❌
  • Weight decay ❌
  • Flash attention ❌
  • Distributed Data Parallel ❌
  • Generation function (see the sketch after this list) ❌
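
A generation function would amount to autoregressive decoding around the model's forward pass. The sketch below shows greedy decoding only; the `forward` argument is a hypothetical stand-in for the model's forward pass (context in, next-token logits out) and is not part of this repository's API.

import Data.List (maximumBy)
import Data.Ord (comparing)

type Token = Int

-- Greedy autoregressive decoding: repeatedly pick the highest-logit token
-- and append it to the context.
greedyGenerate
  :: ([Token] -> [Double])  -- hypothetical forward pass: context -> logits
  -> Int                    -- number of tokens to generate
  -> [Token]                -- prompt
  -> [Token]
greedyGenerate _ 0 ctx = ctx
greedyGenerate forward n ctx =
  let logits = forward ctx
      next   = fst (maximumBy (comparing snd) (zip [0 ..] logits))
  in greedyGenerate forward (n - 1) (ctx ++ [next])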

Launch the program

docker compose up -d  # start the Docker container
stack run             # build and run the main executable
stack test            # run the test suite

Use Jupyter

Once the container is running, JupyterLab is available at:

http://localhost:8890/lab
