Transformer From Scratch in PyTorch

This project is a complete, from-scratch implementation of the Transformer architecture as described in the "Attention is All You Need" paper. It uses the OPUS Books translation dataset to perform English to Spanish translation.

Motivation

The goal of this project was to deeply understand the inner workings of the Transformer architecture by implementing it without relying on prebuilt libraries like Hugging Face Transformers. This hands-on approach helped build confidence in core NLP concepts and the foundational technologies behind Large Language Models (LLMs).

Features

  • Custom tokenizer using Hugging Face Tokenizers (WordLevel)
  • Positional Encoding
  • Multi-Head Attention
  • Encoder and Decoder stacks
  • Greedy Decoding
  • Training loop with TensorBoard support
  • Configurable hyperparameters

1. Model Architecture

The implementation follows the original Transformer architecture; a minimal sketch of the embedding and positional-encoding step appears after the list:

  • Input Embeddings: Learnable embeddings scaled by $\sqrt{d_{model}}$
  • Positional Encoding: Added to input embeddings
  • Multi-Head Attention: Implemented from scratch
  • Feedforward Layers: Two-layer MLP
  • Layer Normalization and Residual Connections
  • Stacked Encoder and Decoder Blocks (configurable depth)
  • Final Linear + Softmax projection layer
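
Below is a minimal sketch of the first two pieces, the scaled input embeddings and the sinusoidal positional encoding. Class names and exact signatures are illustrative and may differ from the actual model.py.

import math
import torch
import torch.nn as nn

class InputEmbeddings(nn.Module):
    # Token embeddings scaled by sqrt(d_model), as in the original paper
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        # (batch, seq_len) -> (batch, seq_len, d_model)
        return self.embedding(x) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    # Fixed sinusoidal positional encodings added to the embeddings
    def __init__(self, d_model: int, seq_len: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(seq_len, d_model)
        position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, seq_len, d_model)

    def forward(self, x):
        x = x + self.pe[:, : x.shape[1], :]
        return self.dropout(x)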

2. Dataset

  • Source: OPUS Books - HuggingFace
  • Languages: English to Spanish
  • Tokenizer: Custom-trained WordLevel tokenizer with the special tokens [SOS], [EOS], [PAD], and [UNK] (a training sketch follows below)
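
A minimal sketch of how such a tokenizer can be trained with the Hugging Face tokenizers library; the function name and the sentence iterator are placeholders, and the repository's own tokenizer-building code may differ.

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

def build_tokenizer(sentences, lang: str) -> Tokenizer:
    # WordLevel model with an explicit unknown token
    tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = WordLevelTrainer(
        special_tokens=["[UNK]", "[PAD]", "[SOS]", "[EOS]"],
        min_frequency=2,
    )
    # `sentences` is any iterator over raw text in the given language
    tokenizer.train_from_iterator(sentences, trainer=trainer)
    tokenizer.save(f"tokenizer_{lang}.json")
    return tokenizer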

3. Configuration

Defined in config.py:

{
  "batch_size": 32,
  "num_epochs": 3,
  "lr": 1e-4,
  "seq_len": 128,
  "d_model": 256,
  "lang_src": "en",
  "lang_tgt": "es",
  "model_folder": "weights",
  "model_filename": "tmodel_",
  "preload": None,
  "tokenizer_file": "tokenizer_{lang}.json",
  "experiment_name": "runs/tmodle"
}
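
For illustration, the model_folder and model_filename entries can be combined into a checkpoint path. The helper name and the .pt extension below are assumptions, not necessarily what config.py defines.

from pathlib import Path

def get_weights_file_path(config: dict, epoch: str) -> str:
    # With the values above this yields weights/tmodel_<epoch>.pt
    filename = f"{config['model_filename']}{epoch}.pt"
    return str(Path(config["model_folder"]) / filename)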

4. Directory Structure

.
├── train.py              # Training script
├── model.py              # Transformer architecture
├── dataset.py            # Dataset class and causal masking
├── config.py             # Config and utility functions
├── weights/              # Model checkpoints
├── tokenizer_en.json     # Tokenizer for English
├── tokenizer_es.json     # Tokenizer for Spanish
└── runs/tmodle/          # TensorBoard logs
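
dataset.py is noted above as providing the causal mask for the decoder; a minimal version of such a mask (the helper name is an assumption) could look like this.

import torch

def causal_mask(size: int) -> torch.Tensor:
    # Entries above the diagonal mark future tokens; shape (1, size, size)
    mask = torch.triu(torch.ones((1, size, size), dtype=torch.int), diagonal=1)
    return mask == 0  # True where attention is allowed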

5. How to Run

1. Install Dependencies

pip install torch datasets tokenizers tensorboard tqdm

2. Run Training

python train.py

3. Monitor with TensorBoard

tensorboard --logdir=runs/tmodle

6. Example Output (Greedy Decoding)

--------------------------------------------------------------------------------
SOURCE: I saw the cat sleeping on the sofa.
TARGET: Vi al gato durmiendo en el sofá.
PREDICTED: Vi al gato durmiendo en el sofá.

(Note: prediction quality depends on training duration and tokenizer vocabulary.)
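
Greedy decoding picks the highest-probability token at every step, starting from [SOS] and stopping at [EOS] or the maximum length. A condensed sketch is shown below; the model.encode / model.decode / model.project interface mirrors the encoder-decoder split described above but is an assumption about the repository code.

import torch

def greedy_decode(model, source, source_mask, tokenizer_tgt, max_len: int, device):
    sos_id = tokenizer_tgt.token_to_id("[SOS]")
    eos_id = tokenizer_tgt.token_to_id("[EOS]")

    encoder_output = model.encode(source, source_mask)   # encode the source once
    decoder_input = torch.tensor([[sos_id]], dtype=torch.long, device=device)

    while decoder_input.size(1) < max_len:
        seq_len = decoder_input.size(1)
        # Causal mask: each position may only attend to itself and earlier positions
        causal = torch.tril(torch.ones((1, seq_len, seq_len), device=device)).int()

        out = model.decode(encoder_output, source_mask, decoder_input, causal)
        logits = model.project(out[:, -1])    # logits for the last position
        next_token = logits.argmax(dim=-1)    # greedy choice

        decoder_input = torch.cat([decoder_input, next_token.unsqueeze(0)], dim=1)
        if next_token.item() == eos_id:
            break

    return decoder_input.squeeze(0)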


7. Key Learnings

  • Understood how multi-head attention and masking work
  • Gained confidence in working with positional encoding and residual layers
  • Learned how to process datasets and build tokenizers
  • Developed deeper insights into how LLMs build on the Transformer foundation

8. Future Improvements

  • Add support for beam search decoding
  • Implement learning rate scheduler with warm-up
  • Add BLEU score evaluation
  • Add inference script for custom input

9. Acknowledgements

  • Vaswani et al., "Attention Is All You Need" (2017), the paper this implementation follows
  • The OPUS Books dataset on Hugging Face, used for the English to Spanish translation task