This project is a complete, from-scratch implementation of the Transformer architecture as described in the "Attention is All You Need" paper. It uses the OPUS Books translation dataset to perform English to Spanish translation.
The goal of this project was to deeply understand the inner workings of the Transformer architecture by implementing it without relying on prebuilt libraries like Hugging Face Transformers. This hands-on approach helped build confidence in core NLP concepts and the foundational technologies behind Large Language Models (LLMs).
- Custom tokenizer using Hugging Face Tokenizers (WordLevel)
- Positional Encoding
- Multi-Head Attention
- Encoder and Decoder stacks
- Greedy Decoding (see the sketch after this list)
- Training loop with TensorBoard support
- Configurable hyperparameters
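Greedy decoding simply reruns the decoder, appending the highest-probability token at each step until [EOS] or the maximum length is reached. The sketch below illustrates the idea; the `encode`/`decode`/`project` method names and the token-ID arguments are assumptions for illustration, not a verbatim excerpt from train.py.

```python
import torch

def greedy_decode(model, source, source_mask, sos_id, eos_id, max_len, device):
    # Encode the source sentence once and reuse the encoder output every step.
    # (encode/decode/project are assumed method names on the Transformer model.)
    encoder_output = model.encode(source, source_mask)
    # Start the target sequence with the [SOS] token.
    decoder_input = torch.empty(1, 1, dtype=torch.long, device=device).fill_(sos_id)
    while decoder_input.size(1) < max_len:
        size = decoder_input.size(1)
        # Causal mask: each position may only attend to itself and earlier positions.
        decoder_mask = torch.tril(torch.ones(1, size, size, device=device)).bool()
        out = model.decode(encoder_output, source_mask, decoder_input, decoder_mask)
        # Project the last position to vocabulary logits and take the argmax.
        logits = model.project(out[:, -1])
        next_token = logits.argmax(dim=-1, keepdim=True)
        decoder_input = torch.cat([decoder_input, next_token], dim=1)
        if next_token.item() == eos_id:
            break
    return decoder_input.squeeze(0)
```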
The implementation follows the original Transformer architecture:
- Input Embeddings: Learnable embeddings scaled by $\sqrt{d_{model}}$
- Positional Encoding: Added to the input embeddings
- Multi-Head Attention: Implemented from scratch (see the attention sketch after this list)
- Feedforward Layers: Two-layer MLP
- Layer Normalization and Residual Connections
- Stacked Encoder and Decoder Blocks (configurable depth)
- Final Linear + Softmax projection layer
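At the heart of each attention head is the scaled dot-product $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$. The snippet below is an illustrative PyTorch version of that step, not the exact code from model.py:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) so the softmax stays well-behaved.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (padding or future tokens) get -inf before the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    # Weighted sum of the values; the weights can also be kept for visualization.
    return weights @ v, weights
```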
- Source: OPUS Books (via Hugging Face Datasets)
- Languages: English to Spanish
- Tokenizer: Custom-trained WordLevel tokenizer with the special tokens [SOS], [EOS], [PAD], and [UNK]
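Training a WordLevel tokenizer with the Hugging Face Tokenizers library roughly follows the pattern below; the sentence iterator and output path are placeholders rather than the exact code in this repo.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

def build_tokenizer(sentences, path):
    # WordLevel model with [UNK] as the fallback for out-of-vocabulary words.
    tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = WordLevelTrainer(
        special_tokens=["[UNK]", "[PAD]", "[SOS]", "[EOS]"],
        min_frequency=2,
    )
    # `sentences` is any iterator over raw text strings from the OPUS Books split.
    tokenizer.train_from_iterator(sentences, trainer=trainer)
    tokenizer.save(path)  # e.g. tokenizer_en.json / tokenizer_es.json
    return tokenizer
```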
Defined in config.py:
{
"batch_size": 32,
"num_epochs": 3,
"lr": 1e-4,
"seq_len": 128,
"d_model": 256,
"lang_src": "en",
"lang_tgt": "es",
"model_folder": "weights",
"model_filename": "tmodel_",
"preload": None,
"tokenizer_file": "tokenizer_{lang}.json",
"experiment_name": "runs/tmodle"
}
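The model_folder, model_filename, and preload entries control checkpointing: weights are written under weights/ with the tmodel_ prefix, and setting "preload" to an epoch string presumably resumes training from that checkpoint. A small path helper along these lines is typical; the function name and the .pt extension are assumptions, not taken from config.py.

```python
from pathlib import Path

def get_weights_file_path(config, epoch: str) -> str:
    # Hypothetical helper: e.g. weights/tmodel_02.pt for epoch "02".
    filename = f"{config['model_filename']}{epoch}.pt"
    return str(Path(config["model_folder"]) / filename)
```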
├── train.py # Training script
├── model.py # Transformer architecture
├── dataset.py # Dataset class and causal masking
├── config.py # Config and utility functions
├── weights/ # Model checkpoints
├── tokenizer_en.json # Tokenizer for English
├── tokenizer_es.json # Tokenizer for Spanish
└── runs/tmodle/ # TensorBoard logs
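dataset.py pairs each target sequence with a causal (look-ahead) mask so that position i can only attend to positions up to i, combined with a padding mask. A minimal, illustrative version is shown below (the [PAD] id of 0 is an assumption):

```python
import torch

def causal_mask(size: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: True where attention is allowed.
    return torch.tril(torch.ones(1, size, size, dtype=torch.bool))

# Example: a length-5 target whose last token is [PAD] (id 0 assumed here).
tokens = torch.tensor([[3, 7, 9, 4, 0]])
pad_mask = (tokens != 0).unsqueeze(0)       # (1, 1, 5): hides padding
decoder_mask = pad_mask & causal_mask(5)    # (1, 5, 5): hides padding and future tokens
```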
Install the dependencies, run training, and launch TensorBoard:

pip install torch datasets tokenizers tensorboard tqdm

python train.py

tensorboard --logdir=runs/tmodle
SOURCE: I saw the cat sleeping on the sofa.
TARGET: Vi al gato durmiendo en el sofá.
PREDICTED: Vi al gato durmiendo en el sofá.
(Note: prediction quality depends on training duration and tokenizer vocabulary.)
- Understood how multi-head attention and masking work
- Gained confidence in working with positional encoding and residual layers
- Learned how to process datasets and build tokenizers
- Developed deeper insights into how LLMs build on the Transformer foundation
- Add support for beam search decoding
- Implement learning rate scheduler with warm-up (see the sketch after this list)
- Add BLEU score evaluation
- Add inference script for custom input
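For the planned warm-up scheduler, the original paper (Section 5.3) uses $lrate = d_{model}^{-0.5} \cdot \min(step^{-0.5},\ step \cdot warmup\_steps^{-1.5})$. One possible implementation with PyTorch's LambdaLR is sketched below; it is not yet part of this repo, and the warm-up value of 4000 is the paper's default rather than a tested setting here.

```python
import torch

def noam_lambda(d_model: int, warmup_steps: int):
    # Schedule from "Attention Is All You Need", Section 5.3:
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    def lr_lambda(step: int) -> float:
        step = max(step, 1)  # avoid division by zero on the first call
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)
    return lr_lambda

# Illustrative usage: base lr of 1.0 so LambdaLR applies the schedule directly.
model = torch.nn.Linear(256, 256)  # stand-in for the Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, noam_lambda(d_model=256, warmup_steps=4000))
# Call scheduler.step() once per training step, after optimizer.step().
```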
- "Attention Is All You Need" (Vaswani et al., 2017)
- The Annotated Transformer (Harvard NLP)
- Hugging Face Datasets and Tokenizers documentation