This repository contains a notebook-first, from-scratch implementation of the Transformer architecture in PyTorch, based on the paper "Attention Is All You Need" (Vaswani et al., 2017).
The notebook (`transformer_from_scratch_with_PyTorch.ipynb`) walks through:
- Input embedding and positional embedding
- Layer normalization and feed-forward blocks
- Multi-head attention
- Residual connections
- Encoder and decoder blocks
- Projection layer and full Transformer assembly
- Dataset preparation
- Training loop setup
- Inference
- Attention visualization
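To give a feel for two of the components listed above, here is a minimal sketch of sinusoidal positional encoding and single-head scaled dot-product attention, following the formulas in the paper. It is written in plain Python so it runs without any dependencies; the function names are illustrative assumptions, and the notebook's own PyTorch implementation may be structured differently (for example, with batching, masking, and multiple heads).

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as defined in "Attention Is All You Need".

    Returns a seq_len x d_model nested list: even columns hold
    sin(pos / 10000^(2i/d_model)), odd columns hold the matching cosine.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for col in range(d_model):
            # Columns 2i and 2i+1 share the same frequency.
            pair_index = col // 2
            angle = pos / (10000 ** (2 * pair_index / d_model))
            row.append(math.sin(angle) if col % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled dot product of this query with every key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        row_weights = softmax(scores)
        weights.append(row_weights)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(row_weights, values))
            for j in range(len(values[0]))
        ])
    return outputs, weights


# Toy usage: let the position vectors attend to themselves (self-attention).
pe = positional_encoding(seq_len=4, d_model=8)
out, attn = scaled_dot_product_attention(pe, pe, pe)
```

Each row of `attn` is a probability distribution over the input positions, and `out` has the same shape as the input. The multi-head version runs several of these attention computations in parallel on learned linear projections of the inputs, then concatenates the results.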
The notebook uses the following libraries:
- PyTorch
- Hugging Face `datasets` and `tokenizers`
- NumPy
- Pandas
- Altair
- TensorBoard (`torch.utils.tensorboard`)
- tqdm
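Before running the notebook locally, it can help to check that the libraries listed above are importable. The snippet below is a small sketch using only the standard library; the module names are the usual import names for these packages (an assumption, since the repository does not pin them), and `missing_modules` is a hypothetical helper, not part of the notebook.

```python
from importlib.util import find_spec

# Usual import names for the libraries listed above (assumed, not taken from
# the repository). torch.utils.tensorboard needs the separate "tensorboard"
# package to be installed, so it is checked as well.
REQUIRED_MODULES = [
    "torch",
    "datasets",
    "tokenizers",
    "numpy",
    "pandas",
    "altair",
    "tensorboard",
    "tqdm",
]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```

This only checks that each module can be located, not that the installed versions are compatible with the notebook.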
- Open the notebook in Google Colab using the badge above, or run it locally in Jupyter.
- Execute cells in order, from model components through training and inference.
🚧 Work in progress / educational implementation.