This repository contains a C implementation of a Transformer model, focusing on the core components of the architecture: self-attention, positional encoding, and feed-forward networks.
The implementation includes the following key components:
- Data Processing
  - Text loading and cleaning
  - Tokenization
  - Word embedding generation
- Transformer Architecture
  - Self-attention mechanism
  - Positional encoding
  - Feed-forward networks
  - Backpropagation and weight updates
- Model Training
  - Loss calculation
  - Gradient clipping
  - Weight optimization
Transformer-From-Scratch/
├── include/                    # Header files
│   ├── transformer_block.h     # Transformer block implementation
│   ├── self_attention_layer.h
│   ├── feed_forward_layer.h
│   ├── tokenizer.h
│   ├── utils.h
│   ├── backprop.h
│   ├── activation_functions.h
│   ├── Data_Preprocessing.h
│   └── Data_Loading_Cleaning.h
├── src/                        # Source files
│   ├── transformer_block.c
│   ├── self_attention_layer.c
│   ├── feed_forward_layer.c
│   ├── tokenizer.c
│   ├── utils.c
│   ├── backprop.c
│   ├── activation_functions.c
│   ├── Data_Preprocessing.c
│   └── Data_Loading_Cleaning.c
├── examples/                   # Example code
│   └── main.c                  # Main training loop
└── test_data.txt               # Sample training data
- Self-Attention Mechanism: Implements scaled dot-product attention
- Positional Encoding: Adds positional information to embeddings
- Feed-Forward Networks: Implements non-linear transformations
- Backpropagation: Includes gradient computation and weight updates
- Activation Functions: Implements various activation functions including LeakyReLU and Swish
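As a point of reference, LeakyReLU and Swish can be written as follows. The negative slope of 0.01 is an assumption; the repository's activation_functions.c may use different constants.

```c
#include <math.h>

/* LeakyReLU: passes positive inputs through and scales negative inputs by a
 * small slope (0.01 here as an illustrative constant). */
double leaky_relu(double x) {
    return x > 0.0 ? x : 0.01 * x;
}

/* Swish: x * sigmoid(x), a smooth alternative to ReLU. */
double swish(double x) {
    return x / (1.0 + exp(-x));
}
```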
- C compiler (GCC recommended)
- OpenMP (for parallel processing)
- Standard C libraries
- Math library (-lm)
- Compile the project:

  ```bash
  gcc -o transformer src/*.c examples/main.c -lm -fopenmp
  ```

- Run the model:

  ```bash
  ./transformer
  ```
The model can be configured by modifying the following parameters in the code:
- MAX_SENTENCE_LENGTH: Maximum length of input sequences (default: 512)
- MATRIX_SIZE: Size of attention matrices (default: 2)
- EMBEDDING_DIM: Dimension of word embeddings (default: 2)
- LEARNING_RATE: Learning rate for optimization (default: 0.01)
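For illustration, these parameters might live as compile-time constants. The header name below is hypothetical; the values simply mirror the defaults listed above.

```c
/* Hypothetical configuration header; values mirror the defaults above. */
#ifndef CONFIG_H
#define CONFIG_H

#define MAX_SENTENCE_LENGTH 512   /* maximum number of tokens per input sequence */
#define MATRIX_SIZE         2     /* size of the attention matrices */
#define EMBEDDING_DIM       2     /* dimension of each word embedding */
#define LEARNING_RATE       0.01  /* step size used during weight updates */

#endif /* CONFIG_H */
```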
The model expects input data in the format of test_data.txt, which should contain text data for training. The data will be automatically tokenized and processed by the model.
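As a rough sketch of what the tokenization step involves (not the repository's tokenizer.c, which may add further cleaning and punctuation handling), the following splits text on whitespace:

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS    512
#define MAX_TOKEN_LEN 64

/* Minimal whitespace tokenizer sketch: copies the input, splits it on
 * spaces/tabs/newlines, and stores each token in a fixed-size table. */
int tokenize(const char *text, char tokens[][MAX_TOKEN_LEN]) {
    char buffer[4096];
    strncpy(buffer, text, sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';

    int count = 0;
    for (char *tok = strtok(buffer, " \t\r\n");
         tok != NULL && count < MAX_TOKENS;
         tok = strtok(NULL, " \t\r\n")) {
        strncpy(tokens[count], tok, MAX_TOKEN_LEN - 1);
        tokens[count][MAX_TOKEN_LEN - 1] = '\0';
        count++;
    }
    return count;  /* number of tokens extracted */
}

int main(void) {
    char tokens[MAX_TOKENS][MAX_TOKEN_LEN];
    int n = tokenize("the quick brown fox", tokens);
    for (int i = 0; i < n; i++)
        printf("token[%d] = %s\n", i, tokens[i]);
    return 0;
}
```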
The self-attention mechanism computes attention scores between all positions in the input sequence, allowing the model to capture long-range dependencies.
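A minimal sketch of scaled dot-product attention, using fixed-size arrays for clarity; the repository's self_attention_layer.c may organize its matrices differently. Compile with `-lm` as in the build command above.

```c
#include <math.h>

#define SEQ_LEN 4   /* illustrative sequence length */
#define D_K     2   /* illustrative key/query dimension */

/* Scaled dot-product attention sketch: out = softmax(Q K^T / sqrt(d_k)) V. */
void scaled_dot_product_attention(const double Q[SEQ_LEN][D_K],
                                  const double K[SEQ_LEN][D_K],
                                  const double V[SEQ_LEN][D_K],
                                  double out[SEQ_LEN][D_K]) {
    double scores[SEQ_LEN][SEQ_LEN];
    double scale = 1.0 / sqrt((double)D_K);

    /* Attention scores between every pair of positions. */
    for (int i = 0; i < SEQ_LEN; i++)
        for (int j = 0; j < SEQ_LEN; j++) {
            double dot = 0.0;
            for (int k = 0; k < D_K; k++)
                dot += Q[i][k] * K[j][k];
            scores[i][j] = dot * scale;
        }

    /* Row-wise softmax (subtract the row max for numerical stability). */
    for (int i = 0; i < SEQ_LEN; i++) {
        double max = scores[i][0], sum = 0.0;
        for (int j = 1; j < SEQ_LEN; j++)
            if (scores[i][j] > max) max = scores[i][j];
        for (int j = 0; j < SEQ_LEN; j++) {
            scores[i][j] = exp(scores[i][j] - max);
            sum += scores[i][j];
        }
        for (int j = 0; j < SEQ_LEN; j++)
            scores[i][j] /= sum;
    }

    /* Weighted sum of the value vectors. */
    for (int i = 0; i < SEQ_LEN; i++)
        for (int k = 0; k < D_K; k++) {
            out[i][k] = 0.0;
            for (int j = 0; j < SEQ_LEN; j++)
                out[i][k] += scores[i][j] * V[j][k];
        }
}
```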
Positional information is added to the embeddings using sine and cosine functions of different frequencies.
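A sketch of the standard sinusoidal positional encoding, added in place to a row-major embedding matrix; the exact details in the repository may differ.

```c
#include <math.h>

/* Sinusoidal positional encoding sketch, following the standard formula
 *   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
 *   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).
 * The encoding is added element-wise to the existing embeddings. */
void add_positional_encoding(double *embeddings, int seq_len, int d_model) {
    for (int pos = 0; pos < seq_len; pos++) {
        for (int i = 0; i < d_model; i++) {
            double angle = pos / pow(10000.0, (double)(2 * (i / 2)) / d_model);
            double pe = (i % 2 == 0) ? sin(angle) : cos(angle);
            embeddings[pos * d_model + i] += pe;
        }
    }
}
```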
The feed-forward network consists of two linear transformations with a non-linear activation function in between.
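A sketch of such a position-wise feed-forward pass, assuming row-major weight matrices and LeakyReLU as the in-between activation (the repository may use a different activation or layout).

```c
/* Feed-forward sketch: out = W2 * act(W1 * x + b1) + b2. */
static double ff_leaky_relu(double x) { return x > 0.0 ? x : 0.01 * x; }

void feed_forward(const double *x,   /* [d_model] input vector          */
                  const double *W1,  /* [d_ff x d_model] first weights  */
                  const double *b1,  /* [d_ff] first bias               */
                  const double *W2,  /* [d_model x d_ff] second weights */
                  const double *b2,  /* [d_model] second bias           */
                  double *hidden,    /* [d_ff] scratch buffer           */
                  double *out,       /* [d_model] output vector         */
                  int d_model, int d_ff) {
    /* First linear transformation followed by the non-linearity. */
    for (int i = 0; i < d_ff; i++) {
        double sum = b1[i];
        for (int j = 0; j < d_model; j++)
            sum += W1[i * d_model + j] * x[j];
        hidden[i] = ff_leaky_relu(sum);
    }
    /* Second linear transformation back to the model dimension. */
    for (int i = 0; i < d_model; i++) {
        double sum = b2[i];
        for (int j = 0; j < d_ff; j++)
            sum += W2[i * d_ff + j] * hidden[j];
        out[i] = sum;
    }
}
```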
The implementation includes gradient computation and weight updates using the mean squared error loss function.
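A sketch of these training ingredients, assuming element-wise gradient clipping and plain gradient descent; the actual backprop.c may differ in detail.

```c
/* MSE over n outputs: L = (1/n) * sum (pred - target)^2 */
double mse_loss(const double *pred, const double *target, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = pred[i] - target[i];
        sum += diff * diff;
    }
    return sum / n;
}

/* Gradient of the MSE loss: dL/dpred_i = 2 * (pred_i - target_i) / n */
void mse_gradient(const double *pred, const double *target,
                  double *grad, int n) {
    for (int i = 0; i < n; i++)
        grad[i] = 2.0 * (pred[i] - target[i]) / n;
}

/* Clip each gradient component to [-clip, clip] before the update. */
void clip_gradients(double *grad, int n, double clip) {
    for (int i = 0; i < n; i++) {
        if (grad[i] > clip)  grad[i] = clip;
        if (grad[i] < -clip) grad[i] = -clip;
    }
}

/* Gradient-descent update: w <- w - learning_rate * grad */
void update_weights(double *weights, const double *grad,
                    int n, double learning_rate) {
    for (int i = 0; i < n; i++)
        weights[i] -= learning_rate * grad[i];
}
```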
- This implementation is a simplified version of the original Transformer model
- The model is designed for educational purposes and may need modifications for production use
- The current implementation uses fixed-size matrices and may need adjustments for different input sizes
This project is open-source and available for educational and research purposes.