
Commit 4fde075 ("lock-in")
1 parent: ba51804

File tree: 1 file changed (+13, −12)

README.md

# Building Transformers from Scratch

* My second attempt at building transformers from scratch, using the [Attention paper](https://arxiv.org/abs/1706.03762) as a guide.

* Special thanks to [Joris Baan](https://github.com/jsbaan/transformer-from-scratch) for the original code and the inspiration to build this project.
## Introduction

* Transformers have become the go-to architecture for many natural language processing tasks, consistently outperforming RNNs and LSTMs. The transformer was introduced in the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762) by Vaswani et al. and is built on the self-attention mechanism, which lets the model focus on different parts of the input sequence when making predictions. It consists of an encoder and a decoder, each composed of multiple layers of self-attention and feed-forward neural networks, and has achieved state-of-the-art results on tasks including machine translation, text summarization, and question answering.
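The self-attention operation at the core of the model is scaled dot-product attention, which the paper defines as:

```
Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V
```

where Q, K, and V are the query, key, and value projections of the input and d_k is the dimension of each key vector.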

* In this project, I build a transformer model from scratch using PyTorch, train it on a simple dataset, and evaluate it on a held-out test set. The goal is to gain a better understanding of how the transformer works and how to implement it in code.

* The model will be built from the following components (a brief PyTorch sketch follows the list):

- Multi-Head Attention - lets the model attend to different parts of the input sequence when making predictions.
- Position-wise Feed-Forward Networks - process the output of the attention sub-layer independently at each position.
- Layer Normalization - normalizes the output of the attention and feed-forward sub-layers.
- Residual Connections - make the identity function easy to learn, which stabilizes training of deep stacks.
- Positional Encoding - encodes the position of each token in the input sequence, since attention by itself is order-invariant.
- Masking - prevents the model from attending to future tokens during training.
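Below is a minimal PyTorch sketch of how these components could fit together, following the post-norm layout of the original paper. The names (`scaled_dot_product_attention`, `MultiHeadAttention`, `EncoderLayer`, `positional_encoding`) and the default sizes (`d_model=512`, `num_heads=8`, `d_ff=2048`, the paper's base configuration) are my own illustration, not this repo's actual code:

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Positions where mask == 0 get -inf, so softmax gives them ~0 weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_k = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # joint Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(y):  # (b, t, d) -> (b, heads, t, d_k)
            return y.view(b, t, self.num_heads, self.d_k).transpose(1, 2)

        attn = scaled_dot_product_attention(split(q), split(k), split(v), mask)
        return self.out(attn.transpose(1, 2).reshape(b, t, d))

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, num_heads)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask=None):
        x = self.norm1(x + self.attn(x, mask))  # residual + layer norm
        return self.norm2(x + self.ff(x))       # position-wise FFN sub-layer

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from the paper: even dims use sin, odd dims use cos.
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

x = torch.randn(2, 10, 512) + positional_encoding(10, 512)
causal_mask = torch.tril(torch.ones(10, 10))  # zeros above the diagonal hide future tokens
print(EncoderLayer()(x, mask=causal_mask).shape)  # torch.Size([2, 10, 512])
```

The decoder stacks the same pieces, adding a second attention sub-layer over the encoder output and applying the causal mask shown above to its self-attention.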

* The model will be trained with the Adam optimizer, with the learning rate scheduled by the Noam learning rate scheduler, and evaluated with the [BLEU score metric](https://en.wikipedia.org/wiki/BLEU). A sketch of the schedule follows.
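As a sketch, the Noam schedule can be expressed with PyTorch's `LambdaLR`; the `noam_lambda` helper below is illustrative, and the Adam hyperparameters are the ones reported in the paper, not necessarily this project's settings:

```python
import torch

# Noam schedule from "Attention is All You Need":
#   lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
def noam_lambda(d_model=512, warmup_steps=4000):
    def lr_factor(step):
        step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    return lr_factor

model = torch.nn.Linear(512, 512)  # stand-in for the transformer
# base lr = 1.0 so the lambda's return value *is* the learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lambda())
# call scheduler.step() after each optimizer.step() to advance the schedule
```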

## The project will be divided into the following sections:
1. Data Preprocessing
…
4. Evaluation
5. Conclusion

* Side note: while building this, I was listening to a YouTube video on the theory of consciousness, [Consciousness of Artificial Intelligence](https://www.youtube.com/watch?v=sISkAb7suqo). It's a very interesting video and I highly recommend it.
