Transformer Language Model

This is a TensorFlow implementation of a Transformer-based language model. The Transformer is a powerful architecture for a wide range of natural language processing tasks, including language generation and translation; this implementation focuses on language generation.

Description

The Transformer model consists of encoder and decoder blocks, each containing multi-head self-attention layers and feed-forward neural networks. The model learns to generate text autoregressively, predicting the next token based on the preceding context.
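
The repository's own layer code isn't reproduced here, but as a rough illustration, a single decoder-style block with those two sub-layers might be written as follows in Keras. The class name, arguments, and pre-norm layout are assumptions made for this sketch, not the script's actual code, and use_causal_mask requires TensorFlow 2.10 or newer:

```python
import tensorflow as tf

class TransformerBlock(tf.keras.layers.Layer):
    """One block: masked multi-head self-attention followed by a feed-forward net.
    Illustrative sketch only; names and layout are assumptions, not the script's code."""

    def __init__(self, n_embd, n_head, dropout=0.1):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=n_head, key_dim=n_embd // n_head, dropout=dropout)
        self.ffwd = tf.keras.Sequential([
            tf.keras.layers.Dense(4 * n_embd, activation="relu"),
            tf.keras.layers.Dense(n_embd),
        ])
        self.ln1 = tf.keras.layers.LayerNormalization()
        self.ln2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        # Pre-norm residual connections; the causal mask keeps each position
        # from attending to tokens that come after it, which is what makes
        # autoregressive next-token prediction possible.
        h = self.ln1(x)
        x = x + self.attn(h, h, use_causal_mask=True)
        x = x + self.ffwd(self.ln2(x))
        return x
```

A block like this is typically stacked several times between the token/position embeddings and a final projection over the vocabulary.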

Usage

1. Install TensorFlow and its dependencies.
2. Make sure an input.txt file containing the text corpus for training is present.
3. Update the configuration parameters such as block_size, batch_size, and learning_rate according to your requirements.
4. Run the script; it will train the Transformer model on the provided text corpus.
5. After training, test the model's generation capability by calling the generate() function and passing an initial sequence of tokens (see the sketch below).
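
As a concrete starting point, here is a hedged sketch of the configuration and the post-training generation call. The parameter values are placeholders to tune for your corpus and hardware, and the generate() signature shown in the comment is an assumption about the script rather than its documented interface:

```python
import tensorflow as tf

# Configuration parameters named above; the values here are placeholders.
block_size = 128      # maximum context length in tokens
batch_size = 32       # training sequences per optimization step
learning_rate = 3e-4  # optimizer step size

# The script expects the training corpus in input.txt.
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()
print(f"Corpus length: {len(text)} characters")

# ... the script's training loop runs here ...

# After training, sample from the model. The exact generate() signature is
# defined by the script; a typical call might look like:
#   context = tf.zeros((1, 1), dtype=tf.int64)  # seed with a single token
#   print(generate(context, max_new_tokens=200))
```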

Requirements

TensorFlow
NumPy

Acknowledgements

This implementation is based on the Transformer architecture proposed in the paper "Attention Is All You Need" by Vaswani et al. (2017).
