An implementation of Encoder-Decoder architecture built with Transformers to translate English sentences to their Turkish equivalents.

expellialbus/English-to-Turkish-Translator

English to Turkish Translator

This project translates English sentences into their Turkish equivalents using a model built on the Transformer architecture. The model consists of two distinct parts, an encoder and a decoder, and is trained on this dataset. The dataset can also be found in the dataset folder of the project.

Files and folders of the project:

⚠️ Due to file upload limitations, this project does not contain any saved model file.

dataset

As mentioned above, this folder contains the dataset downloaded from the provided link.

layers

This folder contains custom layers that TensorFlow does not provide out of the box. These layers are:

  • TransformerEncoder: Encoder layer of the model which is built with transformer architecture.

  • TransformerDecoder: Decoder layer of the model which is built with transformer architecture.

  • PositionalEmbedding: Embedding layer that additionally implements positional encodings for input sentences.
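To make the idea behind a positional embedding concrete, here is a minimal, framework-free sketch. It assumes the fixed sinusoidal encoding from the original Transformer paper; the project's PositionalEmbedding layer may instead learn its position vectors, and the function names below are hypothetical.

```python
import math


def sinusoidal_positions(sequence_length, embed_dim):
    """Build a table of fixed sinusoidal position vectors (an assumed scheme)."""
    table = []
    for pos in range(sequence_length):
        row = []
        for i in range(embed_dim):
            # Each pair of dimensions (2k, 2k + 1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / embed_dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(token_embeddings):
    """Add a position vector to each token embedding, element by element."""
    seq_len = len(token_embeddings)
    embed_dim = len(token_embeddings[0])
    positions = sinusoidal_positions(seq_len, embed_dim)
    return [
        [t + p for t, p in zip(token_row, position_row)]
        for token_row, position_row in zip(token_embeddings, positions)
    ]
```

Either way, the result is the same shape as the token embeddings, so the encoder and decoder layers can consume it without knowing how position information was injected.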

preprocessing.py

Contains code for downloading and preprocessing the dataset.

train.py

Contains code for building and training the model.

inference.py

Contains code for using the trained model.


How Things Work

What does the preprocessing.py file do?

In short, as the name suggests, the preprocessing.py file contains preprocessing functions for the dataset.

It offers a method to download the dataset. If you get an error like:

BadZipFile: File is not a zip file

Try changing the User-Agent header from Mozilla/5.0 to a different value. The error occurs because a security feature on the server rejects some requests and returns a response that is not a zip file. For more information about this problem, see this link.
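The workaround can be sketched with the standard library alone. This is an illustration, not the project's download function: the helper names and the fallback User-Agent value are hypothetical.

```python
import io
import urllib.request
import zipfile


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_like_zip(payload):
    """Return True if the downloaded bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def download_zip(url, user_agents=("Mozilla/5.0", "Wget/1.21")):
    """Try each User-Agent in turn until the server returns a real zip file."""
    for agent in user_agents:
        with urllib.request.urlopen(build_request(url, agent)) as response:
            payload = response.read()
        if looks_like_zip(payload):
            return payload
    raise zipfile.BadZipFile("Server did not return a zip file for any User-Agent")
```

Checking the payload before extracting it turns the confusing BadZipFile error into an explicit retry with the next header value.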

After the download completes, it decompresses the file and saves it to the path specified by a parameter.

Another function in the preprocessing.py file splits the dataset into train, validation and test sets and returns them. The fraction of the dataset assigned to the validation and test sets can be controlled via function parameters.
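A ratio-based split of this kind might look like the sketch below. The function name, parameter names, default ratios and the shuffling step are assumptions for illustration and may differ from the project's code.

```python
import random


def split_pairs(pairs, val_ratio=0.2, test_ratio=0.1, seed=42):
    """Shuffle sentence pairs and split them into train, validation and test sets."""
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1:
        raise ValueError("val_ratio and test_ratio must be >= 0 and sum to less than 1")

    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Whatever remains after the validation and test portions are taken goes to the training set, so no pair is dropped or duplicated.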

The build_vectorizers function builds one vectorizer for the source sentences and one for the target sentences, and adapts each vectorizer to its own dataset.
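To show what building and adapting a vectorizer involves, here is a plain-Python stand-in. It is not the project's implementation: the class, the standardization rule (lowercase and strip punctuation while keeping square brackets so that markers such as [start] and [end] survive) and the reserved indices are assumptions modeled on common practice.

```python
import re
import string
from collections import Counter

# Strip punctuation but keep square brackets so "[start]" and "[end]" survive.
STRIP_CHARS = string.punctuation.replace("[", "").replace("]", "")


def standardize(text):
    """Lowercase the text and remove punctuation other than square brackets."""
    return re.sub(f"[{re.escape(STRIP_CHARS)}]", "", text.lower())


class SimpleVectorizer:
    """Map sentences to fixed-length lists of integer token ids."""

    def __init__(self, max_tokens, sequence_length):
        self.max_tokens = max_tokens
        self.sequence_length = sequence_length
        self.vocabulary = ["", "[UNK]"]  # 0 = padding, 1 = unknown token
        self.index = {tok: i for i, tok in enumerate(self.vocabulary)}

    def adapt(self, sentences):
        """Build the vocabulary from the most frequent tokens in the sentences."""
        counts = Counter(tok for s in sentences for tok in standardize(s).split())
        frequent = [tok for tok, _ in counts.most_common(self.max_tokens - 2)]
        self.vocabulary = ["", "[UNK]"] + frequent
        self.index = {tok: i for i, tok in enumerate(self.vocabulary)}

    def __call__(self, sentence):
        """Convert one sentence to ids, truncating or zero-padding to length."""
        ids = [self.index.get(tok, 1) for tok in standardize(sentence).split()]
        ids = ids[: self.sequence_length]
        return ids + [0] * (self.sequence_length - len(ids))
```

Words that were not seen during adaptation map to the unknown-token id, which is one way [UNK] can end up in a translation.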

Finally, the create_dataset function creates the dataset from the sentence pairs passed to it.
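A common way to turn a vectorized pair into a training example for an encoder-decoder model is to feed the decoder the target sequence without its last token and ask it to predict the same sequence shifted one step ahead. The sketch below assumes that scheme; the function name and dictionary keys are hypothetical, and the project's create_dataset may be organized differently.

```python
def make_training_example(source_ids, target_ids):
    """Build (inputs, labels) for one sentence pair using a one-step shift.

    Assumed scheme: the decoder sees target tokens 0..n-2 and must predict
    tokens 1..n-1, so every position is trained to produce the next token.
    """
    inputs = {
        "encoder_inputs": list(source_ids),
        "decoder_inputs": list(target_ids[:-1]),
    }
    labels = list(target_ids[1:])
    return inputs, labels
```

For example, with target ids `[2, 7, 9, 3]` the decoder input is `[2, 7, 9]` and the labels are `[7, 9, 3]`.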


What does the train.py file do?

The train.py file contains functions that build and train the model according to the specified parameters. The main function first gets the raw texts and adapts the vectorizers on the training portion of these texts, then creates the vectorized datasets. After creating the datasets, it calls the get_model function to build the model (the model can be adjusted through the function's parameters; see the function's docstring for details). Finally, it trains the model and saves it.


What does the inference.py file do?

Finally, the inference.py file contains functions that use the model to run inference on example inputs. It first creates the vectorizers: because the target vectorizer uses a user-defined standardization function, it cannot be saved with the model, which is why inference.py includes a function that builds new vectorizers. After creating the vectorizers, it loads the model from disk. The translate function is the main inference function, and all the test functions (e.g. test_with_console_input) call it.
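The decoding loop inside a function like translate can be sketched as follows. This is an assumption about how such a function typically works (greedy, token-by-token decoding), not a copy of the project's code. The predict_next_token callable is a hypothetical stand-in for vectorizing the inputs and querying the trained model.

```python
def translate(sentence, predict_next_token, max_length=20):
    """Greedily decode a translation one token at a time.

    predict_next_token(sentence, decoded_tokens) is a hypothetical callable
    that returns the most likely next target token as a string.
    """
    decoded = ["[start]"]
    for _ in range(max_length):
        next_token = predict_next_token(sentence, decoded)
        decoded.append(next_token)
        if next_token == "[end]":
            break  # the model signalled that the sentence is complete
    return " ".join(decoded)
```

The max_length cap keeps the loop from running indefinitely when the model never emits [end], and the [start] and [end] markers in the returned string match the format of the sample outputs in the test results.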

P.S.: Further details on the explanations above can be found in the source code.

Test Results

Metrics


Train

  • Loss: 0.0607
  • Accuracy: 0.7840


Validation

  • Loss: 0.1576
  • Accuracy: 0.6381


Test

  • Loss: 0.1584
  • Accuracy: 0.6382

Test on the Test Set


English Sentence: Tom and I don't eat out as often as we used to.
Turkish Equivalent: [start] tom ve ben çoğu zaman [UNK] hakkında birlikte yemek yeriz [end]
-----------------------------------------------------------------------------------------------
English Sentence: If you can't read, it's not my fault.
Turkish Equivalent: [start] eğer benim hatam [UNK] değildir [end]
-----------------------------------------------------------------------------------------------
English Sentence: Tom is a former world triathlon champion.
Turkish Equivalent: [start] tom tüm ocak ayı şubat [end]
-----------------------------------------------------------------------------------------------
English Sentence: I hope no one steals my stuff.
Turkish Equivalent: [start] umarım herhangi bir şey [UNK] olmaz [end]
-----------------------------------------------------------------------------------------------
English Sentence: Can we change rooms?
Turkish Equivalent: [start] [UNK] değişim edebilir miyiz [end]
-----------------------------------------------------------------------------------------------
English Sentence: Tom never borrows money from his friends.
Turkish Equivalent: [start] tom arkadaşlarından hiç borç para almaz [end]
-----------------------------------------------------------------------------------------------
English Sentence: I didn't know Tom was allergic to bees.
Turkish Equivalent: [start] tomun [UNK] alerjisi olduğunu bilmiyordum [end]
-----------------------------------------------------------------------------------------------
English Sentence: I can't go swimming today.
Turkish Equivalent: [start] bugün yüzmeye gidemem [end]
-----------------------------------------------------------------------------------------------
English Sentence: Tom was the third victim.
Turkish Equivalent: [start] tom üçüncü [UNK] [end]
-----------------------------------------------------------------------------------------------
English Sentence: I saw Tom walking down the beach.
Turkish Equivalent: [start] tomu plajda kumdan gördüm [end]
-----------------------------------------------------------------------------------------------
