This repository contains a PyTorch implementation of our Auto-Encoding Variational Neural Machine Translation paper published at the 4th Workshop on Representation Learning for NLP (RepL4NLP). Note that the results in the paper are based on a TensorFlow implementation.
You will need Python 3.6 or newer:
virtualenv -p python3.6 ~/envs/aevnmt.pt
source ~/envs/aevnmt.pt/bin/activate
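To confirm that the activated environment meets the version requirement, a small check like the following can be run inside it. This is only a sketch: the helper name `python_is_supported` is made up for illustration and is not part of this repository.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated in this README


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("Python %d.%d or newer is required" % MIN_VERSION)
    print("Python version OK:", sys.version.split()[0])
```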
You will need two supporting packages, dists.pt (an extension to torch distributions) and dgm.pt, both of which install from source:
git clone https://github.com/probabll/dists.pt.git
cd dists.pt
pip install -r requirements.txt
pip install .
cd ..
git clone https://github.com/probabll/dgm.pt.git
cd dgm.pt
pip install -r requirements.txt
pip install .
cd ..
Then you will need AEVNMT.pt:
git clone https://github.com/Roxot/AEVNMT.pt.git
cd AEVNMT.pt
pip install -r requirements.txt
For developers, we recommend an editable install:
pip install --editable .
For other users, we recommend a regular install:
pip install .
To train a translation model:
python -u -m aevnmt.train \
--hparams_file HYPERPARAMETERS.json \
--training_prefix BILINGUAL-DATA \
--validation_prefix VALIDATION-DATA \
--src SRC --tgt TGT \
--output_dir OUTPUT-DIRECTORY
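When training is launched from a script (for example, to loop over language pairs), the same call can be assembled as an argument list. The sketch below only mirrors the flags shown above. The helper name `build_train_command` and every path and language code in it are hypothetical placeholders, not values taken from this repository.

```python
import shlex
import sys


def build_train_command(hparams_file, training_prefix, validation_prefix,
                        src, tgt, output_dir):
    """Return the argument list for the `aevnmt.train` call shown above."""
    return [
        sys.executable, "-u", "-m", "aevnmt.train",
        "--hparams_file", hparams_file,
        "--training_prefix", training_prefix,
        "--validation_prefix", validation_prefix,
        "--src", src,
        "--tgt", tgt,
        "--output_dir", output_dir,
    ]


# All paths and language codes below are hypothetical placeholders.
cmd = build_train_command(
    hparams_file="hparams.json",
    training_prefix="data/train",
    validation_prefix="data/dev",
    src="en",
    tgt="de",
    output_dir="runs/en-de",
)

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the package is installed, the list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.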
To translate with a trained model:
python -u -m aevnmt.translate \
--output_dir OUTPUT-DIRECTORY \
--verbose true \
--translation_input_file INPUT \
--translation_output_file TRANSLATION \
--translation_ref_file REFERENCE
There is also a sentence VAE mode:
python -u -m aevnmt.train_monolingual \
--hparams_file HYPERPARAMETERS.json \
--training_prefix BILINGUAL-DATA \
--validation_prefix VALIDATION-DATA \
--src SRC \
--output_dir OUTPUT-DIRECTORY
To sample text from a trained sentence VAE:
python -u -m aevnmt.generate \
--output_dir OUTPUT-DIRECTORY \
--verbose true \
--translation_output_file SAMPLED_TEXT \
--decoding.sample true --translation.num_prior_samples 100
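In a sentence VAE, generating from the prior generally means drawing a latent vector from the prior distribution and letting the decoder produce a sentence conditioned on it. The sketch below illustrates only the first step, using the standard library. The standard Gaussian prior, the function name `sample_prior`, and the latent size of 32 are assumptions made for illustration. The prior and dimensionality actually used depend on the hyperparameters file.

```python
import random


def sample_prior(num_samples, latent_size, seed=None):
    """Draw `num_samples` latent vectors from a standard Gaussian N(0, I)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_size)]
            for _ in range(num_samples)]


# 100 matches --translation.num_prior_samples above; latent_size=32 is made up.
latents = sample_prior(num_samples=100, latent_size=32, seed=0)
print(len(latents), len(latents[0]))  # prints "100 32"
```

Each latent vector would then be passed to the trained decoder, which is not shown here.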
See some example training and translation scripts, and a demo notebook.
- Development: only de-BPE'd outputs
| Model | English-German | German-English |
|---|---|---|
| Conditional | 40.1 | 43.5 |
| AEVNMT | 40.9 | 43.4 |
- Test: post-processed
| Model | English-German | German-English |
|---|---|---|
| Conditional | 38.0 | 40.9 |
| AEVNMT | 38.5 | 40.9 |
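To make the comparison between the two models easier to read, the differences can be computed directly from the tables above. The sketch below copies the reported scores as they appear and subtracts the Conditional score from the AEVNMT score. The dictionary layout and the helper name `deltas` are illustrative only.

```python
# Scores copied from the development and test tables above.
DEV = {
    "Conditional": {"English-German": 40.1, "German-English": 43.5},
    "AEVNMT": {"English-German": 40.9, "German-English": 43.4},
}
TEST = {
    "Conditional": {"English-German": 38.0, "German-English": 40.9},
    "AEVNMT": {"English-German": 38.5, "German-English": 40.9},
}


def deltas(scores):
    """Return AEVNMT minus Conditional per language pair, rounded to 0.1."""
    return {
        pair: round(scores["AEVNMT"][pair] - scores["Conditional"][pair], 1)
        for pair in scores["Conditional"]
    }


for split, scores in (("development", DEV), ("test", TEST)):
    print(split, deltas(scores))
```

On these numbers, AEVNMT is ahead by 0.8 (development) and 0.5 (test) on English-German, while German-English differs by -0.1 on development and is tied on test.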