This is a distribution of Monoses, an unsupervised Machine Translation system using monolingual corpora only.
It contains everything needed to run training and tests (Moses, fast_align, PyTorch (CPU), VecMap and Phrase2Vec), plus a drop-in HTTP API server that reads the model built by Monoses, all in a self-contained, handy Docker image.
This is just an add-on for a proper Monoses installation and won't work on its own. Build the Docker image, which supplies everything:
docker build -t aijanai/monoses .
- Create a directory containing two large files (e.g., ~/it-no/ with it.txt and no.txt): these files are your monolingual training corpora.
- Run the following (running it inside tmux is RECOMMENDED, since training takes days):
docker run --rm --name train-it-no -v ~/it-no:/it-no aijanai/monoses python3 train.py --src /it-no/no.txt --src-lang sv --trg /it-no/it.txt --trg-lang it --working /it-no/model --threads 49
- When the process has finished, you will have several GBs of files inside your training directory.
- Launch your translation server with the following (point the MODEL environment variable at the directory containing the steps and the ini files):
docker run --rm --name translate-it-no -v ~/it-no:/it-no -e MODEL=/it-no/model -p 5000:5000 aijanai/monoses
- Now query your server with the following, replacing the IP address with the address of the machine running the container:
curl "130.61.252.183:5000/translate?q=Ulver&source=sv&target=it"
Queries are rather slow (about 4 seconds each) because the model is loaded into RAM on every request, but they work.
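
The same query can be sent from Python. The following is a minimal sketch using only the standard library, not part of this distribution: the function names build_translate_url and translate are hypothetical, the endpoint and the q, source and target parameters are taken from the curl example, and the response is assumed to be UTF-8 text and is returned unparsed, since its format is not described here. The 30-second timeout is an arbitrary choice that leaves room for the slow responses mentioned above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_translate_url(host, text, source, target, port=5000):
    """Build the /translate URL, percent-encoding the query parameters.

    Hypothetical helper: only the path and the parameter names
    (q, source, target) come from the curl example.
    """
    query = urlencode({"q": text, "source": source, "target": target})
    return f"http://{host}:{port}/translate?{query}"


def translate(host, text, source, target, port=5000, timeout=30):
    """Send one query and return the raw response body as a string.

    Assumption: the body is UTF-8 text. No parsing is attempted.
    """
    url = build_translate_url(host, text, source, target, port)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server container running on the same machine, translate("localhost", "Ulver", "sv", "it") would issue the same request as the curl command. Building the URL with urlencode means that text containing spaces or non-ASCII characters is escaped correctly.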