Train OPUS-MT models

This package includes scripts for training NMT models for OPUS-MT using MarianNMT and OPUS data. More details are given in the Makefile, but the documentation still needs to be improved. The make targets also require a specific environment and currently only work well on the CSC HPC cluster in Finland.

Pre-trained models

The subdirectory models contains information about pre-trained models that can be downloaded from this project. They are distributed under a CC-BY 4.0 license. More pre-trained models trained with the OPUS-MT training pipeline are available from the Tatoeba translation challenge, also under a CC-BY 4.0 license.

Quickstart

Setting up:

git clone https://github.com/Helsinki-NLP/OPUS-MT-train.git
cd OPUS-MT-train
git submodule update --init --recursive --remote
make install

Training a multilingual NMT model (Finnish and Estonian to Danish, Swedish and English):

make SRCLANGS="fi et" TRGLANGS="da sv en" train
make SRCLANGS="fi et" TRGLANGS="da sv en" eval
make SRCLANGS="fi et" TRGLANGS="da sv en" release
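
The three steps above can be chained so that a failed step stops the run. The following is a minimal sketch, not part of OPUS-MT-train: the function name run_pipeline and the variables MAKE_CMD and DRY_RUN are hypothetical helpers introduced here for illustration. It assumes it is run from the repository root and only uses the make targets shown above. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: run_pipeline, MAKE_CMD and DRY_RUN are hypothetical names,
# not something provided by OPUS-MT-train.

MAKE_CMD="${MAKE_CMD:-make}"   # command used to invoke make
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = run them

# run_pipeline SRCLANGS TRGLANGS
# Runs the train, eval and release targets in order for one language set.
run_pipeline() {
    src="$1"
    trg="$2"
    for target in train eval release; do
        if [ "$DRY_RUN" = "1" ]; then
            # Show the command that would be executed
            printf '%s SRCLANGS="%s" TRGLANGS="%s" %s\n' \
                "$MAKE_CMD" "$src" "$trg" "$target"
        else
            # Stop at the first failing step so that later targets
            # never run after a failed one
            "$MAKE_CMD" SRCLANGS="$src" TRGLANGS="$trg" "$target" || return 1
        fi
    done
}

# Same language set as in the quickstart example
run_pipeline "fi et" "da sv en"
```

Setting DRY_RUN=0 in the environment makes the script call make for real. Printing by default is a cautious choice here because training is long-running and depends on the environment described at the top of this page.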

More information is available in the documentation linked below.

Documentation

Tutorials

References

Please cite the following paper if you use OPUS-MT software and models:

@InProceedings{TiedemannThottingal:EAMT2020,
  author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
  title = {{OPUS-MT} — {B}uilding open translation services for the {W}orld},
  booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
  year = {2020},
  address = {Lisbon, Portugal}
 }

Acknowledgements

None of this would be possible without all the great open source software, including … and many other tools like terashuf, pigz, jq, Moses SMT, fast_align, sacrebleu …

We would also like to acknowledge the support of the University of Helsinki and CSC (IT Center for Science), the funding through projects in the EU Horizon 2020 framework (including FoTran and MeMAD), and the contributors to the open collection of parallel corpora OPUS.
