
LASER

Language-Agnostic SEntence Representations.

LASER is a library to calculate multilingual sentence embeddings.

Currently, we include an encoder which supports nine European languages:

  • Germanic languages: English, German, Dutch, Danish
  • Romance languages: French, Spanish, Italian, Portuguese
  • Uralic languages: Finnish

All these languages are encoded by the same BLSTM encoder, and there is no need to specify the input language (tokenization, however, is language specific). In our experience, the sentence encoder also supports code-switching, i.e. the same sentence can contain words in several different languages.
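Because all languages share one encoder, sentences and their translations land close together in the same embedding space, so cross-lingual matching reduces to vector similarity. The sketch below illustrates this with NumPy, using random vectors as stand-ins for encoder outputs (the 1024-dimensional size and the small perturbation modeling a translation pair are illustrative assumptions, not the library's API):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding matrices (rows are sentences).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-ins for encoder outputs: in practice each row would be the fixed-size
# vector the shared BLSTM encoder produces for one sentence, regardless of
# that sentence's language.
rng = np.random.default_rng(0)
en = rng.normal(size=(3, 1024))              # e.g. three English sentences
fr = en + 0.01 * rng.normal(size=(3, 1024))  # their translations, nearby in embedding space

sim = cosine_sim(en, fr)
# Each sentence is most similar to its own translation.
print(sim.argmax(axis=1))
```

This nearest-neighbor search over embeddings is exactly what bitext mining scales up (with Faiss doing the similarity search efficiently).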

We also have some evidence that the encoder generalizes to some extent to other languages of the Germanic and Romance language families (e.g. Swedish, Norwegian, Afrikaans, Catalan or Corsican), although no data in these languages was used during training.

A detailed description of how the multilingual sentence embeddings are trained can be found in the accompanying paper.

Dependencies

  • Python 3 with NumPy
  • PyTorch 0.4.0
  • Faiss (for mining bitexts)
  • tokenization with the Moses tokenizer and byte-pair encoding (BPE)
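A quick way to verify the Python-side dependencies is a small check script like the one below (a sketch; Faiss and the external tokenizers are not checked here, and PyTorch is reported only if present):

```python
import sys

# LASER requires Python 3.
assert sys.version_info[0] == 3, "Python 3 is required"

import numpy
print("NumPy:", numpy.__version__)

try:
    import torch  # the README targets PyTorch 0.4.0
    print("PyTorch:", torch.__version__)
except ImportError:
    print("PyTorch is not installed")
```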

Installation

  • set the environment variable 'LASER' to the root of the installation, e.g. export LASER="${HOME}/projects/laser"
  • download the encoders from Amazon S3: ./install_models.sh
  • download the third-party software: ./install_external_tools.sh
  • download the data used in the example tasks (see the description for each task)
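Put together, the setup steps above might look like this (the path is an example; the download scripts only exist inside a checkout, so they are shown as comments here):

```shell
# Point LASER at the root of the installation (illustrative path).
export LASER="${HOME}/projects/laser"

# From the repository root, fetch the encoders and third-party tools:
#   ./install_models.sh
#   ./install_external_tools.sh

echo "LASER root set to: ${LASER}"
```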
