Lingtrain Alignment Studio
Lingtrain Alignment Studio is an ML-based app for accurately aligning texts in different languages.
- Extracts a parallel corpus from two texts.
- Builds a formatted parallel book from it with sentence highlighting.
Models
The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors used to calculate the distance between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.
- distiluse-base-multilingual-cased-v2
  - more reliable and fast
  - moderate weights size (500 MB)
  - supports 50+ languages
  - full list of supported languages can be found in this paper
- LaBSE (Language-agnostic BERT Sentence Embedding)
  - can be used for rare languages
  - pretty heavy weights (1.8 GB)
  - supports 100+ languages
  - full list of supported languages can be found here
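To illustrate how embeddings can be used to compare sentences, here is a minimal sketch that matches each source sentence to its closest target sentence by cosine similarity. The function names and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings come from one of the models above and have hundreds of dimensions, and the app's actual alignment logic may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matches(src_vectors, tgt_vectors):
    """For each source vector, return the index of the most similar target vector."""
    matches = []
    for src in src_vectors:
        scores = [cosine_similarity(src, tgt) for tgt in tgt_vectors]
        matches.append(scores.index(max(scores)))
    return matches


# Toy vectors standing in for real sentence embeddings (hypothetical data).
src_vectors = [[1.0, 0.0, 0.2], [0.1, 0.9, 0.0]]
tgt_vectors = [[0.0, 1.0, 0.1], [0.9, 0.1, 0.3]]

print(best_matches(src_vectors, tgt_vectors))  # prints [1, 0]
```

Here the first source sentence is closest to the second target sentence and vice versa, which is the kind of reordering a purely positional alignment would miss.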
Running on local machine
You can run the application on your computer using Docker.
- Make sure that Docker is installed by running the following command in your console:
docker version
- Images configured to run locally are available on Docker Hub.
- Run the following commands in your console:
docker pull lingtrain/aligner:habr
docker run -p 80:80 lingtrain/aligner:habr
- The app will be available in your browser at the localhost address.
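If the page does not open, it can help to check whether anything is listening on the published port. The following is a small sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, and port 80 is taken from the `-p 80:80` mapping in the command above.

```python
import socket


def is_port_open(host="localhost", port=80, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


# Reports whether the container's port is reachable from this machine.
print(is_port_open("localhost", 80))
```

A result of `False` usually means the container is not running or the port mapping differs from the one assumed here.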
Running in development mode
Clone this repo to your machine.
Backend
A Flask/uWSGI REST API service. It is fairly simple and contains all the alignment logic.
cd be
python main.py
Frontend
A single-page application built with Vue, Vuex, and Vuetify. It provides a UI for managing the alignment process through the backend and a tool for translators to edit the documents being processed.
cd fe
Setup
npm install
Compile and run with hot reload for development
npm run serve