Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data. So, if you don't need the whole corpus but just a suitable subset (indeed, a cor(pus sub)set), this is what Corset will do for you, and that is the reason for the tool's name.
Here are some highlights of what you will find in Corset:
Millions of parallel sentences to explore
- Dive into parallel corpora performing searches at the speed of light.
- Search in either the source or the target side of the corpora.
- Keep track of your preferred searches and their details.
Tailored corpora (corsets) from big corpora
- Get smaller and custom corpora that fit your sample text.
- Set up the details of your corset (name, topic, languages, size) and launch your search over millions of parallel sentences.
Monitor and download corsets
- See the status of your corsets, then preview, download, remove, or share them!
- Take a look at shared corsets to see if they are already tailored to your needs.