The prime repository for state-of-the-art Multilingual Question Answering research and development.
PrimeQA is a public open-source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). With PrimeQA, a researcher can replicate the experiments described in a paper published at a recent NLP conference, and can also download pre-trained models (from an online repository) and run them on custom data.
The models within PrimeQA support end-to-end question answering. PrimeQA answers questions via:
- Information Retrieval: Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
- Machine Reading Comprehension: Extracting and/or generating answers given the source document or passage.
- Question Generation: Generating questions for effective domain adaptation.
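To make the "traditional" retrieval step above more concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not PrimeQA's or Pyserini's implementation: the function name `bm25_scores`, the whitespace tokenizer, the sample passages, and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, and k1/b are assumed default values.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF variant with log(1 + ...), which is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up passages, used only to exercise the function.
passages = [
    "PrimeQA supports retrieval and reading comprehension",
    "BM25 is a traditional sparse retrieval model",
    "ColBERT is a neural retrieval model",
]
scores = bm25_scores("traditional sparse retrieval", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])
```

In an end-to-end setting, the top-ranked passages from a retriever like this would be handed to the reading comprehension step, which extracts or generates the answer.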
Examples of supported models (applicable to benchmark datasets) include:
- [Traditional IR with BM25]: Pyserini
- [Neural IR with ColBERT, DPR (coming soon)]: to replicate the experiments that Dr. Decr (Li et al., 2022) performed to reach the top of the XOR TyDI leaderboard.
- [Machine Reading Comprehension with XLM-R]: to replicate the experiments that reached the top of the TyDI leaderboard, with performance similar to that of the IBM GAAMA system. Coming soon: code to replicate GAAMA’s performance on Natural Questions.
# cd to project root

# If you want to run on GPU make sure to install torch appropriately
# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# Full install (editable)
pip install -e .[all]
Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. When installing from source, these pins can be modified, but doing so is not officially supported.
Java 11 is required for BM25 retrieval.
Download the Java 11 package from https://jdk.java.net/archive/, uncompress it, and set the following environment variables:
export JAVA_HOME=<jdk-dir>
export PATH=$JAVA_HOME/bin:$PATH
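After setting these variables, it can be useful to confirm that the `java` found on `PATH` is in fact version 11. The sketch below is one way to check this from Python; the helper name `java_major_version` is hypothetical and not part of PrimeQA, and the parsing assumes the usual `version "11.0.2"` style banner that `java -version` prints to stderr.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Return the major Java version found in `java -version` output, or None.

    Hypothetical helper for illustration only.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (1.8.0_292 is Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


# Only query the JVM if a `java` executable is actually on PATH.
if shutil.which("java"):
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` writes its banner to stderr, not stdout.
    detected = java_major_version(result.stderr)
    print("Detected Java major version:", detected)
    if detected != 11:
        print("BM25 retrieval expects Java 11; check JAVA_HOME and PATH.")
else:
    print("No java executable found on PATH.")
```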
To run the unit tests, you first need to install PrimeQA.
Make sure to install with the [all] extras from pip.
From there you can run the tests via pytest, for example:
pytest --cov PrimeQA --cov-config .coveragerc tests/
For more information, see: