PrimeQA

The prime repository for state-of-the-art Multilingual Question Answering research and development.

License: Apache 2.0

PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). By using PrimeQA, a researcher can replicate the experiments outlined in a paper published in the latest NLP conference while also enjoying the capability to download pre-trained models (from an online repository) and run them on their own custom data.

The models within PrimeQA support end-to-end question answering. PrimeQA answers questions via:

  • Information Retrieval: Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
  • Machine Reading Comprehension: Extracting and/or generating answers given the source document or passage.
  • Question Generation: Generating questions for effective domain adaptation.

Some examples of supported models (applicable to benchmark datasets) are:

  • [Traditional IR with BM25]: Pyserini
  • [Neural IR with ColBERT, DPR (coming soon)]: replicates the experiments that Dr. Decr (Li et al., 2022) performed to reach the top of the XOR TyDI leaderboard.
  • [Machine Reading Comprehension with XLM-R]: replicates the experiments that reach the top of the TyDI leaderboard, with performance similar to the IBM GAAMA system. Coming soon: code to replicate GAAMA’s performance on Natural Questions.

Getting Started

Installation

# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# Full install (editable); the quotes keep shells such as zsh from expanding the brackets
pip install -e '.[all]'

Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. When installing from source, these pins can be modified, but doing so is not officially supported.

Java Requirements

Java 11 is required for BM25 retrieval.

Download a Java 11 package from https://jdk.java.net/archive/ and uncompress it.

Set JAVA_HOME:

export JAVA_HOME=<jdk-dir>
export PATH=$JAVA_HOME/bin:$PATH
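After setting these variables, it can help to confirm that the `java` found on PATH really is version 11. The following is a minimal sketch and not part of PrimeQA: the function names `java_major_version` and `check_java11` are hypothetical helpers introduced here for illustration, and the parsing assumes the usual `version "11.0.2"` style of `java -version` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of PrimeQA): print the major version number
# found in `java -version` output read from stdin.
# Handles both `version "11.0.2"` and `version "11"`. Legacy releases such as
# Java 8 report `version "1.8.0_292"` and therefore yield 1.
java_major_version() {
  sed -n 's/.*version "\([0-9][0-9]*\)[."].*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only when the `java` on PATH reports major version 11.
check_java11() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH; check JAVA_HOME and PATH" >&2
    return 1
  fi
  # `java -version` writes to stderr, so redirect it into the pipe.
  major=$(java -version 2>&1 | java_major_version)
  if [ "$major" = "11" ]; then
    echo "Java 11 detected"
  else
    echo "Expected Java 11 but found major version '${major}'" >&2
    return 1
  fi
}
```

Calling `check_java11` in the same shell session as the exports above reports whether the configured JDK is the one being picked up. A non-zero exit status suggests that JAVA_HOME or PATH still points at a different installation.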

Unit Tests

To run the unit tests you first need to install PrimeQA. Make sure to install with the [tests] or [all] extras from pip.

From there you can run the tests via pytest, for example:

pytest --cov PrimeQA --cov-config .coveragerc tests/
