The multitask and transfer learning toolkit for natural language processing research

jiant is an NLP toolkit

Why should I use jiant?

- jiant supports multitask learning
- jiant supports transfer learning
- jiant supports 50+ natural language understanding tasks
- jiant supports the following benchmarks:
    - GLUE
    - SuperGLUE
    - XTREME
- jiant is a research library, and users are encouraged to extend, change, and contribute to it to match their needs!

A few additional things you might want to know about jiant:

- jiant is configuration file driven
- jiant is built with PyTorch
- jiant integrates with the Hugging Face datasets library to manage task data
- jiant integrates with the Hugging Face transformers library to manage models and tokenizers

Getting Started

- Get started with some simple Examples
- Learn more about jiant by reading our Guides
- See our list of supported tasks

Installation

To import jiant from source (recommended for researchers):

git clone https://github.com/nyu-mll/jiant.git
cd jiant
pip install -r requirements.txt

# Add the following to your .bashrc or .bash_profile
export PYTHONPATH=/path/to/jiant:$PYTHONPATH

If you plan to contribute to jiant, install additional dependencies with pip install -r requirements-dev.txt.

To install jiant from source (alternative for researchers):

git clone https://github.com/nyu-mll/jiant.git
cd jiant
pip install -e .

To install jiant from pip (recommended if you just want to train/use a model):

pip install jiant

We recommend that you install jiant in a virtual environment or a conda environment.

To check that jiant was installed correctly, run a simple example.
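
Before running a full example, a quicker sanity check is to confirm that Python can locate the package at all. The snippet below is a minimal sketch that uses only the standard library. The helper name `is_importable` is illustrative and is not part of jiant.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be found on the current Python path."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_importable("jiant"):
        print("jiant is importable")
    else:
        print("jiant was not found; check your environment or PYTHONPATH")
```

This only confirms that the package can be found. Running one of the examples is still the more thorough check, since it also exercises the dependencies.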

Quick Introduction

The following example fine-tunes a RoBERTa model on the MRPC dataset.

Python version:

from jiant.proj.simple import runscript as run
import jiant.scripts.download_data.runscript as downloader

EXP_DIR = "/path/to/exp"

# Download the Data
downloader.download_data(["mrpc"], f"{EXP_DIR}/tasks")

# Set up the arguments for the Simple API
args = run.RunConfiguration(
    run_name="simple",
    exp_dir=EXP_DIR,
    data_dir=f"{EXP_DIR}/tasks",
    hf_pretrained_model_name_or_path="roberta-base",
    tasks="mrpc",
    train_batch_size=16,
    num_train_epochs=3,
)

# Run!
run.run_simple(args)

Bash version:

EXP_DIR=/path/to/exp

python jiant/scripts/download_data/runscript.py \
    download \
    --tasks mrpc \
    --output_path ${EXP_DIR}/tasks
python jiant/proj/simple/runscript.py \
    run \
    --run_name simple \
    --exp_dir ${EXP_DIR}/ \
    --data_dir ${EXP_DIR}/tasks \
    --hf_pretrained_model_name_or_path roberta-base \
    --tasks mrpc \
    --train_batch_size 16 \
    --num_train_epochs 3
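
If you would rather launch the command-line workflow from a Python script (for example, to loop over several tasks), the run command above can be assembled programmatically. This is a sketch under stated assumptions: the helpers `build_run_command` and `launch` are hypothetical and not part of jiant, they reuse only the flags shown in the Bash example, and they assume the script is executed from the root of the cloned jiant repository.

```python
import subprocess
import sys
from typing import List


def build_run_command(
    exp_dir: str,
    task: str,
    model: str = "roberta-base",
    train_batch_size: int = 16,
    num_train_epochs: int = 3,
) -> List[str]:
    """Assemble the argument list for the simple run script (hypothetical helper)."""
    return [
        sys.executable, "jiant/proj/simple/runscript.py",
        "run",
        "--run_name", "simple",
        "--exp_dir", exp_dir,
        "--data_dir", f"{exp_dir}/tasks",
        "--hf_pretrained_model_name_or_path", model,
        "--tasks", task,
        "--train_batch_size", str(train_batch_size),
        "--num_train_epochs", str(num_train_epochs),
    ]


def launch(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_run_command("/path/to/exp", "mrpc")
    # Print the command for inspection; call launch(cmd) to start training
    # once jiant is installed and the task data has been downloaded.
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.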

Examples of more complex training workflows are found here.