TRAnsformer Routing Networks (TRAR)
This is the official implementation of the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering". It currently includes the code for training TRAR on the VQA2.0 and CLEVR datasets. Our TRAR model for the REC task is coming soon.
Updates
- (2021/10/10) Release our TRAR-VQA project.
- (2021/08/31) Release our pretrained CLEVR TRAR model on the train split: TRAR CLEVR Pretrained Models.
- (2021/08/18) Release our pretrained TRAR models on the train+val split and the train+val+vg split: VQA-v2 TRAR Pretrained Models.
- (2021/08/16) Release our train2014, val2014 and test2015 data. Please check our dataset setup page DATA.md for more details.
- (2021/08/15) Release our pretrained weights on the train split. Please check our model page MODEL.md for more details.
- (2021/08/13) The project page for TRAR is available.
Introduction
TRAR vs Standard Transformer
TRAR Overall
Table of Contents
Installation
- Clone this repo
git clone https://github.com/rentainhe/TRAR-VQA.git
cd TRAR-VQA
- Create a conda virtual environment and activate it
conda create -n trar python=3.7 -y
conda activate trar
- Install CUDA==10.1 with cudnn7 following the official installation instructions
- Install Pytorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
Dataset setup
see DATA.md
Config Introduction
In the trar.yml config we have these specific settings for the TRAR model:
ORDERS: [0, 1, 2, 3]
IMG_SCALE: 8
ROUTING: 'hard' # {'soft', 'hard'}
POOLING: 'attention' # {'attention', 'avg', 'fc'}
TAU_POLICY: 1 # {0: 'SLOW', 1: 'FAST', 2: 'FINETUNE'}
TAU_MAX: 10
TAU_MIN: 0.1
BINARIZE: False
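These settings interact (for example, every order must fit inside the feature map, and binarizing path probabilities only makes sense for hard routing, as noted below). As a minimal sanity check, assuming the config has been loaded into a plain Python dict with the keys shown above (this helper is illustrative and not part of the repo):

```python
def check_trar_config(cfg):
    """Sanity-check TRAR settings (illustrative helper, not part of the repo).

    cfg: dict holding the keys shown in the trar.yml snippet above.
    """
    assert cfg["ROUTING"] in {"hard", "soft"}
    assert cfg["POOLING"] in {"attention", "avg", "fc"}
    assert cfg["TAU_POLICY"] in {0, 1, 2}
    assert 0 < cfg["TAU_MIN"] <= cfg["TAU_MAX"]
    # every local attention order must fit inside the feature map;
    # 0 stands for global attention
    assert all(0 <= o <= cfg["IMG_SCALE"] for o in cfg["ORDERS"])
    # binarizing path probabilities only applies to hard routing
    if cfg["ROUTING"] == "soft":
        assert not cfg["BINARIZE"]
    return True
```

Each field is explained individually below.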
- ORDERS=list, to set the local attention window sizes for routing. 0 stands for global attention.
- IMG_SCALE=int, which should be equal to the image feature size used for training. You should set IMG_SCALE: 16 for 16 × 16 training features.
- ROUTING={'hard', 'soft'}, to set the Routing Block type in the TRAR model.
- POOLING={'attention', 'avg', 'fc'}, to set the downsample strategy used in the Routing Block.
- TAU_POLICY={0, 1, 2}, to set the temperature schedule used when training TRAR with ROUTING: 'hard'.
- TAU_MAX=float, to set the maximum temperature in training.
- TAU_MIN=float, to set the minimum temperature in training.
- BINARIZE=bool, whether to binarize the predicted alphas (alphas: the probability of choosing each path). If BINARIZE=True, at test time only the maximum alpha is kept and the others are set to zero; if BINARIZE=False, all alphas are kept and the predictions of the different routing paths are combined by an alpha-weighted sum. Training time is unaffected; the difference only appears at test time.
Note: please set BINARIZE=False when ROUTING='soft'; there is no need to binarize the path probabilities in the soft routing block.
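The BINARIZE behavior described above can be sketched as follows. This is an illustrative NumPy sketch under assumed names and shapes, not the repo's actual API:

```python
import numpy as np

def combine_paths(alphas, path_outputs, binarize=False):
    """Combine routing path outputs by their predicted probabilities (alphas).

    alphas: (num_paths,) probabilities over the routing paths
    path_outputs: (num_paths, dim) output of each attention path
    binarize=True keeps only the maximum-alpha path (hard selection at test
    time); binarize=False returns the alpha-weighted sum over all paths.
    Illustrative sketch only -- names and shapes are assumptions.
    """
    alphas = np.asarray(alphas, dtype=float)
    if binarize:
        hard = np.zeros_like(alphas)
        hard[np.argmax(alphas)] = 1.0  # keep the max alpha, zero the rest
        alphas = hard
    return alphas @ np.asarray(path_outputs, dtype=float)
```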
TAU_POLICY visualization
For MAX_EPOCH=13 with WARMUP_EPOCH=3, we have the following policy strategy:
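As a rough illustration of such a schedule, a linear anneal that holds TAU_MAX during warmup and decays to TAU_MIN by the final epoch could look like this. The exact SLOW/FAST/FINETUNE schedules are defined in the TRAR codebase; this particular formula is an assumption for illustration only:

```python
def linear_tau(epoch, max_epoch=13, warmup_epoch=3, tau_max=10.0, tau_min=0.1):
    # Illustrative linear temperature anneal (assumed form, not the
    # actual SLOW/FAST/FINETUNE policies from the TRAR codebase).
    # Hold tau_max during warmup, then decay linearly to tau_min
    # by the last epoch (epoch index max_epoch - 1).
    if epoch < warmup_epoch:
        return tau_max
    frac = (epoch - warmup_epoch) / max(1, (max_epoch - 1) - warmup_epoch)
    return tau_max + (tau_min - tau_max) * min(1.0, frac)
```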
Training
Train model on VQA-v2 with default hyperparameters:
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar'
and the training log will be saved to:
results/log/log_run_.txt