VL-BERT
Code for the paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on the massive-scale caption dataset and text-only corpus, and can be fine-tuned for various down-stream visual-linguistic tasks, such as Visual Commonsense Reasoning, Visual Question Answering and Referring Expression Comprehension.
Thanks to PyTorch and its 3rd-party libraries, this codebase also contains following features:
- Distributed Training
- FP16 Mixed-Precision Training
- Various Optimizers and Learning Rate Schedulers
- Gradient Accumulation
- Monitoring the Training Using TensorboardX
Citing VL-BERT
@article{su2019vl,
title={Vl-bert: Pre-training of generic visual-linguistic representations},
author={Su, Weijie and Zhu, Xizhou and Cao, Yue and Li, Bin and Lu, Lewei and Wei, Furu and Dai, Jifeng},
journal={arXiv preprint arXiv:1908.08530},
year={2019}
}
Prepare
Environment
- Ubuntu 16.04, CUDA 9.0
- Python 3.6.x
# We recommend you to use Anaconda/Miniconda to create a conda environment conda create -n vl-bert python=3.6 pip conda activate vl-bert
- PyTorch 1.0.0 or 1.1.0
conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch
- Apex (optional, for speed-up and fp16 training)
git clone https://github.com/jackroos/apex cd ./apex pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- Other requirements:
pip install -r requirements.txt
- Compile
./scripts/init.sh
Data
See PREPARE_DATA.md.
Pre-trained Models
See PREPARE_PRETRAINED_MODELS.md.
Training
Distributed Training on Single-Machine
./scripts/dist_run_single.sh <num_gpus> <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
<num_gpus>
: number of gpus to use.<task>
: pretrain/vcr/vqa/refcoco.<path_to_cfg>
: config yaml file under./cfgs/<task>
.<dir_to_store_checkpoint>
: root directory to store checkpoints.
Following is a more concrete example:
./scripts/dist_run_single.sh 4 vcr/train_end2end.py ./cfgs/vcr/base_q2a_4x16G_fp32.yaml ./
Distributed Training on Multi-Machine
For example, on 2 machines (A and B), each with 4 GPUs,
run following command on machine A:
./scripts/dist_run_multi.sh 2 0 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
run following command on machine B:
./scripts/dist_run_multi.sh 2 1 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
Non-Distributed Training
./scripts/nondist_run.sh <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
Note:
-
In yaml files under
./cfgs
, we set batch size for GPUs with at least 16G memory, you may need to adapt the batch size and
gradient accumulation steps according to your actual case, e.g., if you decrease the batch size, you should also
increase the gradient accumulation steps accordingly to keep 'actual' batch size for SGD unchanged. -
For efficiency, we recommend you to use distributed training even on single-machine. But for RefCOCO+, you may meet deadlock
using distributed training due to unknown reason (it may be related to PyTorch dataloader deadloack), you can simply use
non-distributed training to solve this problem.
Evaluation
VCR
-
Local evaluation on val set:
python vcr/val.py \ --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \ --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \ --gpus <indexes_of_gpus_to_use> \ --result-path <dir_to_save_result> --result-name <result_file_name>
Note:
<indexes_of_gpus_to_use>
is gpu indexes, e.g.,0 1 2 3
. -
Generate prediction results on test set for leaderboard submission:
python vcr/test.py \ --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \ --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \ --gpus <indexes_of_gpus_to_use> \ --result-path <dir_to_save_result> --result-name <result_file_name>
VQA
- Generate prediction results on test set for EvalAI submission:
python vqa/test.py \ --cfg <cfg_file> \ --ckpt <checkpoint> \ --gpus <indexes_of_gpus_to_use> \ --result-path <dir_to_save_result> --result-name <result_file_name>
RefCOCO+
- Local evaluation on val/testA/testB set:
python refcoco/test.py \ --split <val|testA|testB> \ --cfg <cfg_file> \ --ckpt <checkpoint> \ --gpus <indexes_of_gpus_to_use> \ --result-path <dir_to_save_result> --result-name <result_file_name>