Faster-R-CNN-with-model-pretrained-on-Visual-Genome

A PyTorch implementation of Faster R-CNN with a ResNet-101 backbone, pretrained on Visual Genome.

Introduction

we provide

Model

We use the same settings and benchmark as faster-rcnn.pytorch. The results of the model are shown below.

model    dataset        #GPUs       batch size  lr    lr_decay  max_epoch  mAP
Res-101  Visual Genome  1 (1080TI)  4           1e-3  5         20         10.19
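
To make the lr and lr_decay columns concrete, the sketch below computes a plain step-decay schedule for the row above. It assumes that lr_decay is the number of epochs between decays and that each decay multiplies the rate by 0.1. The factor (gamma) and the exact epoch at which each decay takes effect are assumptions not stated in the table, so check the training script before relying on them.

```python
def step_decay_lr(epoch, base_lr=1e-3, decay_every=5, gamma=0.1):
    """Learning rate for a 1-indexed epoch under a plain step-decay schedule.

    base_lr and decay_every come from the table (lr, lr_decay);
    gamma is an assumed decay factor, not a value given in this document.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-indexed")
    num_decays = (epoch - 1) // decay_every
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    # Print the assumed rate at a few epochs across the 20-epoch run.
    for epoch in (1, 5, 6, 11, 16, 20):
        print("epoch {:2d}: lr = {:.0e}".format(epoch, step_decay_lr(epoch)))
```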

Download the pretrained model and put it in the folder $load_dir.

Usage

Prerequisites

  • Python 3.6 or higher
  • PyTorch 1.0
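
The prerequisites above can be checked before compiling anything. The following is a minimal sketch; the function name check_environment is hypothetical, and it only reports the installed PyTorch version rather than judging it, since this document does not say whether versions other than 1.0 work.

```python
import sys


def check_environment(min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(
            "Python {} or higher is required, found {}".format(required, found)
        )
    try:
        import torch  # imported lazily so the check still runs without it
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        # Report only; compatibility of other versions is not documented here.
        print("Found PyTorch", torch.__version__)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```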

Preparation

Clone the code

git clone https://github.com/shilrley6/Faster-R-CNN-with-model-pretrained-on-Visual-Genome.git

Pretrained image model

Download the pretrained VGG16 or ResNet101 model, depending on which you need. Both are provided by faster-rcnn.pytorch.

Then put them into the path 'data/pretrained_model/'.

Compilation

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the CUDA dependencies with the following commands:

cd lib
python setup.py build develop

Pycocotools (Optional)

If you have not installed the COCO API yet, follow these steps.

cd data
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make

Data processing

Generate tsv

Run generate_tsv.py to extract features of image regions. The output is a tsv file with the columns ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features'].

python generate_tsv.py --net res101 --dataset vg  \
                       --out $out_file --cuda

Change the parameter $load_dir (the path to the model, default is 'models') to suit your environment.

PS. If you download other pretrained models, rename the model file to 'faster_rcnn_$net_$dataset.pth' and set the parameters $net and $dataset accordingly.
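
Once generate_tsv.py has produced its output, the file can be read back roughly as follows. This is a minimal sketch: the column names are those listed above, but the encoding of the 'boxes' and 'features' columns (base64-encoded little-endian float32, as in the bottom-up-attention feature files) and the helper names read_region_tsv and decode_floats are assumptions, so verify them against generate_tsv.py.

```python
import base64
import csv
import struct

FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

# Feature columns can be far larger than the csv module's default field limit.
csv.field_size_limit(2**31 - 1)


def decode_floats(field, rows):
    """Decode an assumed base64 float32 blob into `rows` equal-length lists."""
    raw = base64.b64decode(field)
    count = len(raw) // 4  # 4 bytes per float32
    values = struct.unpack("<%df" % count, raw[:count * 4])
    width = count // rows if rows else 0
    return [list(values[i * width:(i + 1) * width]) for i in range(rows)]


def read_region_tsv(path):
    """Yield one dict per image with boxes and features decoded."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            num_boxes = int(row["num_boxes"])
            yield {
                "image_id": row["image_id"],
                "image_w": int(row["image_w"]),
                "image_h": int(row["image_h"]),
                "num_boxes": num_boxes,
                # Assumed layout: 4 coordinates per box.
                "boxes": decode_floats(row["boxes"], num_boxes),
                # Assumed layout: one feature vector per box.
                "features": decode_floats(row["features"], num_boxes),
            }
```

For real feature files, numpy.frombuffer followed by a reshape would be the faster way to decode each column; the standard-library version here only keeps the sketch dependency-free.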

Convert data

Run convert_data.py to convert the above output to a numpy array. The output is an npy file containing the image region features.

python convert_data.py --imgid_list $imgid_list  \
                       --input_file $input_file --output_file $output_file

$imgid_list is a text ('txt') file containing a list of image ids.
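
The role of $imgid_list can be illustrated with a short sketch that reads the id file and orders per-image records to match it. The one-id-per-line layout and the helper names read_image_ids and order_by_image_ids are assumptions; convert_data.py may parse the file differently.

```python
def read_image_ids(path):
    """Read image ids from a text file, assuming one id per line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def order_by_image_ids(image_ids, records):
    """Return the records in the order given by image_ids.

    `records` is any iterable of dicts with an "image_id" key. Ids that
    have no matching record raise an error rather than being skipped.
    """
    by_id = {str(record["image_id"]): record for record in records}
    missing = [image_id for image_id in image_ids if image_id not in by_id]
    if missing:
        raise KeyError("no features found for image ids: " + ", ".join(missing))
    return [by_id[image_id] for image_id in image_ids]
```

The ordered records could then be stacked into a single array and written with numpy.save, which produces the npy format described above.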

Demo

To show object detections on demo images with a pretrained model, run:

python demo.py --net res101 --dataset vg \
               --load_dir $load_dir --cuda

You can also add images to the folder 'images' and change the parameter $image_file.

Below are some detection results:

[Image: sample detection results]

PS. If you download other pretrained models, rename the model file to 'faster_rcnn_$net_$dataset.pth' and set the parameters $net and $dataset accordingly.
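
The naming pattern can be sketched as a small helper that builds the expected file name from $net and $dataset. The function name checkpoint_path is hypothetical, and placing the file directly inside $load_dir is an assumption; the scripts may look in a subfolder instead.

```python
import os


def checkpoint_path(load_dir="models", net="res101", dataset="vg"):
    """Build the expected checkpoint path from the naming pattern above."""
    filename = "faster_rcnn_{}_{}.pth".format(net, dataset)
    # Assumption: the file sits directly inside load_dir.
    return os.path.join(load_dir, filename)


if __name__ == "__main__":
    path = checkpoint_path()
    status = "found" if os.path.isfile(path) else "missing"
    print(path, "->", status)
```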

Acknowledgments

Thanks to 'bottom-up-attention' and faster-rcnn.pytorch.
