/ Machine Learning

Differentiable Mask-Matching Network for Video Object Segmentation

Differentiable Mask-Matching Network for Video Object Segmentation

DMM-Net

Differentiable Mask-Matching Network for Video Object Segmentation (ICCV 2019).
paper.pdf

Requirements:

  • PyTorch 1.1.0
  • matplotlib 3.0.2
  • not in requirements.txt:
    • cython
    • torchvision 0.2.2 or 0.3.0 if cuda10.0
    • pycocotools # (2.0-py3.7-linux-x86_64)
    • maskrnn-benchmark
    • pyyaml yacs
    • opencv-python scikit-image
    • easydict prettytable lmdb tabulate

Installation

Data

YouTube-VOS

  • Download the YouTube-VOS dataset from their website.
    Please note that our code is trained and tested only on YouTube-VOS dataset for 2018 version. There is a newer version released 2019 but it is not tested.

  • We recommend to symlink the path to the youtube dataset to datasets/ as follows

cd datasets 
ln -s path/to/youtubeVOS youtubeVOS 
  • The files structure should look like:
DMM/datasets 
       ├── youtubeVOS
       │       ├── train 
       │       │      ├── JPEGImages
       │       │      │        ├── ... 
       │       │      ├── Annotations 
       │       │      │        ├── ... 
       │       ├── valid 
       │       │      ├── JPEGImages
       │       │      │        ├── ... 
       │       │      ├── Annotations 
       │       │      │        ├── ... 
       │       ├── train_testdev_ot (optional)
       │       │      ├── JPEGImages
       │       │      │        ├── ... 
       │       │      ├── Annotations 
       │       │      │        ├── ... 

Prepare proposals

Option1: Download the extracted file

for evaluation

  • To eval DMMnet on youtubeVOS with the fine-tuned proposal net, use the propsoals generated by our fine-tuned Mask R-CNN model:
    mkdir -p experiments/proposals/ 
    cd experiments/proposals/  
    wget https://www.cs.toronto.edu/~xiaohui/proposals_ytb_train.tar.gz
    tar xzf proposals_ytb_train.tar.gz 
    

for training

  • To train the DMMnet on youtubeVOS train-train split, need to prepare 1. proposals for both train-train and train-val split extracted by coco pretrained X101 Mask R-CNN model

  • proposals can be downloaded:

     mkdir -p experiments/proposals/  
     cd experiments/proposals/ 
     wget http://www.cs.toronto.edu/~xiaohui/feature_coco81.tar.gz
     tar xzf feature_coco81.tar.gz 
    
  • preprocess the proposals for training DMM:

python src/tools/reduce_pth_size_by_videos.py  experiments/proposals/coco81/inference/youtubevos_train3k_meta/predictions.pth  train 50
python src/tools/reduce_pth_size_by_videos.py  experiments/proposals/coco81/inference/youtubevos_val200_meta/predictions.pth  trainval 50
python src/tools/reduce_pth_size_by_videos.py  experiments/proposals/coco81/inference/youtubevos_testdev_online_meta/predictions.pth  train_testdev_ot 90 
  • The files structure should look like:
DMM/experiments
       ├── propnet
       │     ├── join_ytb_bin
       │     │       ├── model_0172500.pth 
       │     ├── online_ytb
       │     │       ├── model_0225000.pth 
       ├── dmmnet 
       │     ├── ytb_255_50_matchloss_epo13
       │     │       ├── epo13_iter01640
       │     ├── ytb_255_50
       │     │       ├── epo08_iter01640
       │     ├── online_ytb
       │     │       ├── epo101 
       ├── proposals 
       │     ├── coco81 
       │     │     ├── inference
       │     │     │       ├── youtubevos_train3k_meta (optional)
       │     │     │       ├── youtubevos_val200_meta
       │     │     │       ├── youtubevos_testdev_online_meta (optional)
       │     ├── ytb_train 
       │     │     ├── inference
       │     │     │       ├── youtubevos_val200_meta
       │     ├── ytb_ot 
       │     │     ├── inference
       │     │     │       ├── youtubevos_testdev_meta

Option2: extract the proposals

  • The model trained on youtubeVOS dataset can be found in MODEL_ZOO.md
  • The scripts used to extract proposals from the trained model can be found in scripts/extract/

Training

  • Train DMMnet on youtubeVOS:
sh scripts/train/train_load_prop.sh

Online training

Train DMMnet on the first frame of validation set,

  • first download the preprocessed data used for online training from here, extract the data and put/link the extracted folder as /PATH/TO/datasets/youtubeVOS/train_testdev_ot

  • prepare proposal, check the Section: Prepare proposals - for training

  • get the DMMnet trained on train-train set for 1 epoch from here and put it under experiments/dmmnet/

  • start online training

sh scripts/train/train_online.sh # it takes ~0.17h for one epoch

Evaluation

  • Evaluate DMMnet on trainval split:
    • will need the trained model and the extracted train-val proposal:
    cd ./experiments/dmmnet/
    wget http://www.cs.toronto.edu/~xiaohui/dmm/models/dmmnet_ytb_255_50_matchloss_epo13.tar.gz 
    tar xzf dmmnet_ytb_255_50_matchloss_epo13.tar.gz 
    wget http://www.cs.toronto.edu/~xiaohui/dmm/models/dmmnet_ytb_255_50.tar.gz 
    tar xzf dmmnet_ytb_255_50.tar.gz
    cd ../../ 
    
    cd ./experiments/proposals/ 
    wget http://www.cs.toronto.edu/~xiaohui/dmm/proposals/proposals_ytb_train.tar.gz
    tar xzf proposals_ytb_train.tar.gz 
    cd ../../
    
    • run sh scripts/eval/eval_r50.sh
    • compute the J and F score by sh scripts/metric/full_eval.sh /PATH/TO/OUTPUT/merged/
      expected results:
Method J_mean J_recall J_decay F_mean F_recall F_decay
ytb_R50_w_match_loss_epo13 model: ytb_255_50_matchloss_epo13 0.611 0.702 0.104 0.747 0.824 0.111
ytb_R50_wo_match_loss_epo08 model: ytb_255_50 0.6 0.684 0.104 0.742 0.819 0.109
  • Evaluate online-trained DMMnet:
    • Download proposals extracted by online-trained proposal net:
    cd ./experiments/proposals/
    wget http://www.cs.toronto.edu/~xiaohui/dmm/proposals/proposals_ytb_ot.tar.gz 
    tar xzf proposals_ytb_ot.tar.gz 
    cd ../../
    
    • Download model:
    cd experiments/dmmnet/
    wget http://www.cs.toronto.edu/~xiaohui/dmm/models/dmmnet_online_ytb.tar.gz 
    tar xzf dmmnet_online_ytb.tar.gz 
    cd ../../
    
    • run scripts/eval/eval_testdev.sh
    • prepare the submission data with scripts/submit.sh and submit to the server, expected resules: G mean = 0.579

GitHub