Unsupervised Single Object Tracking (USOT)

:herb: Learning to Track Objects from Unlabeled Videos

Jilai Zheng, Chao Ma, Houwen Peng and Xiaokang Yang

2021 IEEE/CVF International Conference on Computer Vision (ICCV)


This repository implements the unsupervised deep tracker USOT,
which learns to track objects from unlabeled videos.

The main ideas of USOT are as follows.

  • Coarsely discovering moving objects from videos, with pseudo boxes precise enough for bbox regression.
  • Training a naive Siamese tracker from single-frame pairs, then gradually extending it to longer temporal spans.
  • Following a cycle memory training paradigm, which enables the unsupervised tracker to update online (see the sketch below).
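
To make the cycle memory idea concrete, here is a minimal, hypothetical sketch in PyTorch. The ToyTracker module, its inputs, and the single-box simplification are our own illustrative assumptions, not the actual training code; in the real model, one Siamese tracking step is a crop-and-correlate operation rather than a linear layer.

import torch
import torch.nn as nn

class ToyTracker(nn.Module):
    # Stand-in for one Siamese tracking step: predicts the box in the next
    # frame from the previous frame feature, the previous box, and the
    # next frame feature.
    def __init__(self, dim=16):
        super().__init__()
        self.head = nn.Linear(dim + 4 + dim, 4)

    def forward(self, feat_prev, box_prev, feat_next):
        return self.head(torch.cat([feat_prev, box_prev, feat_next], dim=-1))

def cycle_memory_loss(tracker, feats, pseudo_box):
    # Track forward through the memory frames, then track back to the
    # first frame; the cycle is supervised by the original pseudo box.
    box = pseudo_box
    for t in range(1, len(feats)):
        box = tracker(feats[t - 1], box, feats[t])
    box = tracker(feats[-1], box, feats[0])
    return ((box - pseudo_box) ** 2).mean()

tracker = ToyTracker()
feats = [torch.randn(16) for _ in range(4)]      # template + 3 memory frames
pseudo_box = torch.tensor([0.4, 0.4, 0.2, 0.2])  # coarse box from the first idea above
loss = cycle_memory_loss(tracker, feats, pseudo_box)
loss.backward()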


Results of USOT and USOT* on recent tracking benchmarks.

Model    VOT2016 (EAO)    VOT2018 (EAO)    VOT2020 (EAO)    LaSOT AUC (%)    TrackingNet AUC (%)    OTB100 AUC (%)
USOT     0.351            0.290            0.222            33.7             59.9                   58.9
USOT*    0.402            0.344            0.219            35.8             61.5                   57.4

Raw result files can be found in the result folder on Google Drive.



Environments

The environments we utilize are listed as follows.

  • Preprocessing: PyTorch 1.1.0 + CUDA-9.0 / 10.0 (following ARFlow)
  • Train / Test / Eval: PyTorch 1.7.1 + CUDA-10.0 / 10.2 / 11.1

If you run into problems with preprocessing, you can skip it entirely by downloading the off-the-shelf
preprocessed materials.


Preparations

Assume the project root path is $USOT_PATH.
You can build an environment for development with the provided script,
where $CONDA_PATH denotes your anaconda path.

bash ./preprocessing/install_model.sh $CONDA_PATH USOT
source activate USOT && export PYTHONPATH=$(pwd)

You can revise the CUDA toolkit version for PyTorch in install_model.sh (10.0 by default).
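
As a quick sanity check (our suggestion, not part of the provided scripts), you can confirm that the installed PyTorch and CUDA versions match the environments listed above.

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"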

Test and Eval

First, we provide both models utilized in our paper (USOT.pth and USOT_star.pth).
You can download them from the snapshot folder on Google Drive
and place them in $USOT_PATH/var/snapshot.
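
For instance, assuming the snapshots were downloaded to ~/Downloads (a hypothetical location), placing them could look like:

cd $USOT_PATH && mkdir -p var/snapshot
mv ~/Downloads/USOT.pth ~/Downloads/USOT_star.pth ./var/snapshot/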

Next, you can link your desired benchmark dataset (e.g. VOT2018) into $USOT_PATH/datasets_test as follows.
The ground-truth json files for some benchmarks (e.g. VOT2018.json)
can be downloaded from the test folder on Google Drive
and should also be placed in $USOT_PATH/datasets_test.

cd $USOT_PATH && mkdir datasets_test
ln -s $your_benchmark_path ./datasets_test/VOT2018
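
The downloaded ground-truth json can then be dropped alongside the link; assuming it landed in ~/Downloads (again a hypothetical location):

mv ~/Downloads/VOT2018.json $USOT_PATH/datasets_test/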

After that, you can test the tracker on these benchmarks (e.g. VOT2018) as follows.
The raw results will be placed in $USOT_PATH/var/result/VOT2018/USOT.

python -u ./scripts/test_usot.py --dataset VOT2018 --resume ./var/snapshot/USOT_star.pth
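
Testing the other provided snapshot works in exactly the same way, e.g.:

python -u ./scripts/test_usot.py --dataset VOT2018 --resume ./var/snapshot/USOT.pth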

The inference results can be evaluated with pysot-toolkit. Build the toolkit before evaluation as follows.

cd $USOT_PATH/lib/eval_toolkit/pysot/utils
python setup.py build_ext --inplace

Then the evaluation can be conducted as follows.

python ./lib/eval_toolkit/bin/eval.py --dataset_dir datasets_test \
        --dataset VOT2018 --tracker_result_dir var/result/VOT2018 --trackers USOT


Train

First, download the pretrained backbone from the pretrain folder
on Google Drive
into $USOT_PATH/pretrain.
Note that USOT* and USOT are trained from imagenet_pretrain.model and moco_v2_800.model, respectively.

Second, preprocess the raw datasets following the DP + Flow paradigm.
Refer to $USOT_PATH/preprocessing/datasets_train for details.

In fact, we have provided two shortcuts for skipping this preprocessing procedure.

  • You can directly download the generated pseudo box files (e.g. got10k_flow.json)
    from the train/box_sample_result folder on Google Drive,
    and place them in the corresponding dataset preprocessing path
    (e.g. $USOT_PATH/preprocessing/datasets_train/got10k) in order to skip the box generation
    procedure (see the layout sketch after this list).
  • You can directly download the whole cropped training dataset (e.g. got10k_flow.tar)
    from the dataset folder on Google Drive (coming soon) (e.g. train/GOT-10k),
    which enables you to skip all procedures in preprocessing.
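
For orientation, after taking the first shortcut the preprocessing folder should roughly look as follows. This is our reading of the paths above; the exact file set may differ.

$USOT_PATH/preprocessing/datasets_train/
    got10k/
        got10k_flow.json        # downloaded pseudo boxes, skips box generation
    ...                         # analogous folders for the other datasets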

Third, revise the training config file at $USOT_PATH/experiments/train/USOT.yaml.
The most important options are listed below, followed by a sketch of the relevant fields.

  • GPUS: the GPUs used for training, e.g. '0,1,2,3'
  • TRAIN/PRETRAIN: the pretrained backbone, e.g. 'imagenet_pretrain.model'
  • DATASET: the folder containing your cropped training instances and their pseudo annotation files,
    e.g. PATH: '/data/got10k_flow/crop511/', ANNOTATION: '/data/got10k_flow/train.json'
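
Put together, the relevant part of USOT.yaml might look like the sketch below. The exact nesting (in particular the dataset sub-key) is our assumption; consult the shipped config for the authoritative structure.

GPUS: '0,1,2,3'
TRAIN:
  PRETRAIN: 'imagenet_pretrain.model'
DATASET:
  GOT10K:                     # dataset sub-key is assumed, check USOT.yaml
    PATH: '/data/got10k_flow/crop511/'
    ANNOTATION: '/data/got10k_flow/train.json'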

Finally, you can start the training phase with the following script.
The training checkpoints will also be placed automatically in $USOT_PATH/var/snapshot.

python -u ./scripts/train_usot.py --cfg experiments/train/USOT.yaml --gpus 0,1,2,3 --workers 32

We also provide a one-key script for training, testing and evaluation.

python ./scripts/onekey_usot.py --cfg experiments/train/USOT.yaml


Citation

If any parts of our paper and codes are helpful to your work, please consider citing:

@inproceedings{zheng2021learning,
   title={Learning to Track Objects from Unlabeled Videos},
   author={Jilai Zheng and Chao Ma and Houwen Peng and Xiaokang Yang},
   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
   year={2021}
}


Acknowledgement

We refer to the following repositories when implementing our unsupervised tracker. Thanks for their great work.


Contact

Feel free to contact me if you have any questions.

