Self-Supervised Multi-Frame Monocular Scene Flow

3D visualization of estimated depth and scene flow (overlaid with the input image) from temporally consecutive images.

Trained on KITTI in a self-supervised manner, and tested on DAVIS.

This repository is the official PyTorch implementation of the paper:

Self-Supervised Multi-Frame Monocular Scene Flow
Junhwa Hur and Stefan Roth
CVPR, 2021

Contact: junhwa.hur[at]


The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1, and CUDA 10.1 (other PyTorch and CUDA versions are also compatible).
Please run the provided conda environment setup file:

conda env create -f environment.yml
conda activate multi-mono-sf

(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):


After installing it, set the flag --correlation_cuda_enabled=True in the training/evaluation script files.


Please download the following two datasets for the experiments:

  • KITTI Raw
  • KITTI Scene Flow 2015

To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:

find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'

We also converted the images in KITTI Scene Flow 2015. Please convert the png images in image_2 and image_3 into jpg and save them into the separate folders image_2_jpg and image_3_jpg.
To save further space, you can delete the velodyne point data in KITTI Raw, as it is not needed.
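The folder convention above can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the helper names jpg_target and plan_conversion are hypothetical, and the code only plans which png file maps to which jpg path. The conversion itself is left to a tool such as ImageMagick's convert, as in the command above.

```python
from pathlib import Path


def jpg_target(png_path):
    """Map .../image_2/x.png to .../image_2_jpg/x.jpg (hypothetical helper)."""
    png_path = Path(png_path)
    # Sibling folder with the "_jpg" suffix, e.g. image_2 -> image_2_jpg.
    out_dir = png_path.parent.with_name(png_path.parent.name + "_jpg")
    return out_dir / (png_path.stem + ".jpg")


def plan_conversion(root, folders=("image_2", "image_3")):
    """Return (source, target) pairs for every png inside the given folders.

    Only files whose direct parent folder is listed in `folders` are included,
    so other png files under `root` are left untouched.
    """
    pairs = []
    for png in sorted(Path(root).rglob("*.png")):
        if png.parent.name in folders:
            pairs.append((png, jpg_target(png)))
    return pairs
```

Calling plan_conversion on the KITTI Scene Flow 2015 root returns the list of source and target paths, which can be reviewed before any file is written or deleted.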

Training and Inference

The scripts folder contains training/inference scripts.

For self-supervised training, you can simply run the following script files:

Script    Training           Dataset
./        Self-supervised    KITTI Split

Fine-tuning is done in two stages: (i) first finding the stopping point using the train/valid split, and then (ii) fine-tuning on all data for the number of iteration steps found in stage (i).
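Stage (i) can be pictured with the small sketch below. It assumes that the stopping point is the iteration with the lowest validation error. That criterion is an assumption for illustration, as is the function name find_stopping_iteration; the repository's scripts may select it differently.

```python
def find_stopping_iteration(val_errors):
    """Return the iteration with the lowest validation error.

    val_errors: dict mapping iteration step -> validation error
    (hypothetical format, e.g. collected from the log files of stage (i)).
    Ties are resolved in favor of the earliest iteration.
    """
    if not val_errors:
        raise ValueError("no validation results given")
    return min(val_errors, key=lambda step: (val_errors[step], step))
```

The returned step count would then be used as the training length for stage (ii), where the train and valid splits are merged.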

Script    Training                      Dataset
./        Semi-supervised finetuning    KITTI raw + KITTI 2015
./        Semi-supervised finetuning    KITTI raw + KITTI 2015

In the script files, please configure the following paths for your experiments:

  • DATA_HOME : the directory where the training or test data is located on your local system.
  • EXPERIMENTS_HOME : your own experiment directory where checkpoints and log files will be saved.
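Before launching a long run, it can help to sanity-check the two paths. The following is a minimal sketch under stated assumptions: check_paths is a hypothetical helper that is not part of the repository, and it only verifies that the data directory exists and is non-empty and that the experiment directory can be created and written to.

```python
import os
from pathlib import Path


def check_paths(data_home, experiments_home):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    data_dir = Path(data_home)
    if not data_dir.is_dir():
        problems.append(f"DATA_HOME does not exist: {data_dir}")
    elif not any(data_dir.iterdir()):
        problems.append(f"DATA_HOME is empty: {data_dir}")

    exp_dir = Path(experiments_home)
    # Checkpoints and logs are written here, so create it if needed.
    exp_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(exp_dir, os.W_OK):
        problems.append(f"EXPERIMENTS_HOME is not writable: {exp_dir}")

    return problems
```

An empty return value means both paths passed these basic checks; it does not confirm that the dataset layout inside DATA_HOME is what the data loaders expect.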

To test pretrained models, you can simply run the following script files:

Script    Training           Dataset
./        self-supervised    KITTI 2015 Train
./        fine-tuned         KITTI 2015 Test
./        self-supervised    DAVIS (one scene)
./        self-supervised    DAVIS (all scenes)
  • To save visualizations of the outputs, please turn on --save_vis=True in the script.
  • To save output images for KITTI Scene Flow 2015 Benchmark submission, please turn on --save_out=True in the script.
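Flags written as --name=True need explicit string-to-boolean parsing, because argparse with type=bool treats any non-empty string (including "False") as True. The sketch below shows one common way to handle this. It is an illustration only: str2bool and build_parser are hypothetical names, and the repository's own argument parser may work differently.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into a bool for argparse."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser with the two output flags named above (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument("--save_vis", type=str2bool, default=False)
    parser.add_argument("--save_out", type=str2bool, default=False)
    return parser
```

With this parser, build_parser().parse_args(["--save_vis=True"]) yields save_vis set to True and leaves save_out at its default. The default value of False is an assumption of this sketch.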

Pretrained Models

The checkpoints folder contains the checkpoints of the pretrained models.


Please cite our paper if you use our source code.

  Author = {Junhwa Hur and Stefan Roth},  
  Booktitle = {CVPR},  
  Title = {Self-Supervised Multi-Frame Monocular Scene Flow},  
  Year = {2021}  
  • Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast