Consistent Depth of Moving Objects in Video
Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".
This repository contains training code for the SIGGRAPH 2021 paper
"Consistent Depth of Moving Objects in Video".
This is not an officially supported Google product.
We provide both conda and pip installations for dependencies.
- To install with conda, run
conda create --name dynamic-video-depth --file ./dependencies/conda_packages.txt
- To install with pip, run
pip install -r ./dependencies/requirements.txt
We provide two preprocessed video tracks from the DAVIS dataset. To download the pre-trained single-image depth prediction checkpoints, as well as the example data, run:
This script will automatically download and unzip the checkpoints and data. If you would like to download manually
To train using the example data, run:
bash ./experiments/davis/train_sequence.sh 0 --track_id dog
The first argument indicates the GPU id for training, and
--track_id indicates the name of the track. ('dog' and 'train' are provided.)
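To train on both provided tracks in turn, the command above can be wrapped in a short Python helper. This is a minimal sketch rather than part of the repository: the function names `build_train_command` and `train_tracks` are hypothetical, and it assumes the script is launched from the repository root and accepts exactly the arguments shown above.

```python
import shlex
import subprocess

# Path and track names are taken from the training command above.
TRAIN_SCRIPT = "./experiments/davis/train_sequence.sh"
PROVIDED_TRACKS = ("dog", "train")


def build_train_command(gpu_id, track_id):
    """Return the argument list for one training run (hypothetical helper)."""
    return ["bash", TRAIN_SCRIPT, str(gpu_id), "--track_id", track_id]


def train_tracks(gpu_id=0, tracks=PROVIDED_TRACKS, dry_run=True):
    """Print each training command and, unless dry_run is set, execute it."""
    commands = [build_train_command(gpu_id, track) for track in tracks]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop if a training run exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `train_tracks(gpu_id=0, dry_run=False)` from the repository root would run the two tracks one after the other on GPU 0; the default `dry_run=True` only prints the commands.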
After training, the results should look like:
| Video | Our Depth | Single Image Depth |
To help with generating custom datasets for training, we provide examples of preparing the dataset from DAVIS and from two ShutterStock sequences that are showcased in our paper.
The general workflow for preprocessing the dataset is:
Calibrate the scale of camera translation, transform the camera matrices into camera-to-world convention, and save as individual files.
Calculate flow between pairs of frames, as well as occlusion estimates.
Pack flow and per-frame data into training batches.
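The camera step in the workflow above can be illustrated with a small NumPy sketch. It assumes the input is a 4x4 world-to-camera extrinsic matrix (a common output convention of structure-from-motion tools) and that the translation scale has already been estimated; the function name `to_camera_to_world` and the `translation_scale` parameter are hypothetical and are not taken from the repository's scripts.

```python
import numpy as np


def to_camera_to_world(world_to_camera, translation_scale=1.0):
    """Invert a rigid 4x4 world-to-camera matrix and rescale its translation.

    Assumption: the upper-left 3x3 block is a pure rotation, so its inverse
    is its transpose.
    """
    w2c = np.asarray(world_to_camera, dtype=np.float64)
    rotation = w2c[:3, :3]
    translation = w2c[:3, 3]

    c2w = np.eye(4)
    c2w[:3, :3] = rotation.T
    # The camera center in world coordinates is -R^T t; scaling the scene
    # by a factor s moves the camera center by the same factor.
    c2w[:3, 3] = translation_scale * (-rotation.T @ translation)
    return c2w
```

With `translation_scale=1.0` the result is the exact inverse of the input matrix; any other value only changes the camera position and leaves the orientation untouched.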
More specifically, example code is provided in
DAVIS data preparation
Download the DAVIS dataset here, and unzip it under
Run python ./scripts/preprocess/davis/generate_frame_midas.py. This requires trimesh to be installed (pip install trimesh should do the trick). This script projects the triangulated 3D points to calibrate camera translation scales.
Run python ./scripts/preprocess/davis/generate_flows.py to generate optical flows between pairs of images. This stage requires RAFT, which is included as a submodule in this repo.
Run python ./scripts/preprocess/davis/generate_sequence_midas.py to pack camera calibrations and images into training batches.
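The flow step in the workflow also produces occlusion estimates. One common way to obtain them is a forward-backward consistency check, sketched below with NumPy. This is an illustration of that general technique and not necessarily what generate_flows.py does: the function name `occlusion_mask`, the nearest-neighbor lookup, and the thresholds `alpha` and `beta` are all assumptions.

```python
import numpy as np


def occlusion_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Mark pixels whose forward and backward flows disagree.

    flow_fw: (H, W, 2) flow from frame A to frame B, as (dx, dy) in pixels.
    flow_bw: (H, W, 2) flow from frame B back to frame A.
    Returns a boolean (H, W) array, True where a pixel is likely occluded.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel of frame A lands in frame B (nearest-neighbor lookup).
    target_x = np.rint(xs + flow_fw[..., 0]).astype(int)
    target_y = np.rint(ys + flow_fw[..., 1]).astype(int)
    inside = (target_x >= 0) & (target_x < w) & (target_y >= 0) & (target_y < h)

    # Clip so indexing is valid; out-of-frame pixels are flagged separately.
    bw_at_target = flow_bw[np.clip(target_y, 0, h - 1), np.clip(target_x, 0, w - 1)]

    # For a visible pixel, forward flow plus backward flow should cancel out.
    error = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    magnitude = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_target ** 2, axis=-1)
    return ~inside | (error > alpha * magnitude + beta)
```

The threshold grows with flow magnitude so that fast-moving regions tolerate a proportionally larger round-trip error. Bilinear sampling of the backward flow would be more accurate than the nearest-neighbor lookup used here.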
For the ShutterStock sequences, cast the videos as images, put them under ./datafiles/shutterstock/images, and rename them to match the file names in ./datafiles/shutterstock/triangulation. Note that not all frames are triangulated; the timestamps of valid frames are recorded in the triangulation file names.
Run python ./scripts/preprocess/shutterstock/generate_frame_midas.py to pack per-frame data.
Run python ./scripts/preprocess/shutterstock/generate_flows.py to generate optical flows between pairs of images.
Run python ./scripts/preprocess/shutterstock/generate_sequence_midas.py to pack flows and per-frame data into training batches.
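Because the three preprocessing scripts must run in a fixed order, they can be chained from a small Python driver. The sketch below is a convenience wrapper and not part of the repository: the function names are hypothetical, and it assumes the scripts are run from the repository root without extra arguments, as in the steps above.

```python
import subprocess

# Script names and their order follow the preprocessing steps listed above.
PREPROCESS_STEPS = (
    "generate_frame_midas.py",
    "generate_flows.py",
    "generate_sequence_midas.py",
)
DATASETS = ("davis", "shutterstock")


def build_preprocess_commands(dataset):
    """Return the ordered list of commands for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        ["python", f"./scripts/preprocess/{dataset}/{step}"]
        for step in PREPROCESS_STEPS
    ]


def run_preprocessing(dataset, dry_run=True):
    """Print each command and, unless dry_run is set, run it in order."""
    commands = build_preprocess_commands(dataset)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the pipeline as soon as one stage fails,
            # so later stages never run on incomplete outputs.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_preprocessing("shutterstock", dry_run=False)` would run the per-frame, flow, and packing stages in sequence.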
An example training script is located at