A PyTorch implementation of EGVSR: Efficient & Generic Video Super-Resolution

EGVSR-PyTorch

This is a PyTorch implementation of EGVSR: Efficient & Generic Video Super-Resolution (VSR), which uses sub-pixel convolution to speed up inference of the TecoGAN VSR model. Please refer to the official implementations of ESPCN and TecoGAN for more information.
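As a rough illustration of the sub-pixel convolution idea mentioned above, the sketch below shows only the rearrangement (pixel shuffle) step in plain Python. A convolution first produces C*r*r low-resolution feature maps, which are then interleaved into C maps that are r times larger in each spatial dimension. The function name `pixel_shuffle` and the toy input are illustrative assumptions, not code from this repo; in PyTorch the same rearrangement is available as `torch.nn.PixelShuffle`.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r output cell
            for j in range(r):      # horizontal offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Toy input: four 1x2 feature maps become one 2x4 map for r = 2.
lr_features = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
sr = pixel_shuffle(lr_features, 2)
print(sr)  # [[[0, 10, 1, 11], [20, 30, 21, 31]]]
```

Because the convolutions run at low resolution and only this cheap rearrangement happens at the output size, the upsampling stage costs less than convolving at full resolution.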

Features

Unified Framework: This repo provides a unified framework for various state-of-the-art DL-based VSR methods, such as VESPCN, SOFVSR, FRVSR, TecoGAN and our EGVSR.
Multiple Test Datasets: This repo offers three video datasets for testing: the standard Vid4 test dataset, the Tos3 dataset used in TecoGAN, and our new Gvt72 dataset (selected from the Vimeo site and covering more scenes).
Better Performance: This repo provides a model with faster inference speed and better overall performance than prior methods. See the Benchmarks section for details.

Dependencies

Ubuntu >= 16.04
NVIDIA GPU + CUDA & CUDNN
Python 3
PyTorch >= 1.0.0
Python packages: numpy, matplotlib, opencv-python, pyyaml, lmdb (requirements.txt & req.txt)
(Optional) Matlab >= R2016b
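A quick way to confirm that the Python packages listed above are installed is sketched below. The helper name `missing_packages` is hypothetical and not part of this repo; the only facts relied on are that opencv-python is imported as `cv2`, pyyaml as `yaml`, and PyTorch as `torch`.

```python
import importlib.util

# Map pip package names to the module names used in import statements.
REQUIRED = {
    "torch": "torch",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "opencv-python": "cv2",
    "pyyaml": "yaml",
    "lmdb": "lmdb",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {missing}")
```

This only checks that the modules can be located; it does not verify versions or that CUDA is usable.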

Datasets

A. Training Dataset

Download the official training dataset by following the instructions in TecoGAN-TensorFlow, rename it to VimeoTecoGAN, and place it under ./data.

B. Testing Datasets

Vid4 -- Four video sequences: city, calendar, foliage and walk;
Tos3 -- Three video sequences: bridge, face and room;
Gvt72 -- Generic VSR Test Dataset: 72 video sequences (including natural scenery, cultural scenery, streetscapes, life records, sports photography, etc., as shown below).

You can get them at :arrow_double_down: Baidu Netdisk (extraction code: 8tqc) and put them into :file_folder: Datasets.
The following shows the structure of the above three datasets.

data
  ├─ Vid4
    ├─ GT                # Ground-truth (GT) video sequences
      └─ calendar
        ├─ 0001.png
        └─ ...
    └─ Gaussian4xLR      # Low-resolution (LR) video sequences with Gaussian degradation and x4 down-sampling
      └─ calendar
        ├─ 0001.png
        └─ ...
  ├─ ToS3
    ├─ GT
    └─ Gaussian4xLR
  └─ Gvt72
    ├─ GT
    └─ Gaussian4xLR
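To sanity-check a downloaded dataset against the layout above, a minimal sketch like the following may help. The function name `list_sequences` is a hypothetical helper, not part of this repo, and it assumes every frame is a .png file inside a per-sequence folder under both GT and Gaussian4xLR.

```python
from pathlib import Path


def list_sequences(data_root, dataset, lr_dir="Gaussian4xLR"):
    """Return {sequence_name: frame_count} for one test dataset.

    Raises if the GT/LR folders are missing or their frame counts differ.
    """
    root = Path(data_root) / dataset
    gt_root, lr_root = root / "GT", root / lr_dir
    if not gt_root.is_dir() or not lr_root.is_dir():
        raise FileNotFoundError(f"expected {gt_root} and {lr_root} to exist")

    result = {}
    for seq in sorted(p for p in gt_root.iterdir() if p.is_dir()):
        lr_seq = lr_root / seq.name
        if not lr_seq.is_dir():
            raise FileNotFoundError(f"missing LR sequence: {lr_seq}")
        gt_count = len(list(seq.glob("*.png")))
        lr_count = len(list(lr_seq.glob("*.png")))
        if gt_count != lr_count:
            raise ValueError(
                f"{seq.name}: {gt_count} GT frames but {lr_count} LR frames"
            )
        result[seq.name] = gt_count
    return result
```

For example, `list_sequences("./data", "Vid4")` should list the four Vid4 sequences with their frame counts once the dataset is in place.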

Benchmarks

Experimental Environment

Component	Version	Info.
System	Ubuntu 18.04.5 LTS	X86_64
CPU	Intel i9-9900	3.10GHz
GPU	Nvidia RTX 2080Ti	11GB GDDR6
Memory	DDR4 2666	32GB×2

A. Test on Vid4 Dataset

1.LR 2.VESPCN 3.SOFVSR 4.DUF 5.Ours:EGVSR 6.GT
Objective metrics for visual quality evaluation[1]

B. Test on Tos3 Dataset

1.VESPCN 2.SOFVSR 3.FRVSR 4.TecoGAN 5.Ours:EGVSR 6.GT

C. Test on Gvt72 Dataset

1.LR 2.VESPCN 3.SOFVSR 4.DUF 5.Ours:EGVSR 6.GT
Objective metrics for visual quality and temporal coherence evaluation[1]

D. Optical-Flow based Motion Compensation

Please refer to FLOW_walk, FLOW_foliage and FLOW_city.

E. Comprehensive Performance

Comparison of various SOTA VSR models on video quality score and speed performance[3]

^[1] :arrow_down:: a smaller value means better performance; :arrow_up:: a larger value means better performance. Red marks the Top1 result and blue the Top2 result.
^[2] The video quality score is calculated with a formula that considers both the spatial and temporal domains, using lambda1=lambda2=lambda3=1/3.
^[3] FLOPs and speed are computed for RGB input upscaled from 960x540 to 3840x2160 (4K) on an NVIDIA GeForce RTX 2080Ti GPU.
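The resolutions in note [3] follow directly from the x4 scale factor. The small sketch below only checks that arithmetic; `upscaled_size` is an illustrative helper, not a function from this repo.

```python
def upscaled_size(width, height, scale=4):
    """Return the output (width, height) for a given integer scale factor."""
    return width * scale, height * scale


lr_w, lr_h = 960, 540
hr_w, hr_h = upscaled_size(lr_w, lr_h)
print(hr_w, hr_h)                        # 3840 2160
print(hr_w * hr_h)                       # 8294400 pixels per output frame
print((hr_w * hr_h) // (lr_w * lr_h))    # 16 output pixels per input pixel
```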