A Large-Scale Dataset for Real-World Face Forgery Detection

DeeperForensics-1.0

This repository will provide the code, models, and data for the following paper:

DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection
Liming Jiang, Wayne Wu, Ren Li, Chen Qian and Chen Change Loy
arXiv:2001.03024, 2020.

Abstract: In this paper, we present our ongoing effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. Our benchmark represents the largest face forgery detection dataset to date, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face swapping framework. The quality of the generated videos outperforms that of videos in existing datasets, as validated by user studies. The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations. We further contribute a comprehensive study that evaluates five representative detection baselines and makes a thorough analysis of different settings. We believe this dataset will contribute to real-world face forgery detection research.

Overview

Data Collection

We invited 100 paid actors from 26 countries to record the source videos. The collected high-quality data vary in identities, poses, expressions, emotions, lighting conditions, and 3DMM blendshapes.

Face Manipulation

We also propose a new learning-based many-to-many face swapping method,
DeepFake Variational Auto-Encoder (DF-VAE). DF-VAE improves scalability,
style matching, and temporal continuity to ensure face swapping quality.

[Figure: several face manipulation results]

[Figure: many-to-many (three-to-three) face swapping by a single model]

Real-World Perturbation

We apply 7 types of distortions (transmission errors, compression, etc.)
at 5 intensity levels. Some videos are subjected to a mixture of more than
one distortion. These perturbations help DeeperForensics-1.0 better simulate
real-world scenarios.
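
The perturbation scheme described above can be sketched as a simple sampling plan. This is a minimal illustration, not the authors' pipeline: only two of the seven distortion types are named in this README, so the remaining names (`type_3` to `type_7`), both helper functions, and the choice of three distortions per mixture are hypothetical placeholders.

```python
import itertools
import random

# Only two of the seven distortion types are named in the README; the
# remaining names are placeholders, not the dataset's actual labels.
DISTORTION_TYPES = [
    "transmission_errors",
    "compression",
    "type_3", "type_4", "type_5", "type_6", "type_7",
]
INTENSITY_LEVELS = [1, 2, 3, 4, 5]


def single_distortion_settings():
    """Enumerate every (type, level) pair: 7 types x 5 levels."""
    return list(itertools.product(DISTORTION_TYPES, INTENSITY_LEVELS))


def sample_mixture(rng, num_types):
    """Pick `num_types` distinct distortion types, each with a random level."""
    chosen = rng.sample(DISTORTION_TYPES, num_types)
    return [(name, rng.choice(INTENSITY_LEVELS)) for name in chosen]


settings = single_distortion_settings()
# Hypothetical mixture size; the README only says "more than one distortion".
mixture = sample_mixture(random.Random(0), num_types=3)

print(len(settings))
print(mixture)
```

Seeding the random generator keeps the sampled mixture reproducible, which matters when the same perturbation plan has to be re-applied to a fixed set of videos.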

Benchmark

We benchmark five representative open-source forgery detection methods
using our DeeperForensics-1.0 dataset. Please refer to our paper
for more information.

Installation

The code for our face manipulation method, DF-VAE, will be open-sourced.
Please stay tuned.

Downloads

The DeeperForensics-1.0 dataset and models will be made publicly available
for non-commercial research purposes. Please stay tuned.

Citation

If you find this work useful for your research, please cite our paper:

@article{jiang2020deeperforensics10,
  title={DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection},
  author={Jiang, Liming and Wu, Wayne and Li, Ren and Qian, Chen and Loy, Chen Change},
  journal={arXiv preprint arXiv:2001.03024},
  year={2020}
}

Acknowledgments

We gratefully acknowledge the exceptional help from Hao Zhu and Keqiang Sun for source data collection and coordination.