The dataset contains 3.31 million images of 9131 subjects (identities), with an average of 362.6 images per subject. Images were downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The whole dataset is split into a training set (8631 identities) and a test set (500 identities).
VGGFace2: A dataset for recognising faces across pose and age, Q. Cao, L. Shen, W. Xie, O. M. Parkhi, A. Zisserman, In FG 2018.
|Date|Update|
|---|---|
|2018-09-18|Added models with lower-dimensional embedding layers for feature representation.|
|2018-04-10|Added new models trained on VGGFace2 (see below). Training details can be found in the paper.|
We report 1:1 verification performance (center 224x224 crop from the resized image with shorter side = 256) on IJB-B for reference (ROC; higher is better). More evaluation results can be found in the paper. Models in the pretrain setting are trained on the MS-Celeb-1M dataset and then fine-tuned on the VGGFace2 dataset. ResNet-50 models follow the architectural configuration of He et al. (CVPR 2016) and SE-ResNet-50 models follow that of Hu et al. (CVPR 2018). "<model-#D>" means that a lower-dimensional embedding layer is stacked on top of the original final feature layer adjacent to the classifier.
|Architecture|Feat dim|Pretrain|TAR@FAR=0.001|TAR@FAR=0.01|Model Link|
|---|---|---|---|---|---|
|ResNet-50|2048|N|0.878|0.938|Caffe, MatConvNet, [TF], PyTorch|
|ResNet-50|2048|Y|0.891|0.947|Caffe, MatConvNet, [TF], PyTorch|
|SE-ResNet-50|2048|N|0.888|0.949|Caffe, MatConvNet, [TF], [PyTorch]|
|SE-ResNet-50|2048|Y|0.908|0.956|Caffe, MatConvNet, [TF], [PyTorch]|
|ResNet-50-256D|256|Y|0.898|0.956|Caffe, MatConvNet, [TF], PyTorch|
|ResNet-50-128D|128|Y|0.904|0.956|Caffe, MatConvNet, [TF], PyTorch|
|SE-ResNet-50-256D|256|Y|0.912|0.965|Caffe, MatConvNet, [TF], [PyTorch]|
|SE-ResNet-50-128D|128|Y|0.910|0.959|Caffe, MatConvNet, [TF], [PyTorch]|
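The evaluation crop (resize so the shorter side is 256, then take a center 224x224 crop) can be sketched with NumPy alone; this is an illustration only, assuming an HxWx3 uint8 image and a simple nearest-neighbour resize rather than the interpolation used in practice:

```python
import numpy as np

def resize_shorter_side(img, target=256):
    # Nearest-neighbour resize so that min(H, W) == target.
    h, w = img.shape[:2]
    scale = target / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def center_crop(img, size=224):
    # Take the central size x size window.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((300, 500, 3), dtype=np.uint8)   # dummy image
crop = center_crop(resize_shorter_side(img))
print(crop.shape)  # (224, 224, 3)
```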
Caffe: SE models use the "Axpy" layer, which combines two consecutive operations: channel-wise scaling and element-wise summation (more information can be found here). Please note that the input mean vector is in BGR order, as OpenCV is used for loading images.
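The Axpy operation can be sketched in NumPy as below; the (C, H, W) layout and the interpretation of the per-channel scale as the SE block's excitation output are assumptions for illustration:

```python
import numpy as np

def axpy(scale, x, y):
    """Caffe-style Axpy: channel-wise scale of x, then element-wise sum with y.

    scale: (C,) per-channel coefficients (e.g. SE excitation output)
    x, y:  (C, H, W) feature maps (residual branch and identity branch)
    """
    return scale[:, None, None] * x + y

s = np.array([0.5, 2.0])        # per-channel scales
x = np.ones((2, 3, 3))          # residual-branch features
y = np.full((2, 3, 3), 10.0)    # identity-branch features
out = axpy(s, x, y)
print(out[0, 0, 0], out[1, 0, 0])  # 10.5 12.0
```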
MatConvNet: This code uses the following three modules:
- autonn - a wrapper for MatConvNet
- mcnExtraLayers - some useful additional layers
- mcnSENets - supporting layers for SE models
All of these can be set up directly with `vl_contrib` (i.e. run `vl_contrib install <module-name>`, then `vl_contrib setup <module-name>`).
TensorFlow: coming soon.
PyTorch: Imported from the Caffe models using the MMdnn tool.
We use MTCNN for face detection. The detected bounding box is then extended by a factor of 0.3 (except where the extension would fall outside the image) to include the whole head, and this extended crop is used as the network input. Please note that the released faces are based on a larger extension ratio of 1.0. The bounding-box coordinates and 5 facial keypoints, referring to the loosely cropped faces, can be found here.
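The box-extension step can be sketched as follows. Whether the 0.3 factor is applied per side or split across both sides is not spelled out above, so this sketch assumes half of the extension on each side, clipped to the image bounds:

```python
def extend_bbox(x, y, w, h, img_w, img_h, ratio=0.3):
    """Enlarge a face box by `ratio` of its width/height (assumed half per
    side), clipping any part that would fall outside the image."""
    dx, dy = w * ratio / 2.0, h * ratio / 2.0
    x0, y0 = max(0.0, x - dx), max(0.0, y - dy)
    x1 = min(float(img_w), x + w + dx)
    y1 = min(float(img_h), y + h + dy)
    return x0, y0, x1 - x0, y1 - y0

# A box near the image corner gets clipped on the top-left side.
print(extend_bbox(10, 10, 100, 100, 500, 500))  # (0.0, 0.0, 125.0, 125.0)
```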
 C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J. Adams, T. Miller, N. Kalka, A. K. Jain, J. A. Duncan, K. Allen, J. Cheney and P. Grother. IARPA Janus Benchmark-B Face Dataset. In CVPR Workshop on Biometrics 2017.
 Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. MS-Celeb-1M: A Dataset and Benchmark for Large Scale Face Recognition. In ECCV 2016.
 K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR 2016.
 J. Hu, L. Shen and G. Sun. Squeeze-and-Excitation Networks. In CVPR 2018.
 C. Chen, J. Yao, R. Zhang, Y. Zhou, T. Qin, T. Zhan and Q. Wang. MMdnn: http://github.com/Microsoft/MMdnn
This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract number 2014-14071600010. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
We would like to thank Samuel Albanie for his help with model conversion.