This is an unofficial TensorFlow re-implementation of PyramidBox: A Context-assisted Single Shot Face Detector, which reports state-of-the-art performance on two common face detection benchmarks, FDDB and WIDER FACE.
There is still a performance gap between this implementation and the paper, possibly caused by:
- Missing data-anchor-sampling.
- Differences in data augmentation from the original.
- A batch size of 1 instead of the paper's 16, due to memory limitations.
- Hyperparameters not specified in the paper.
- Differences between deep learning frameworks.
Results on the WIDER FACE validation set:
These results come from a quick, largely untuned training run. Better results are likely achievable by tuning hyperparameters such as the batch size, the learning rate, and the loss-function parameters.
| Method | AP Easy | AP Medium | AP Hard |
|---|---|---|---|
Tested only on Ubuntu 16.04 with:
- tensorflow-gpu 1.4
Clone the repo
```shell
git clone https://github.com/EricZgw/PyramidBox.git
cd PyramidBox
python makedir.py
```
Run the following script for visualization:
Train on WIDER FACE Datasets
- Download the pre-trained VGG16 model from here and put it in `/checkpoints`.
- Download the WIDER FACE dataset and convert it to VOC format. The directory layout should look like this:
  ```
  datasets/
    |->widerface/
    |    |->WIDER_train/
    |    |->WIDER_val/
    |    |->WIDER_test/
    |    |->Annotations/
    |    |->JPEGImages/
    |    |...
  ```
- Run the following script to generate TFRecords:
  ```shell
  python datasets/pascalvoc_to_tfrecords.py
  ```
  Optionally, run `check_data_io.py` to inspect the data; this step is not required.
- The training strategy has two stages. First, run `train_model.py` with the setting below to train only the additional PyramidBox layers:
  ```python
  self.fine_tune_vgg16 = False
  ```
- Then set `self.fine_tune_vgg16 = True` and run `train_model.py` again to train the whole network.
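The `fine_tune_vgg16` switch presumably decides which variables the optimizer updates in each stage. The sketch below illustrates that idea under stated assumptions: the backbone variables are assumed to live under a TF-Slim-style `vgg_16/` scope, and `select_trainable` plus the example variable names are hypothetical. The actual logic in `train_model.py` may differ.

```python
def select_trainable(var_names, fine_tune_vgg16, backbone_scope="vgg_16/"):
    """Return the names of the variables the optimizer should update.

    Hypothetical helper: assumes the pre-trained backbone lives under
    `backbone_scope` and everything else belongs to the added PyramidBox layers.
    """
    if fine_tune_vgg16:
        # Stage 2: update every variable, backbone included.
        return list(var_names)
    # Stage 1: keep the pre-trained VGG16 weights frozen.
    return [name for name in var_names if not name.startswith(backbone_scope)]


# Hypothetical variable names, for illustration only.
names = ["vgg_16/conv1/conv1_1/weights", "pyramidbox/lfpn/conv/weights"]
print(select_trainable(names, fine_tune_vgg16=False))  # ['pyramidbox/lfpn/conv/weights']
print(select_trainable(names, fine_tune_vgg16=True))
```

In TensorFlow 1.x, the selected names could be matched against `v.op.name` for each `v` in `tf.trainable_variables()`, and the resulting list passed as `var_list` to `optimizer.minimize(loss, var_list=...)`.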
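The data-preparation steps above assume WIDER FACE has already been converted to VOC format, but the README does not spell out how. The sketch below shows one way to do it. It parses the official `wider_face_*_bbx_gt.txt` layout (image path, face count, then one `x y w h blur expression illumination invalid occlusion pose` row per face) and builds a Pascal VOC-style XML tree. The helper names, the `face` class label, and the choice of no 1-pixel corner offset are assumptions, not the repo's converter.

```python
import xml.etree.ElementTree as ET


def parse_wider_gt(lines):
    """Yield (image_path, [(x, y, w, h), ...]) from a WIDER FACE *_bbx_gt.txt file."""
    rows = (line.strip() for line in lines if line.strip())
    for image_path in rows:
        count = int(next(rows))
        boxes = []
        # Images with zero faces still carry one all-zero placeholder row.
        for _ in range(max(count, 1)):
            x, y, w, h = (int(v) for v in next(rows).split()[:4])
            if w > 0 and h > 0:  # drop placeholder and degenerate boxes
                boxes.append((x, y, w, h))
        yield image_path, boxes


def to_voc_xml(image_path, boxes, width, height, depth=3, class_name="face"):
    """Build a Pascal VOC-style <annotation> element for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_path.split("/")[-1]
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for x, y, w, h in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        # WIDER stores (x, y, w, h); VOC stores corners. No 1-pixel offset is applied.
        corners = (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h))
        for tag, value in corners:
            ET.SubElement(bndbox, tag).text = str(value)
    return root
```

The image width and height are parameters because the ground-truth file does not store them; they have to be read from each image. Each tree can then be saved with `ET.ElementTree(root).write(path)` into `Annotations/`.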
Run the following commands to evaluate the model and compute AP:
```shell
python widerface_eval.py
cd eval/eval_tools
octave wider_eval.m
```
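The evaluation reports one AP per difficulty subset (Easy, Medium, Hard), each obtained by integrating a precision-recall curve. For reference, here is a minimal sketch of the all-point interpolated AP used by the Pascal VOC devkit's `VOCap` routine. That `wider_eval.m` integrates its curve the same way is an assumption, and `average_precision` is a hypothetical name.

```python
def average_precision(recall, precision):
    """All-point interpolated AP from a PR curve sorted by increasing recall."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at every point where recall changes.
    return sum(
        (mrec[i] - mrec[i - 1]) * mpre[i]
        for i in range(1, len(mrec))
        if mrec[i] != mrec[i - 1]
    )


print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```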
TODO:
- Add data-anchor-sampling.
- Try more principled and rigorous data augmentation.
- Port the model to other backbone networks.
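Data-anchor-sampling is the first suspected cause of the performance gap and is not implemented here. The sketch below follows the PyramidBox paper's description of how the resize scale is chosen. It picks the anchor size closest to a randomly selected face, draws a target anchor index from {0, ..., min(5, i_anchor + 1)}, samples a target size in [s_target / 2, s_target * 2], and resizes by the ratio. It assumes anchor sizes 2^(4+i) for i = 0..5 and uses sqrt(w * h) as the face size (an assumption). The function name is hypothetical, and the paper's follow-up step of cropping a sub-image around the chosen face is omitted.

```python
import math
import random

ANCHOR_SIZES = [16 * 2 ** i for i in range(6)]  # 16, 32, ..., 512


def data_anchor_sampling_scale(face_w, face_h, rng=random):
    """Return the image resize factor for one randomly chosen face (hypothetical helper)."""
    s_face = math.sqrt(face_w * face_h)  # assumed definition of face size
    # Index of the anchor size closest to the chosen face.
    i_anchor = min(range(len(ANCHOR_SIZES)),
                   key=lambda i: abs(ANCHOR_SIZES[i] - s_face))
    # Target anchor index drawn uniformly from {0, ..., min(5, i_anchor + 1)}.
    i_target = rng.randint(0, min(5, i_anchor + 1))
    # Target face size drawn uniformly around the chosen anchor size.
    s_target = rng.uniform(ANCHOR_SIZES[i_target] / 2, ANCHOR_SIZES[i_target] * 2)
    return s_target / s_face
```

For a 40x40 face the nearest anchor is 32, so the resized face lands somewhere between 8 and 128 pixels. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.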