PyTorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Differences between the paper and this implementation
- Uses dice loss instead of BCE (binary cross-entropy) loss.
- Uses normal convolution rather than deformable convolution in the backbone network.
- The architecture of the backbone network is a simple FPN.
- OHEM is not implemented.
- The ground truth of the threshold map is a constant 1 rather than 'the distance to the closest segment'.
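To make the first difference concrete, here is a minimal sketch of a dice loss on a predicted probability map. It is not taken from this repository: the function name `dice_loss`, the optional `mask` argument, and the `eps` value are assumptions chosen for illustration.

```python
import torch


def dice_loss(pred, target, mask=None, eps=1e-6):
    """Dice loss between a probability map and a binary ground-truth map.

    pred, target: float tensors of the same shape with values in [0, 1].
    mask: optional tensor of the same shape; 0 marks pixels to ignore.
    (The name, the mask argument, and eps are illustrative assumptions.)
    """
    if mask is None:
        mask = torch.ones_like(pred)
    # Overlap between prediction and ground truth inside the mask.
    intersection = (pred * target * mask).sum()
    # Sum of both areas; eps avoids division by zero on empty maps.
    total = (pred * mask).sum() + (target * mask).sum() + eps
    return 1.0 - 2.0 * intersection / total


if __name__ == "__main__":
    gt = torch.zeros(1, 1, 4, 4)
    gt[..., 1:3, 1:3] = 1.0
    print(dice_loss(gt.clone(), gt).item())   # close to 0 for a perfect match
    print(dice_loss(1.0 - gt, gt).item())     # 1 when there is no overlap
```

Because the loss is a ratio of overlap to total area, it is less sensitive than plain BCE to the imbalance between text and background pixels, which is one common reason to pick it when OHEM is not available.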
Thanks to these projects:
The features are summarized below:
- Use resnet18/resnet50/shufflenetV2 as backbone.
- PyTorch 1.1.0
- ShuffleNet_V2 Models trained on ICDAR 2013+2015 (training set)
Modify genText.py to generate the txt list files for the training/testing data.
[ ] MobileNet backbone
[ ] Deformable convolution
[ ] tensorboard support
[ ] FPN --> Architecture in the paper
[ ] Dice Loss --> BCE Loss
[ ] threshold map gt uses 1 --> threshold map gt uses distance (using 1 speeds up label generation)
[ ] OHEM
[ ] OpenCV_DNN inference API for CPU machine
[ ] Caffe version (for deploying with MNN/NCNN)
[ ] ICDAR13 / ICDAR15 / CTW1500 / MLT2017 / Total-Text
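For the "Dice Loss --> BCE Loss" and "OHEM" items above, the following is a rough sketch of what a BCE loss with online hard example mining could look like. It is not part of this repository: the function name `balanced_bce_loss` is hypothetical, and the 3:1 negative-to-positive ratio is an assumed default that follows the setting reported in the paper.

```python
import torch
import torch.nn.functional as F


def balanced_bce_loss(pred, target, negative_ratio=3.0, eps=1e-6):
    """BCE loss that keeps all positives and only the hardest negatives.

    pred: probabilities in [0, 1]; target: binary map of the same shape.
    negative_ratio is an assumed default (negatives kept per positive).
    """
    positive = (target > 0.5).float()
    negative = 1.0 - positive
    num_pos = int(positive.sum().item())
    # Keep at most negative_ratio negatives for every positive pixel.
    num_neg = min(int(negative.sum().item()), int(num_pos * negative_ratio))

    # Per-pixel loss, so that negatives can be ranked by difficulty.
    loss = F.binary_cross_entropy(pred, target, reduction="none")
    pos_loss = (loss * positive).sum()
    neg_loss = loss.new_zeros(())
    if num_neg > 0:
        # The hardest negatives are the ones with the largest loss.
        hardest, _ = torch.topk((loss * negative).reshape(-1), num_neg)
        neg_loss = hardest.sum()
    return (pos_loss + neg_loss) / (num_pos + num_neg + eps)


if __name__ == "__main__":
    target = torch.tensor([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
    pred = torch.tensor([[0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]])
    print(balanced_bce_loss(pred, target).item())
```

Only the selected negatives contribute to the gradient, so easy background pixels do not dominate training. Whether this would replace the dice loss or be combined with it is a design choice this sketch leaves open.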