Real-time-Text-Detection
PyTorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Differences between the paper and this implementation
- Uses dice loss instead of BCE (binary cross-entropy) loss.
- Uses normal convolution rather than deformable convolution in the backbone network.
- The architecture of the backbone network is a simple FPN.
- OHEM is not implemented.
- The ground truth of the threshold map is a constant 1 rather than "the distance to the closest segment".
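To illustrate the first difference, dice loss scores the overlap between a predicted probability map and a binary ground-truth map. The sketch below is a minimal, framework-free version over flattened maps. The function name `dice_loss` and the smoothing term `eps` are assumptions made for illustration and are not taken from this repository's code.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss for flattened probability and binary ground-truth maps.

    pred:   iterable of probabilities in [0, 1]
    target: iterable of 0/1 labels, same length as pred
    eps:    small constant (assumed) that avoids division by zero
    """
    pred = list(pred)
    target = list(target)
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # 1 - Dice coefficient: 0 for a perfect match, close to 1 for no overlap
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```

In training code the same arithmetic would be applied to tensors rather than Python lists, so that the loss remains differentiable.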
Introduction
Thanks to these projects:
The features are summarized below:
- Use resnet18/resnet50/shufflenetV2 as backbone.
Installation
- PyTorch 1.1.0
Download
- ShuffleNet_V2 Models trained on ICDAR 2013+2015 (training set)
https://pan.baidu.com/s/1Um0wzbTFjJC0jdJ703GR7Q
Train
- modify genText.py to generate the txt list file for the training/testing data
- modify config.json
- run
python train.py
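The first step produces a text file that lists the training or testing samples. The sketch below shows one way such a list could be built. It does not reproduce genText.py: the function name `write_list_file`, the tab-separated line format, and the `<stem>.txt` label naming are all assumptions made for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def write_list_file(image_dir, label_dir, out_path):
    """Write one 'image_path<TAB>label_path' line per image that has a label.

    The line format and the '<stem>.txt' label naming are assumptions,
    not the format used by genText.py.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    lines = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        label = label_dir / (image.stem + ".txt")
        if label.is_file():  # skip images without an annotation file
            lines.append(f"{image}\t{label}")
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

The function returns the number of listed samples, which gives a quick check that images and annotation files were paired as expected.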
Predict
- run
python predict.py
Eval
- run
python eval.py
Examples
Todo
- [ ] MobileNet backbone
- [ ] Deformable convolution
- [ ] TensorBoard support
- [ ] FPN --> architecture in the paper
- [ ] Dice loss --> BCE loss
- [ ] Threshold map ground truth: constant 1 --> distance (using 1 speeds up label generation)
- [ ] OHEM
- [ ] OpenCV_DNN inference API for CPU machines
- [ ] Caffe version (for deploying with MNN/NCNN)
- [ ] ICDAR13 / ICDAR15 / CTW1500 / MLT2017 / Total-Text
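The distance-based threshold map in the list above depends on the distance from a pixel to the closest segment of a text polygon. The sketch below shows only that geometric primitive, not a full label generator. The function names `point_segment_distance` and `distance_to_polygon` are hypothetical and do not come from this repository.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line through a and b, clamped to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p, polygon):
    """Distance from p to the closest edge of a closed polygon (list of (x, y))."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )
```

Evaluating this per pixel is considerably more work than filling the map with a constant, which is consistent with the note that using 1 speeds up label generation.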