Detectron2
This repo contains the training configurations, code, and models trained on the PubLayNet dataset using the Detectron2 implementation.
PubLayNet is a very large dataset for document layout analysis (document segmentation). It can be used to train semantic segmentation and object detection models.
NOTE
- Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
- Trained on a total of 191,832 images
- Models are evaluated on dev.zip (~11,000 images)
- The backbone is pretrained on the COCO dataset; the models are otherwise trained from scratch on the PubLayNet dataset
- Trained using an Nvidia GTX 1080 Ti (11 GB)
- Trained on Windows 10
Steps to test pretrained models locally (or jump to the next section for Docker deployment)
- Install the latest Detectron2 from https://github.com/facebookresearch/detectron2
- Copy the config files (DLA_*) from this repo to the installed Detectron2
- Download the relevant model from the Benchmarking section. If you downloaded the model using wget, refer to https://github.com/hpanwar08/detectron2/issues/22
- Add the code below to demo/demo.py, in main, to get confidence along with label names
from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']
- Then run the command below to predict on a single image (change the config file to match the model)
python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu
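The demo invocation above can also be assembled programmatically, which avoids shell-quoting problems when paths contain spaces. The following is a minimal sketch: `build_demo_command` is a hypothetical helper (not part of this repo or of Detectron2), the paths in the usage lines are placeholders, and the flags are only those already shown in the command above.

```python
import shlex
import sys


def build_demo_command(config_file, image_path, output_path, weights_path,
                       confidence_threshold=0.5, device="cpu"):
    """Return the demo/demo.py invocation as an argv list (hypothetical helper)."""
    return [
        sys.executable, "demo/demo.py",
        "--config-file", config_file,
        "--input", image_path,
        "--output", output_path,
        "--confidence-threshold", str(confidence_threshold),
        # Everything after --opts is passed as KEY VALUE config overrides.
        "--opts",
        "MODEL.WEIGHTS", weights_path,
        "MODEL.DEVICE", device,
    ]


# Placeholder paths; replace them with your own files.
command = build_demo_command(
    config_file="configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    image_path="samples/my page.jpg",
    output_path="out/my page.jpg",
    weights_path="model_final_trimmed.pth",
)
# shlex.join (Python 3.8+) prints a correctly quoted, copy-pasteable command line.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the Detectron2 root would run the prediction without going through a shell.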
Docker Deployment
- For local Docker deployment for testing, use Docker DLA
Benchmarking
Architecture | No. images | AP | AP50 | AP75 | AP Small | AP Medium | AP Large | Model size full | Model size trimmed |
---|---|---|---|---|---|---|---|---|---|
MaskRCNN Resnext101_32x8d FPN 3X | 191,832 | 90.574 | 97.704 | 95.555 | 39.904 | 76.350 | 95.165 | 816M | 410M |
MaskRCNN Resnet101 FPN 3X | 191,832 | 90.335 | 96.900 | 94.609 | 36.588 | 73.672 | 94.533 | 480M | 240M |
MaskRCNN Resnet50 FPN 3X | 191,832 | 87.219 | 96.949 | 94.385 | 38.164 | 72.292 | 94.081 | 168M |
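To help choose between the two larger models, the figures in the table can be compared directly. This is a small sketch that only reuses numbers from the table; the dictionary keys (`ap`, `full`, `trimmed`) are illustrative names, and sizes stay in the table's own unit (M). The Resnet50 row is left out because it lists only one model size.

```python
# Values copied from the benchmarking table; sizes use the table's "M" unit.
benchmarks = {
    "MaskRCNN Resnext101_32x8d FPN 3X": {"ap": 90.574, "full": 816, "trimmed": 410},
    "MaskRCNN Resnet101 FPN 3X": {"ap": 90.335, "full": 480, "trimmed": 240},
}

for name, row in benchmarks.items():
    # Fraction of the full checkpoint size that remains after trimming.
    ratio = row["trimmed"] / row["full"]
    print(f"{name}: AP {row['ap']:.3f}, trimmed/full size = {ratio:.2f}")

x101 = benchmarks["MaskRCNN Resnext101_32x8d FPN 3X"]
r101 = benchmarks["MaskRCNN Resnet101 FPN 3X"]
ap_gain = x101["ap"] - r101["ap"]
size_factor = x101["trimmed"] / r101["trimmed"]
print(f"Resnext101 vs Resnet101: +{ap_gain:.3f} AP for {size_factor:.2f}x the trimmed size")
```

For both rows the trimmed model is about half the size of the full one, so the trimmed file is the natural choice when only inference is needed.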
Configuration used for training
Architecture | Config file | Training Script |
---|---|---|
MaskRCNN Resnext101_32x8d FPN 3X | configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml | ./tools/train_net_dla.py |
MaskRCNN Resnet101 FPN 3X | configs/DLA_mask_rcnn_R_101_FPN_3x.yaml | ./tools/train_net_dla.py |
MaskRCNN Resnet50 FPN 3X | configs/DLA_mask_rcnn_R_50_FPN_3x.yaml | ./tools/train_net_dla.py |
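Since each model needs its matching config file, it can help to check that the files listed above have been copied into the Detectron2 checkout before running anything. The sketch below assumes the files keep the relative paths shown in the table; `missing_configs` and its `detectron2_root` argument are hypothetical names used for illustration.

```python
from pathlib import Path

# Architecture -> config path, as listed in the table above.
CONFIG_FILES = {
    "MaskRCNN Resnext101_32x8d FPN 3X": "configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    "MaskRCNN Resnet101 FPN 3X": "configs/DLA_mask_rcnn_R_101_FPN_3x.yaml",
    "MaskRCNN Resnet50 FPN 3X": "configs/DLA_mask_rcnn_R_50_FPN_3x.yaml",
}


def missing_configs(detectron2_root):
    """Return the architectures whose config file is absent under detectron2_root."""
    root = Path(detectron2_root)
    return [
        architecture
        for architecture, relative_path in CONFIG_FILES.items()
        if not (root / relative_path).is_file()
    ]


# "." assumes the script is run from the Detectron2 root directory.
for architecture in missing_configs("."):
    print(f"Missing config for {architecture}: {CONFIG_FILES[architecture]}")
```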
TODOs ⏰
- [ ] Train MaskRCNN resnet50
Sample results from detectron2
Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.