Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 (PyTorch in beta)
What you can expect from this repository:
- efficient ways to parse textual information (localize and identify each word) from your documents
- guidance on how to integrate this in your current architecture
Getting your pretrained model
End-to-End OCR is achieved in DocTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
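To make the two-stage idea concrete, here is a minimal, library-independent sketch. It is not docTR's implementation: `two_stage_ocr`, `fake_detect` and `fake_recognize` are hypothetical names, and the image is a toy nested list rather than a real array. It only illustrates how a detector that returns word boxes can be chained with a recognizer that reads each crop.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only (not part of docTR).
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels
Image = List[List[int]]          # toy grayscale image as rows of pixel values


def two_stage_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Run detection first, then recognition on each detected crop."""
    results = []
    for box in detect(image):
        xmin, ymin, xmax, ymax = box
        # Cut the word region out of the page before recognizing it
        crop = [row[xmin:xmax] for row in image[ymin:ymax]]
        results.append((box, recognize(crop)))
    return results


# Stand-in models: a fixed list of boxes and a recognizer that reports the crop size.
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 4, 3)]


def fake_recognize(crop: Image) -> str:
    return f"{len(crop)}x{len(crop[0])}"


page = [[0] * 4 for _ in range(3)]  # 3 rows, 4 columns
words = two_stage_ocr(page, fake_detect, fake_recognize)
print(words)
```

Because the two stages only communicate through boxes and crops, either one can be swapped independently, which is what the `det_arch` and `reco_arch` arguments expose.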
Documents can be loaded from PDFs, images, or web pages:
from doctr.io import DocumentFile

# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
Putting it together
Let’s use the default pretrained model for an example:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)
To make sense of your model’s predictions, you can visualize them interactively as follows:
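A minimal sketch of that call, assuming the object returned by the predictor exposes a `show` method that takes the list of page images passed to the model. This is an assumption about the release described here and other releases may use a different signature. `show_predictions` is a hypothetical helper name.

```python
def show_predictions(result, pages):
    """Open the interactive view of the predictions.

    Assumptions: `result` is the object returned by `ocr_predictor`,
    `pages` is the list of page images that was passed to the model,
    and `result.show` accepts those pages.
    """
    result.show(pages)
```

With the variables from the example above, this would be called as `show_predictions(result, doc)`.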
Or even rebuild the original document from its predictions:
import matplotlib.pyplot as plt

plt.imshow(result.synthesize()); plt.axis('off'); plt.show()
ocr_predictor returns a Document object with a nested structure (with Page, Block, Line, Word, Artefact). To get a better understanding of our document model, check our documentation.
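As a sketch of how that nesting can be walked, the helper below collects every word in reading order. It assumes the attribute names `pages`, `blocks`, `lines`, `words` and `value` on the returned objects. `iter_words` is a hypothetical helper, and the stand-in objects in the usage lines only mimic that shape so the example runs without docTR installed.

```python
from types import SimpleNamespace


def iter_words(document):
    """Yield (page_index, text) for each word, going Page > Block > Line > Word."""
    for page_idx, page in enumerate(document.pages):
        for block in page.blocks:
            for line in block.lines:
                for word in line.words:
                    yield page_idx, word.value


# Stand-in for a real result (assumed shape, built by hand)
def make_word(text):
    return SimpleNamespace(value=text)


fake_line = SimpleNamespace(words=[make_word("Hello"), make_word("world")])
fake_block = SimpleNamespace(lines=[fake_line])
fake_document = SimpleNamespace(pages=[SimpleNamespace(blocks=[fake_block])])

print(list(iter_words(fake_document)))
```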
You can also export the predictions as a nested dict, which is better suited to the JSON format:
json_output = result.export()
For examples and further details about the export format, please refer to this section of the documentation.
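As an illustration of working with the exported dict, the sketch below converts word boxes to pixel coordinates. It assumes the layout `pages -> blocks -> lines -> words`, that each word carries a `value` and a `geometry` given as relative `((xmin, ymin), (xmax, ymax))` coordinates, and that each page stores `dimensions` as `(height, width)`. Check the export format documentation for your version before relying on these keys. `words_in_pixels` and the sample dict are hypothetical.

```python
def words_in_pixels(export):
    """Return (text, (xmin, ymin, xmax, ymax)) pairs with boxes in pixels."""
    out = []
    for page in export["pages"]:
        height, width = page["dimensions"]  # assumed order: (height, width)
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    # Scale relative coordinates by the page size
                    box = (
                        round(xmin * width),
                        round(ymin * height),
                        round(xmax * width),
                        round(ymax * height),
                    )
                    out.append((word["value"], box))
    return out


# Hand-built sample mimicking the assumed export layout
sample = {
    "pages": [
        {
            "dimensions": (1000, 800),
            "blocks": [
                {"lines": [{"words": [
                    {"value": "Invoice", "confidence": 0.99,
                     "geometry": ((0.25, 0.5), (0.5, 0.75))},
                ]}]}
            ],
        }
    ]
}
boxes = words_in_pixels(sample)
print(boxes)
```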
Since we use weasyprint, you will need extra dependencies if you are not running Linux.
For macOS users, you can install them as follows:
brew install cairo pango gdk-pixbuf libffi
For Windows users, those dependencies are included in GTK. You can find the latest installer over here.
You can then install the latest release of the package from PyPI as follows:
pip install python-doctr
We try to keep framework-specific dependencies to a minimum. But if you encounter missing ones, you can install framework-specific builds as follows:
# for TensorFlow
pip install python-doctr[tf]
# for PyTorch
pip install python-doctr[torch]
Alternatively, you can install it from source, which requires Git. First clone the project repository, then install it in editable mode:
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:
# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]
Credit where it’s due: this repository implements, among others, architectures from published research papers.
- Real-time Scene Text Detection with Differentiable Binarization.
- LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation.
- An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.
- Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition.
- MASTER: Multi-Aspect Non-local Network for Scene Text Recognition.
The full package documentation is available here for detailed specifications.
A minimal demo app is provided for you to play with the text detection model!
You will need an extra dependency (Streamlit) for the app to run:
pip install -r demo/requirements.txt
You can then easily run your app in your default browser by running:
streamlit run demo/app.py
If you want to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:
docker build . -t <YOUR_IMAGE_TAG>