A seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning

Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 (PyTorch in beta)

What you can expect from this repository:

efficient ways to parse textual information (localize and identify each word) from your documents
guidance on how to integrate this in your current architecture

Quick Tour

Getting your pretrained model

End-to-end OCR is achieved in DocTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.

from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)

Reading files

Documents can be read from PDFs, images, or webpages:

from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])

Putting it together

Let’s use the default pretrained model for an example:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)

To make sense of your model’s predictions, you can visualize them interactively as follows:

result.show(doc)

Or even rebuild the original document from its predictions:

import matplotlib.pyplot as plt

plt.imshow(result.synthesize()); plt.axis('off'); plt.show()

The ocr_predictor returns a Document object with a nested structure (made of Page, Block, Line, Word and Artefact objects). To get a better understanding of our document model, check our documentation.

You can also export the result as a nested dict, which is better suited to JSON serialization:

json_output = result.export()

For examples and further details about the export format, please refer to the corresponding section of the documentation.
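As a rough sketch of how the exported dict can be consumed, the helper below walks the nested structure and keeps only the words above a confidence threshold. It assumes the export nests "pages", "blocks", "lines" and "words" keys, and that each word carries "value" and "confidence" entries; check the documentation for the exact format. The helper name iter_words and the sample data are hypothetical and stand in for a real `result.export()`.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(
    export: Dict[str, Any], min_confidence: float = 0.0
) -> Iterator[Tuple[int, str, float]]:
    """Yield (page_index, word_value, confidence) for each exported word.

    The key names used here are assumptions about the export format.
    """
    for page_idx, page in enumerate(export.get("pages", [])):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    confidence = word.get("confidence", 0.0)
                    if confidence >= min_confidence:
                        yield page_idx, word["value"], confidence


# Hand-written stand-in for `result.export()`, reduced to the assumed keys
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.98},
                                {"value": "wor1d", "confidence": 0.31},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

# Keep only the words the model is reasonably sure about
confident_words = [value for _, value, _ in iter_words(sample_export, min_confidence=0.5)]
print(confident_words)
```

With a real prediction, you would pass `json_output` instead of `sample_export`.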

Installation

Prerequisites

Python 3.6 (or higher) and pip are required to install DocTR. Additionally, you will need to install at least one of TensorFlow or PyTorch.
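If you are unsure whether your environment meets these prerequisites, a quick check along the following lines may help. This is only a sketch: the helper name available_backends is hypothetical, and it merely tests whether the tensorflow and torch packages can be located, not whether their versions are compatible with DocTR.

```python
import importlib.util
import sys
from typing import List, Sequence


def available_backends(candidates: Sequence[str] = ("tensorflow", "torch")) -> List[str]:
    """Return the candidate packages that can be found in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


# Compare the running interpreter against the minimum version stated above
python_ok = sys.version_info >= (3, 6)
backends = available_backends()

print(f"Python >= 3.6: {python_ok}")
print(f"Backends found: {backends or 'none'}")
```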

Since we use weasyprint, you will need extra dependencies if you are not running Linux.

For macOS users, you can install them as follows:

brew install cairo pango gdk-pixbuf libffi

For Windows users, those dependencies are included in GTK. You can find the latest installer over here.

Latest release

You can then install the latest release of the package from PyPI as follows:

pip install python-doctr

We try to keep framework-specific dependencies to a minimum, but if you run into missing ones, you can install a framework-specific build as follows:

# for TensorFlow
pip install python-doctr[tf]
# for PyTorch
pip install python-doctr[torch]

Developer mode

Alternatively, you can install it from source, which will require you to install Git. First clone the project repository, then install the package in editable mode:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:

# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]

Model architectures

Credit where it’s due: this repository implements, among others, architectures from published research papers.

Text Detection

Text Recognition

More goodies

Documentation

The full package documentation is available here for detailed specifications.

Demo app

A minimal demo app is provided for you to play with the text detection model!

You will need an extra dependency (Streamlit) for the app to run:

pip install -r demo/requirements.txt

You can then launch the app in your default browser with:

streamlit run demo/app.py

Docker container

If you want to deploy in a containerized environment, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF or image file:

Minimal API integration

Looking to integrate DocTR into your API? Here is a template to get you started with a fully working API using the wonderful FastAPI framework.

Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

Alternatively, if you prefer, you can run the same server in a Docker container using:

What you have deployed

Your API should now be running locally on port 8002. Access the automatically generated documentation at http://localhost:8002/redoc and enjoy your three functional routes (“/detection”, “/recognition”, “/ocr”). Here is an example with Python to send a request to the OCR route:
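A minimal sketch of such a request, using only the standard library. It assumes the route accepts a multipart upload under a form field named "file"; that field name, the helper names build_multipart and build_ocr_request, and the placeholder image bytes are assumptions, so adapt them to the API template you actually deployed. The request is prepared but not sent, so the snippet runs without a live server.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_ocr_request(
    image_bytes: bytes, url: str = "http://localhost:8002/ocr"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the image."""
    # "file" is an assumed form field name, not confirmed by the API template
    body, content_type = build_multipart("file", "img.jpg", image_bytes)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


# Placeholder bytes; in practice read them with open("path/to/your/img.jpg", "rb")
request = build_ocr_request(b"<image bytes>")
print(request.get_method(), request.full_url)

# With the server running, send the request and decode the JSON response:
# import json
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```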

Citation

Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You’re in luck: we compiled a short guide (cf. CONTRIBUTING) to help you do so!
