OCR of Chicago 1909 Renumbering Plan

Nov 12, 2021

Requirements:

Python 3 (probably 3.4 or later)
pipenv (pip3 install pipenv)
tesseract (brew install tesseract, at least on a Mac with Homebrew working)
ImageMagick / Ghostscript
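
Before running anything, it can help to confirm that the command-line tools above are on your PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the executable names (gs for Ghostscript, magick or convert for ImageMagick) are assumptions based on how these tools are usually installed.

```python
import shutil

# Assumed executable names; adjust them if your install differs.
DEFAULT_TOOLS = ["pipenv", "tesseract", "gs"]


def missing_tools(names):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(DEFAULT_TOOLS)
    # ImageMagick 7 usually ships "magick"; version 6 ships "convert".
    if shutil.which("magick") is None and shutil.which("convert") is None:
        missing.append("imagemagick (magick or convert)")
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```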

Using this repository:

The working/ directory contains one subfolder per page. Each subfolder holds a page.png file, which is
the baseline image of that page. The build attempts to auto-deskew and crop each page. To override
this step manually, create a page-handcrop.png file in that page's working directory. Some pages already have one.
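
The override rule described above can be sketched in a few lines. This is only an illustration of the idea, not the repository's actual code: the helper name pick_source_image is hypothetical, and only the file names page.png and page-handcrop.png come from the text.

```python
from pathlib import Path


def pick_source_image(page_dir):
    """Return the hand-cropped image if one exists, else the baseline page.

    Hypothetical helper; the real pipeline may resolve this differently.
    """
    page_dir = Path(page_dir)
    handcrop = page_dir / "page-handcrop.png"
    if handcrop.is_file():
        # A manual crop takes priority over the automatic deskew and crop.
        return handcrop
    return page_dir / "page.png"
```

Under this assumption, dropping a page-handcrop.png into a page's folder is enough to make it the input for the later split and OCR steps.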

pipenv install

Running make all at the top level should attempt to deskew, crop, split, and OCR every page, writing
CSV output into each page's working dir.

pipenv shell

make setup

make all

After that, the page.csv files from all the working dirs can be concatenated with csvstack (part of csvkit):

csvstack working/*/page.csv > all_data.csv
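
If csvkit is not installed, the same concatenation can be approximated with the Python standard library. This is a rough sketch rather than a drop-in replacement for csvstack: the function name stack_csvs is hypothetical, and it assumes every page.csv starts with the same header row, which the text does not confirm.

```python
import csv
import glob


def stack_csvs(pattern, out_path):
    """Concatenate CSV files matching `pattern`, writing the header once.

    Assumes each file begins with an identical header row.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Sort so pages are stacked in a stable, predictable order.
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError("Header mismatch in " + path)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    count = stack_csvs("working/*/page.csv", "all_data.csv")
    print("Wrote", count, "rows to all_data.csv")
```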
