Machine Leaning applied to denoise images to improve OCR Accuracy

Oct 22, 2021 1 min read

Machine Learning to Denoise Images for Better OCR Accuracy

This project is an adaptation of this tutorial and used only for learning purposes: https://www.pyimagesearch.com/2021/10/20/using-machine-learning-to-denoise-images-for-better-ocr-accuracy/

Setting Up the project ?

First and foremost clone the project with:

$ git clone https://github.com/AntonioBriPerez/Ocr-Denoiser

You don’t need to extract the zip files in order to train the model.

Once you have cloned the repository you will need to extract the features from the noisy images. This script will extract 5 x 5 – 25-d feature vectors and the it will extract the target (or cleaned) pixel value from the correspondiente ground truth standard image. And then, this features will be saved in a csv file (~200MB). To extract this features you will have to execute:

$ python3 build_features.py

It will generate the following output:

Once you have done that we will have to load those features in a proper split to train our Random Forest Regressor. That code is implemented in the file train_denoiser.py. To train the model you will have to run the command:

$ python train_denoiser.py

And it will generate:

To check that the model performs good you can execute:

$ python3 denoise_document.py --testing denoising-dirty-documents/test

And some images will be written in disk so you can check the original image and the image obtained by the model we just have trained.

Any doubts or suggestions please open an issue.

GitHub

View Github

John was the first writer to have joined pythonawesome.com. He has since then inculcated very effective writing and reviewing culture at pythonawesome which rivals have found impossible to imitate.

Machine Leaning applied to denoise images to improve OCR Accuracy

Machine Learning to Denoise Images for Better OCR Accuracy

Setting Up the project ?

GitHub

John

An intermediate Python course

League of Legends Reinforcement Learning Environment (LoLRLE) multiple training scenarios using PPO

Machine Learning to Denoise Images for Better OCR Accuracy

Setting Up the project ?

GitHub

An intermediate Python course

League of Legends Reinforcement Learning Environment (LoLRLE) multiple training scenarios using PPO

You might also like...