Tuesdays with Tom: Optical Character Recognition Engine

Post by Tom Wilger

I recently wrote a post describing how our team at BP3 builds Intelligent Document Processing (IDP) pipelines in the cloud and the four major components that make up those pipelines.

The first component of an IDP pipeline is the Optical Character Recognition (OCR) engine. This component reads document images and extracts the characters and the physical position of each word on the page. OCR is not a new idea. The concept can be traced back to the early 1900s, when Emanuel Goldberg invented a machine that read characters from photographic images and converted them into telegraph code. Today, through the use of neural networks and deep learning, both open-source and commercial OCR engines can extract everything from printed characters and check-boxes to handwritten text from business documents, passports and even street signs.
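To make "characters plus the physical position of each word" a little more concrete, here is a minimal Python sketch of what that kind of output might look like and how a downstream step could use it. The `Word` class, the `group_into_lines` helper, the pixel tolerance and the sample values are all hypothetical and made up for illustration. They are not the output format of any particular OCR engine, and real engines return richer structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box in pixels (hypothetical schema)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def group_into_lines(words, tolerance=10):
    """Group words into text lines using their vertical position.

    A word joins the current line when its top edge is within `tolerance`
    pixels of the first word on that line. Each line is then read left to right.
    """
    lines = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if lines and abs(word.top - lines[-1][0].top) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.left))
        for line in lines
    ]


# Made-up sample: words as an engine might return them, in no particular order.
sample_words = [
    Word("Number:", 120, 52, 80, 18),
    Word("Invoice", 40, 50, 70, 18),
    Word("10423", 210, 51, 55, 18),
    Word("Total:", 40, 90, 60, 18),
    Word("$250.00", 110, 92, 75, 18),
]

lines = group_into_lines(sample_words)
print(lines)
```

The point of the sketch is that position data lets later pipeline stages rebuild reading order and relate a label such as "Total:" to the value next to it, which plain character output alone would not allow.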

With BP3’s Sherpa document processing service, we use various OCR engines within our IDP pipelines, including Tesseract, Azure Computer Vision and AWS Textract. This fully managed machine learning OCR and data extraction service provides state-of-the-art capabilities to recognize and extract printed text, check-boxes and even handwriting from unstructured text, tables and printed forms in a variety of languages.

The OCR engine provides the foundational data from which document information is extracted. Understanding the capabilities and limitations of this component is important when creating and managing an Intelligent Document Processing pipeline.
