Tuesdays with Tom - Data Extractors

Post by Tom Wilger

If the Optical Character Recognition (OCR) engine is the heart of the IDP pipeline, the Data Extractors are the brain. Data Extractors take the data from the OCR engine and return specific, structured information that you want to extract from a document.

I recently wrote a post describing how our team at BP3 builds Intelligent Document Processing (IDP) pipelines in the cloud and the four major components that make up those pipelines.

When it comes to Data Extractors for Intelligent Document Processing, one size does not fit all. With BP3 Sherpa and our AWS-powered IDP pipelines, we've created different Data Extractors for different document structures; sometimes we even use multiple extractors on a single document. In general, there are three basic types of Data Extractors:

  • Positional Extractors - Also known as template-based extractors, these locate data elements based either on their absolute position on the page or on a relative offset from an identifiable anchor point.
  • Table Extractors - Specifically used for extracting rows and columns of data from tabular structures.
  • Named-Entity Recognition (NER) Extractors - Use general-purpose or specifically trained Natural Language Processing and Machine Learning models to extract specific types of entities, such as names, organizations, or locations, from unstructured documents.
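
To make the positional idea concrete, here is a minimal sketch in Python of both flavors: reading a fixed region of the page, and reading a region offset from an anchor word. It assumes the OCR engine returns words with normalized bounding boxes (0 to 1, origin at the top left). The Word and Region classes, the function names, and the sample invoice data are all hypothetical illustrations, not the BP3 Sherpa implementation or any AWS response format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR word with a normalized bounding box (hypothetical shape)."""
    text: str
    left: float
    top: float
    width: float
    height: float


@dataclass
class Region:
    """A rectangular area of the page in the same normalized coordinates."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the region if its center point falls inside it.
        cx = word.left + word.width / 2
        cy = word.top + word.height / 2
        return (self.left <= cx <= self.left + self.width
                and self.top <= cy <= self.top + self.height)


def extract_by_region(words: List[Word], region: Region) -> str:
    """Absolute positioning: join the words found inside a fixed region."""
    hits = [w for w in words if region.contains(w)]
    # Approximate reading order: top to bottom, then left to right.
    hits.sort(key=lambda w: (round(w.top, 2), w.left))
    return " ".join(w.text for w in hits)


def extract_relative_to_anchor(words: List[Word], anchor_text: str,
                               dx: float, dy: float,
                               width: float, height: float) -> Optional[str]:
    """Relative positioning: read a region offset from an anchor word."""
    anchor = next(
        (w for w in words if w.text.lower() == anchor_text.lower()), None
    )
    if anchor is None:
        return None  # Anchor not on this page; let the caller decide.
    region = Region(anchor.left + dx, anchor.top + dy, width, height)
    return extract_by_region(words, region)


# Made-up OCR output for a simple invoice page.
sample_words = [
    Word("Invoice", 0.10, 0.05, 0.10, 0.02),
    Word("Number:", 0.21, 0.05, 0.10, 0.02),
    Word("INV-1042", 0.33, 0.05, 0.12, 0.02),
    Word("Total:", 0.10, 0.80, 0.08, 0.02),
    Word("$512.00", 0.20, 0.80, 0.10, 0.02),
]

# Absolute: the invoice number always sits in the same box on this template.
print(extract_by_region(sample_words, Region(0.30, 0.04, 0.20, 0.04)))
# prints: INV-1042

# Relative: the amount sits just to the right of the "Total:" label.
print(extract_relative_to_anchor(sample_words, "Total:",
                                 dx=0.09, dy=-0.01, width=0.15, height=0.04))
# prints: $512.00
```

The anchor-relative version tends to tolerate documents where content shifts up or down the page, since the region moves with the label; the absolute version is simpler but depends on a fixed layout.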

Selecting and configuring the right Data Extractor for a targeted data element within a document can sometimes be more art than science. An experienced Document Engineer will draw on her past experience with similar situations, but in the end, the ability to experiment with different strategies usually wins the day.

