
Tuesdays with Tom - Data Extractors

Intelligent Document Processing needs more than one kind of Data Extractor. With BP3 Sherpa and our AWS-powered IDP pipelines, we've created three basic types.


May 8, 2021

If the Optical Character Recognition (OCR) engine is the heart of the IDP pipeline, the Data Extractors are the brain. Data Extractors take the OCR engine's output and return the specific, structured information you want to pull from a document.
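To make that hand-off concrete, here is a minimal Python sketch of the boundary between OCR and extraction. It assumes the OCR engine yields words with page-relative bounding boxes. The `Word` type, the `DataExtractor` protocol and the toy `RegexExtractor` are hypothetical illustrations, not BP3 Sherpa's or any AWS service's actual API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR token with a bounding box in page-relative coordinates (0.0-1.0)."""
    text: str
    left: float
    top: float
    width: float
    height: float


class DataExtractor(Protocol):
    """Hypothetical interface: anything that turns OCR words into named fields."""
    def extract(self, words: List[Word]) -> Dict[str, str]: ...


class RegexExtractor:
    """Toy extractor: searches the page text with one regular expression per field."""

    def __init__(self, patterns: Dict[str, str]) -> None:
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}

    def extract(self, words: List[Word]) -> Dict[str, str]:
        page_text = " ".join(w.text for w in words)
        fields: Dict[str, str] = {}
        for name, pattern in self.patterns.items():
            match = pattern.search(page_text)
            if match:
                # Use the first capture group if the pattern has one, else the whole match.
                fields[name] = match.group(1) if pattern.groups else match.group(0)
        return fields


ocr_words = [
    Word("Invoice", 0.10, 0.05, 0.08, 0.02),
    Word("Date:", 0.19, 0.05, 0.05, 0.02),
    Word("2021-05-08", 0.25, 0.05, 0.10, 0.02),
]
extractor: DataExtractor = RegexExtractor({"invoice_date": r"Date:\s*(\d{4}-\d{2}-\d{2})"})
print(extractor.extract(ocr_words))  # {'invoice_date': '2021-05-08'}
```

Keeping the extractor behind a small interface like this is one way to let different extraction strategies be swapped or combined without touching the OCR step.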

I recently wrote a post describing how our team at BP3 builds Intelligent Document Processing (IDP) pipelines in the cloud and the four major components that make up those pipelines.

When it comes to Data Extractors for Intelligent Document Processing, one size does not fit all. With BP3 Sherpa and our AWS-powered IDP pipelines, we've built different Data Extractors for different document structures; sometimes we even apply multiple extractors to a single document. In general, there are three basic types of Data Extractors:
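Using several extractors on one document raises the question of which result wins when they overlap. The sketch below shows one simple policy, assumed here for illustration rather than taken from BP3 Sherpa: run the extractors in priority order and let the first one that produces a field keep it. OCR output is simplified to a list of text lines, and both extractor functions are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical simplification: OCR output is a list of text lines, and an
# extractor is any function that maps those lines to named field values.
OcrLines = List[str]
Extractor = Callable[[OcrLines], Dict[str, str]]


def key_value_extractor(lines: OcrLines) -> Dict[str, str]:
    """Pick up 'Label: value' lines, normalizing the label to snake_case."""
    fields: Dict[str, str] = {}
    for line in lines:
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip().lower().replace(" ", "_")] = value.strip()
    return fields


def total_line_extractor(lines: OcrLines) -> Dict[str, str]:
    """Find a line whose first token is 'TOTAL' and take its last token as the amount."""
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0].upper() == "TOTAL":
            return {"total": tokens[-1]}
    return {}


def run_extractors(lines: OcrLines, extractors: List[Extractor]) -> Dict[str, str]:
    """Run extractors in priority order; the first one to produce a field keeps it."""
    result: Dict[str, str] = {}
    for extractor in extractors:
        for name, value in extractor(lines).items():
            result.setdefault(name, value)
    return result


page = ["Invoice Number: INV-001", "Customer: Acme Corp", "TOTAL 1,250.00"]
print(run_extractors(page, [key_value_extractor, total_line_extractor]))
# {'invoice_number': 'INV-001', 'customer': 'Acme Corp', 'total': '1,250.00'}
```

First-wins is only one possible merge rule; a real pipeline might instead prefer the result with the highest confidence score.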

Positional Extractors - Also known as template-based extractors, these locate data elements either by their absolute position on the page or by a relative offset from an identifiable anchor point.
Table Extractors - Specifically used for extracting rows and columns of data from tabular structures.
Named-Entity Recognition (NER) Extractors - Use general-purpose or custom-trained Natural Language Processing (NLP) and Machine Learning models to extract specific types of entities, such as names, organizations, or locations, from unstructured documents.
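As an illustration of the first type, here is a minimal Python sketch of positional extraction in both of its forms: a fixed region on the page, and a region offset from an anchor word. It assumes word-level OCR output with page-relative bounding boxes (0.0 to 1.0). The `Word` type, the function names and the offset parameters are hypothetical, and matching words by their center point is one simple choice among several.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR token with a bounding box in page-relative coordinates (0.0-1.0)."""
    text: str
    left: float
    top: float
    width: float
    height: float

    @property
    def center(self) -> Tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)


def _inside(word: Word, left: float, top: float, right: float, bottom: float) -> bool:
    """True if the word's center falls inside the rectangle."""
    cx, cy = word.center
    return left <= cx <= right and top <= cy <= bottom


def extract_absolute(words: List[Word], left: float, top: float,
                     right: float, bottom: float) -> str:
    """Template-style extraction: join all words whose centers lie in a fixed region."""
    hits = [w for w in words if _inside(w, left, top, right, bottom)]
    # Sort top-to-bottom (tops rounded so words on one line group together), then left-to-right.
    hits.sort(key=lambda w: (round(w.top, 2), w.left))
    return " ".join(w.text for w in hits)


def extract_from_anchor(words: List[Word], anchor: str, dx: float, dy: float,
                        width: float, height: float) -> Optional[str]:
    """Find the anchor word, then read the region offset (dx, dy) from its top-left corner."""
    anchor_word = next((w for w in words if w.text.lower() == anchor.lower()), None)
    if anchor_word is None:
        return None  # anchor missing: leave the field for another extractor
    left = anchor_word.left + dx
    top = anchor_word.top + dy
    return extract_absolute(words, left, top, left + width, top + height) or None


words = [
    Word("Policy", 0.10, 0.20, 0.06, 0.02),
    Word("No.", 0.17, 0.20, 0.03, 0.02),
    Word("PX-4471", 0.25, 0.20, 0.08, 0.02),
    Word("Insured:", 0.10, 0.30, 0.08, 0.02),
    Word("Jane", 0.25, 0.30, 0.05, 0.02),
    Word("Doe", 0.31, 0.30, 0.04, 0.02),
]
print(extract_absolute(words, 0.24, 0.19, 0.40, 0.22))                  # PX-4471
print(extract_from_anchor(words, "Insured:", 0.14, -0.01, 0.30, 0.04))  # Jane Doe
```

The anchor-relative form tolerates documents that shift up or down the page, because the region moves with the label; the absolute form only works when every document follows the same template exactly.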

Selecting and configuring the right Data Extractor for a target data element within a document is sometimes more art than science. An experienced Document Engineer will draw on her experience with similar documents, but in the end, the ability to experiment with different strategies usually wins the day.
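One way to make that experimentation systematic is to score each candidate strategy against a handful of hand-labeled sample documents and keep the one that gets a field right most often. The Python sketch below assumes OCR output simplified to text lines, exact-match scoring, and two hypothetical strategies for a `vendor` field; none of these names come from BP3 Sherpa.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical setup: each sample pairs OCR text lines with hand-labeled ground truth.
Sample = Tuple[List[str], Dict[str, str]]
Extractor = Callable[[List[str]], Dict[str, str]]


def field_accuracy(extractor: Extractor, samples: List[Sample], field: str) -> float:
    """Fraction of labeled samples where the extractor returns exactly the expected value."""
    labeled = [(lines, truth) for lines, truth in samples if field in truth]
    if not labeled:
        return 0.0
    correct = sum(1 for lines, truth in labeled
                  if extractor(lines).get(field) == truth[field])
    return correct / len(labeled)


def pick_best(candidates: Dict[str, Extractor], samples: List[Sample],
              field: str) -> Tuple[str, float]:
    """Score every candidate on one field and return the best (name, accuracy)."""
    scores = {name: field_accuracy(fn, samples, field) for name, fn in candidates.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best]


def first_line(lines: List[str]) -> Dict[str, str]:
    """Strategy A: assume the vendor name is always the first line."""
    return {"vendor": lines[0].strip()} if lines else {}


def after_label(lines: List[str]) -> Dict[str, str]:
    """Strategy B: look for an explicit 'Vendor:' label."""
    for line in lines:
        if line.lower().startswith("vendor:"):
            return {"vendor": line.split(":", 1)[1].strip()}
    return {}


samples: List[Sample] = [
    (["Acme Corp", "Invoice 17"], {"vendor": "Acme Corp"}),
    (["INVOICE", "Vendor: Globex", "Total 40.00"], {"vendor": "Globex"}),
    (["Vendor: Initech", "Invoice 3"], {"vendor": "Initech"}),
]
print(pick_best({"first_line": first_line, "after_label": after_label}, samples, "vendor"))
```

With only three samples this is a toy, but the same loop scales to a larger labeled set and makes it easy to compare a new strategy against the current one before switching.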


WRITTEN BY

Tom Wilger
