
Intelligent Document Processing is not about machines taking over the world. It's really about pairing A.I. with human intelligence to radically accelerate the rate and accuracy of extracting information from written documents. Or, as we like to say at BP3, "There's a faster way to do that."
I recently wrote a post describing how our team at BP3 builds Intelligent Document Processing (IDP) pipelines in the AWS cloud and the four major components that make up those pipelines.
One component that is often overlooked is the Human In the Loop (HITL) user interface, which allows humans to participate in the data extraction process. Sometimes humans need to review or correct pieces of extracted data. This could be due to poor scan quality, stray markings on the page, or even new document types or formats entering the pipeline. The HITL component contains user interfaces that are specifically designed to allow humans to be super-efficient at reviewing and correcting potential data extraction errors. These UIs work differently depending on the type(s) of data extractors used to process the document. In general, they all present the human reviewer with a visual image of the document alongside a correctable list of extracted data elements: those whose accuracy may be in question, and those that simply need to be confirmed by a flesh-and-blood human because they're really important.
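To make the review step concrete, here is a minimal sketch of how a pipeline might decide which extracted fields to put in front of a human. It is an illustration only, not BP3's implementation: the `ExtractedField` type, the `fields_needing_review` function, the 0.90 threshold, and the sample field names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One data element pulled from a document (hypothetical structure)."""
    name: str
    value: str
    confidence: float  # extractor's self-reported score, assumed to be in [0, 1]


def fields_needing_review(fields, threshold=0.90, always_review=frozenset()):
    """Return the fields a human should look at.

    A field is queued when its confidence falls below the threshold
    (accuracy may be in question) or when its name is listed in
    always_review (too important to skip human confirmation).
    """
    return [
        f for f in fields
        if f.confidence < threshold or f.name in always_review
    ]


# Illustrative data only; these values are made up for the example.
fields = [
    ExtractedField("invoice_number", "INV-1042", 0.99),
    ExtractedField("total_amount", "1,250.00", 0.97),
    ExtractedField("vendor_name", "Acme Corp", 0.62),
]

queue = fields_needing_review(fields, always_review={"total_amount"})
print([f.name for f in queue])  # ['total_amount', 'vendor_name']
```

Here `vendor_name` is queued because of its low confidence, while `total_amount` is queued despite a high score because it was marked as always requiring confirmation. In practice the threshold would likely be tuned per field and per document type.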
When a data extractor employs machine learning, the Human In the Loop process also acts as a source of additional training data that, when fed back into the model, improves its accuracy. In this way, the more the A.I. interacts with humans, the smarter it gets over time.
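As a rough sketch of that feedback loop, the snippet below turns a reviewer's decisions into labeled examples that could later be added to a training set. Everything here is hypothetical: the `Review` type, the `to_training_examples` function, the record layout, and the sample document ID are assumptions for illustration, and the actual retraining step depends entirely on the model being used.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """A human reviewer's decision on one extracted field (hypothetical)."""
    field_name: str
    extracted_value: str  # what the model produced
    reviewed_value: str   # what the human confirmed or corrected it to


def to_training_examples(document_id, reviews):
    """Convert review decisions into labeled records for later retraining.

    Confirmed values are kept as well as corrected ones, since both are
    human-verified labels; was_corrected records which case applied.
    """
    return [
        {
            "document_id": document_id,
            "field": r.field_name,
            "label": r.reviewed_value,
            "was_corrected": r.reviewed_value != r.extracted_value,
        }
        for r in reviews
    ]


# Illustrative data only.
reviews = [
    Review("vendor_name", "Acme Crop", "Acme Corp"),    # corrected by the human
    Review("total_amount", "1,250.00", "1,250.00"),     # confirmed as-is
]

examples = to_training_examples("doc-001", reviews)
corrected = [e["field"] for e in examples if e["was_corrected"]]
print(corrected)  # ['vendor_name']
```

Keeping the `was_corrected` flag makes it possible to measure how often the extractor is wrong on each field, which is one way to see whether accuracy is improving over time.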
Machine learning-based solutions like Intelligent Document Processing significantly improve how efficiently we work. That said, in many cases, the accuracy of machine learning alone isn't enough. Bringing humans into the work process, even in a small role, can significantly improve that accuracy, which ultimately leads to better results.