Automating Complex Tasks with Gen AI Agencies

Learn how to automate complex tasks with Gen AI Agencies in this insightful post. Dive into orchestrating workflows efficiently for enhanced productivity.


In my last article, Using LLMs to do work, I showed how we can go beyond using large language models (LLMs) to solve simple first-order use cases and instead use them to do real work. The scenario I presented used a single LLM prompt to extract fields from a structured document like an invoice or bank statement. But most of the tasks we perform as humans are much more complicated than that. In fact, most of our simplest day-to-day tasks are not single tasks at all, but a carefully choreographed collection of tasks, which we call a workflow.

As head of the AI practice at BP3, one of my day-to-day workflows is setting up new AI projects in our project management system. Prior to setting up a new project, I first need to retrieve various pieces of information from the contracted statement of work between BP3 and our client. Typically, I'll need to know information like:

  • The effective date of the contract
  • Whether it's fixed-price or time and materials (T&M)
  • The total price of the contract
  • Resource hourly rates (for T&M)
  • Deliverables and milestones

 

Unlike bank statements and invoices, which are structured documents, contracts are unstructured, which means that extracting information from a contract usually requires reading the whole contract and then pulling out the relevant information from different sections. LLMs can help with this work, but due to the lack of document structure, and the sheer length of some of these contracts, it's not as simple as our invoice example, where we could write a single prompt to extract all the data in one go. I know this because we tried this approach on some of our contracts, and it was wildly unsuccessful. Our direct "one and done" attempt failed to extract anything for many of the fields and hallucinated badly on others. For instance, our contracts tend to have lots of different dates in them: the submission date, signature dates, and even the dates we use as part of our version control. So when we asked the LLM for the effective date of the contract without first directing the model to a particular section of the document, we'd usually get the most commonly appearing date in the document, which was the version date. It's on every page.

Think about how a contracts administrator would find the effective date in a contract. There's more to it than just reading the contract end-to-end (and, in some ways, less). The effective date is usually found in one of the introductory paragraph sections of the contract. It's either stated directly as a date, or it may be triggered by the last signature date. In either case, if you know where to look, it's not hard to find. When a human performs this work, she'll typically skim the document to find the introductory paragraph section(s), then read more closely to find the data she's looking for. It's the same workflow to look up resource pricing, except that that information is found in a pricing section.

As useful as they are, LLMs are actually pretty dumb. Remember that these models are trained to do one simple thing: predict the next word (or token) in a phrase. You're going to get pretty crummy results if you can't articulate in your prompt exactly what you want the model to do in all circumstances. This is challenging if you're only given one prompt to perform an entire multi-step workflow.

"I asked you to kill Superman, and you're telling me you couldn't even do that one simple thing" --Robert Vaughn as Ross Webster in Superman III

But what if we break down our workflow into smaller, more atomic steps, and then use some form of orchestration to tie all those steps together? In the case of our contract example, we could write one prompt to classify each section of the contract as Introduction, Pricing, Signature, Deliverables, etc. We could then have other prompts that extract specific data elements. We might even write different prompts to read information from paragraphs vs. tables.
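
To make that idea concrete, here's a rough sketch in Python (the prompt text is abbreviated, and call_llm is a hypothetical stand-in for whatever LLM client you use):

# One small, single-purpose prompt per task, tied together by ordinary code,
# instead of one mega-prompt that tries to do the whole workflow.
SECTION_TYPES = "Introduction, Pricing, Signature, Deliverables"

def classify_section(section_text, call_llm):
    prompt = (
        f"Classify this contract section as one of: {SECTION_TYPES}. "
        f"Respond with the type only.\n\nSECTION:\n{section_text}"
    )
    return call_llm(prompt).strip()

def classify_contract_sections(sections, call_llm):
    # One focused LLM call per section; later extraction prompts can then be
    # pointed at only the sections they need.
    return {i: classify_section(text, call_llm) for i, text in enumerate(sections)}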

Building an AI that can manage this kind of orchestration is a bit more complex than just invoking the ChatGPT API. To pull all this together, we looked at a few different LLM frameworks to help with the heavy lifting. We considered LangChain, which is advertised as "a framework for developing applications powered by language models." But we eventually landed on an open-source framework being developed by Microsoft called AutoGen. AutoGen is described as "a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks." One of the things we like about AutoGen is that each agent has its own prompt, so it can be configured to perform a single task. In fact, because AutoGen abstracts the LLM's API away from the framework, different agents can use different LLMs from different vendors, which provides a lot of flexibility.
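
For illustration, here's a minimal sketch of that per-agent setup in AutoGen (the model names and keys are placeholders, and the system messages are abbreviated):

from autogen import AssistantAgent

# Each agent has its own prompt (system message) and its own llm_config, so
# different agents can use different models, even from different vendors.
classifier = AssistantAgent(
    name="contract_classifier",
    system_message="You classify sections of a contract. ...",
    llm_config={"config_list": [{"model": "gpt-3.5-turbo", "api_key": "YOUR_KEY"}]},
)
extractor = AssistantAgent(
    name="effective_date_extractor",
    system_message="You extract the effective date from introduction paragraphs. ...",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]},
)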

Microsoft's AutoGen was a great start for us. But from a workflow orchestration standpoint, it either didn't go far enough in providing an easy-to-use pattern, or it went too far in the generative AI direction by using an LLM agent to try to orchestrate the workflow between the agents as a conversation. Although the demo is pretty cool, in reality, using an LLM agent to direct the efforts of other LLM agents very quickly degrades into a bunch of dumb AIs talking in circles.

To solve this problem, we extended the AutoGen framework to create what we call an "Orchestrated Agency". Each agent within the Agency has its own prompt and performs its own task, but the overall orchestration of the workflow is managed as an activity flow. Using AutoGen, along with our own extension, we can separate the work of implementing our AI into two types of tasks:

  • Breaking down and modeling the workflow as a UML activity diagram where each activity (or task) can easily be performed by a single LLM agent.
  • Writing and testing the prompts for each LLM agent within the Orchestrated Agency.

 

Modeling our workflow as an activity diagram even allows us to include activities and conditions that are not implemented using generative AI. In cases where an activity is more efficiently expressed as procedural code, we just write a method in Python and include it in the flow.
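
Our extension isn't public, but conceptually the orchestration boils down to something like this sketch, where each activity is an injected callable (an LLM agent or plain Python) and a simple gate routes the flow; all the names here are illustrative, not our actual API:

def run_contract_flow(ocr_data, extract_sections, classify_contract,
                      extract_sow_fields, extract_nda_fields):
    sections = extract_sections(ocr_data)             # procedural activity (no LLM)
    contract_type = classify_contract(sections[:10])  # LLM agent activity
    if contract_type == "Statement of Work":          # condition / gateway
        return extract_sow_fields(sections)
    if contract_type == "Non-disclosure Agreement":
        return extract_nda_fields(sections)
    return {"contract_type": contract_type}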

Now, getting back to our contract understanding example, and armed with our Orchestrated Agency, we can model the workflow for extracting data from our contracts as shown in the activity diagram above.

Prior to executing the Agency, we pass our contract through an optical character recognition (OCR) engine, which extracts all the words and their positions from each page of the document. When we pass the OCR data to our Contract Understanding Agency, we first extract all the contract sections (paragraphs and tables). This activity is just procedural logic written in Python that does not require an LLM agent, so in our activity diagram, we depict it in green.
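
Section extraction itself is ordinary Python. As a rough illustration (the OCR data layout here, a list of words with x/y positions per page, is an assumption, and real section detection also has to handle tables and multi-column layouts):

def extract_sections(pages, gap_threshold=18):
    # Group OCR words into sections wherever the vertical gap between lines
    # is large; a crude but LLM-free way to find paragraph boundaries.
    sections, current, last_y = [], [], None
    for page in pages:
        for word in sorted(page["words"], key=lambda w: (w["y"], w["x"])):
            if last_y is not None and word["y"] - last_y > gap_threshold:
                sections.append(" ".join(current))
                current = []
            current.append(word["text"])
            last_y = word["y"]
        last_y = None  # reset at page boundaries
    if current:
        sections.append(" ".join(current))
    return sections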

The agency's orchestration engine then passes the data to the Contract Classification Agent, which reads the first 10 sections of the contract to determine what kind of contract it is. (It is usually apparent what type of contract we're looking at from just the first few sections.) The Contract Classification Agent (depicted in blue) uses a GPT-3.5 Turbo LLM with the following prompt.

 

I am going to give you some CONTRACT_SECTIONS as a JSON-formatted list. Each element of the list is a dictionary specifying the 0-relative section index and text of a paragraph in a contract.
I'll also give you a list of ALLOWED_CONTRACT_CLASSIFICATIONS, which is a JSON-formatted list of allowed contract types and descriptions of the format:
[
{
"contract_type": "<the name of a contract type>",
"description": "<the description of a contract type>"
},
...
]
Based on the CONTRACT_SECTIONS, please determine the single most likely contract type of the contract from the ALLOWED_CONTRACT_CLASSIFICATIONS.
Your response should only be a JSON structure with a single attribute 'contract_type' from the ALLOWED_CONTRACT_CLASSIFICATIONS. Do not provide any descriptor of the output type or any other explanation.

CONTRACT_SECTIONS:
[
{
"section_index": 0,
"text": "workflow Transformation & Automation Program Statement of Work I April 28, 2021"
},
...
{
"section_index": 4,
"text": " PBJ is pleased to submit this Statement of Work (SOW) for professional services to staff a program to modernize Veus's applications to Camunda. This program includes assisting in the implementation of BPM software from Camunda, as well as building out the processes and automations using the Camunda platform and full-stack development as required. The program may include but is not limited to:"
}
]

ALLOWED_CONTRACT_CLASSIFICATIONS:
[
{
"contract_type": "Statement of Work",
"description": "a formal document that defines the scope, objectives, deliverables, timeline, and responsibilities of both parties in a specific project, ensuring clear understanding and agreement on what services will be performed and the expected outcomes."
},
...
]
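
Mechanically, the orchestration code just assembles that prompt with the two JSON payloads and parses the model's one-attribute reply. A minimal sketch (CLASSIFY_PROMPT stands in for the instruction text above, and call_llm for the agent's LLM call):

import json

CLASSIFY_PROMPT = "..."  # the instruction text shown above

def classify_contract(sections, allowed_types, call_llm):
    payload = (
        CLASSIFY_PROMPT
        + "\n\nCONTRACT_SECTIONS:\n"
        + json.dumps([{"section_index": i, "text": t}
                      for i, t in enumerate(sections[:10])])
        + "\n\nALLOWED_CONTRACT_CLASSIFICATIONS:\n"
        + json.dumps(allowed_types)
    )
    return json.loads(call_llm(payload))["contract_type"]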

 

As you can see in the activity diagram, the "Type" gate (yellow) sends the workflow down different paths depending on the type of contract. This allows us to extract different fields from each contract type. For instance, a non-disclosure agreement (NDA) won't have resource pricing. For the sake of brevity, we'll only show the extraction activities for the statement of work (SOW) contract.

As I mentioned above, when looking for information in a contract, it's important to know where to look. So, once we've determined that we're looking at an SOW, we'll classify each paragraph and table so we can filter on the different section types when looking up each data element. For the Paragraph Classification Agent, here's the prompt we used.

 

I am going to give you some contiguous CONTRACT_SECTIONS as a JSON-formatted list. Each element of the list is a dictionary specifying the 0-relative section index and text of a section in a contract.
I'll also give you a list of ALLOWED_SECTION_CLASSIFICATIONS, which is a JSON-formatted list of allowed contract section classification names and descriptions of the format:
[
{
"classification_name": "<the classification of a section>",
"description": "<the description of a classification>"
},
]
Finally, I'll give you a list of PREVIOUS_CLASSIFIED_SECTIONS, which is a JSON-formatted list of previously classified contract sections specifying the section index, and the most likely contract section classification for all the sections in the contract prior to the specified CONTRACT_SECTIONS.
For example, if the sections provided in CONTRACT_SECTIONS start with section_index 3, PREVIOUS_CLASSIFIED_SECTIONS might look like this:
[
{"section_index": 0, "section_classification": "Title Page"},
{"section_index": 1, "section_classification": "Title Page"},
{"section_index": 2, "section_classification": "Introduction"}
]
If the CONTRACT_SECTIONS represents the beginning of the contract, PREVIOUS_CLASSIFIED_SECTIONS will be an empty list.
Your job is to evaluate the CONTRACT_SECTIONS and select the most likely contract section classification from the ALLOWED_SECTION_CLASSIFICATIONS for each section supplied in the CONTRACT_SECTIONS.
Your response should only be a valid JSON-formatted list with each element of the list specifying the section index, and the most likely contract section classification.
For example, if the supplied CONTRACT_SECTIONS contains 5 sections and the first two sections were classified as Deliverables, and the last three were classified as Acceptance Criteria, your response would be:
[
{"section_index": 23, "section_classification": "Deliverables"},
{"section_index": 24, "section_classification": "Deliverables"},
{"section_index": 25, "section_classification": "Acceptance Criteria"},
{"section_index": 26, "section_classification": "Acceptance Criteria"},
{"section_index": 27, "section_classification": "Acceptance Criteria"}
]
If the PREVIOUS_CLASSIFIED_SECTIONS is not empty, then the section_classification of the early paragraphs in the CONTRACT_SECTIONS will likely be the same section_classification as the last element of the PREVIOUS_CLASSIFIED_SECTIONS.

CONTRACT_SECTIONS:
[
{
"section_index": 0,
"text": "Process Transformation & Automation Program Statement of Work I April 28, 2021"
},
...
]

ALLOWED_SECTION_CLASSIFICATIONS:
[
{
"classification_name": "Introduction",
"description": "Provides an overview of the contract, including the parties involved, contract effective date and the purpose of the SOW."
},
{
"classification_name": "Scope of Work",
"description": "Defines the specific tasks, activities, and objectives that the service provider will undertake."
},
...
]

PREVIOUS_CLASSIFIED_SECTIONS:
[]

 

In the above prompt, notice the section that describes PREVIOUS_CLASSIFIED_SECTIONS. Dynamically passing this information to the prompt allows us to break this activity into multiple LLM calls while still giving the model context about where the paragraphs reside within the contract. This is important because of the model's input size (context window) limitation.
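
Here's a sketch of that chunking loop (classify_chunk stands in for one call to the Paragraph Classification Agent):

def classify_all_sections(sections, classify_chunk, chunk_size=10):
    # Classify a long contract in fixed-size chunks, feeding each call the
    # classifications produced so far as PREVIOUS_CLASSIFIED_SECTIONS.
    previous = []
    for start in range(0, len(sections), chunk_size):
        chunk = [{"section_index": start + i, "text": text}
                 for i, text in enumerate(sections[start:start + chunk_size])]
        previous.extend(classify_chunk(chunk, previous))  # one agent call per chunk
    return previous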

 

I am going to give you a CONTRACT_TABLE from a contract that is formatted as a JSON structure. The table header will be in the attribute "header" which contains a list of strings, one value for each column's header.
The non-header rows will be in a list called "rows", with each row containing a list of strings, one value for each column in the row.
...

 

Since our contracts contain both paragraph and table sections, we've also created a Table Classification Agent. Its prompt is similar to the paragraph version, except for the section above.
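
Based on the format the prompt describes, a CONTRACT_TABLE payload looks something like this (the header and values are made-up examples):

contract_table = {
    "header": ["Role", "Hourly Rate"],
    "rows": [
        ["Senior Consultant", "<rate>"],
        ["Developer", "<rate>"],
    ],
}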

 

I am going to give you CONTRACT_PARAGRAPH_SECTIONS, which is a list of paragraphs related to a statement of work's introduction. Based on all of this information, please extract the effective date of the contract.
If the effective date is not contained in the provided information, please return the value null.
Your response should only be a valid json structure of the form:
{"effective_date": <contract effective date>}

 

Now that we know the type of content contained in each paragraph and table, we can use this information to filter the content so that our various extraction agents can more reliably extract their data. For instance, since we know from experience that the contract's effective date is usually found in an introduction paragraph, we can create the Effective Date Extraction Agent using the above prompt.
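
Putting the pieces together, the extraction step filters the classified sections down to just the introduction paragraphs and makes one focused call. A sketch (EFFECTIVE_DATE_PROMPT stands in for the prompt text above, and call_llm for the agent's LLM call):

import json

EFFECTIVE_DATE_PROMPT = "..."  # the prompt text shown above

def extract_effective_date(classified, sections, call_llm):
    # Keep only the paragraphs classified as Introduction.
    intro = [sections[c["section_index"]] for c in classified
             if c["section_classification"] == "Introduction"]
    prompt = (EFFECTIVE_DATE_PROMPT
              + "\n\nCONTRACT_PARAGRAPH_SECTIONS:\n" + json.dumps(intro))
    return json.loads(call_llm(prompt)).get("effective_date")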

In the last 18 months, generative AI has made significant contributions to automation. However, in practice, the number of automation use cases where single-prompt LLM interactions apply is quite limited, especially when compared to the number of more complex manual workflows that humans perform on a daily basis. Breaking complex human workflows down into simpler tasks and then orchestrating those tasks using a common workflow-based framework is one way to apply generative AI to many more situations, so we can automate more work.
