Process First, then Data

  • June 5, 2014
  • Scott

Every once in a while you read something that reveals that someone else in your market space just has a very different view of how it works. I like to keep up with what other BPM software and services vendors publish in their blogs, and the other day I ran across this post on Appian’s blog (their emphasis, not mine):

Here is a recap of the topics discussed during the final conference discussion, as well as some of the responses from panel members:

Start with data first, then build process models:

“Within short cycles of development you must start with the data,” said Deshpande. “From there, you can build, but you must start with the data.”

“Without the data in place, what are the processes you can put into place? The key is to first start with the data and analyze, then working your way back to what you want to build,” said Richard.

It is rare that I see something published in BPM circles that I completely disagree with. This is one of those times. This is just wrong, and reflects old-school waterfall thinking. I’ve seen so many BPM projects either fail or get mired down because the BPM practitioners thought they needed to define all the data before they built any processes. So what do you think happens when you start data-first?

  • You have what we call, in Lean, overproduction. You define and establish lots of data that isn’t needed.
  • You miss data elements that you do need – because you’re not starting from the requirements of the process… when you start defining the process you’ll find that it requires data that isn’t in your definition.
  • Schema is more rigid than process definition. You build schema, and services against that schema, and queries … and pretty soon you can’t change the schema without breaking the things that depend on it. This isn’t about how hard it is to change a database definition; it is about the downstream effects of that change on everything else built on top.
  • A data-first approach completely ignores process and how people interact with it. If you’re building a data maintenance application, of course, that’s not a problem. But if you’re building a process, it is, because the process, and how people interact with it, will shape the way you want to interact with and store the data that supports the process.

You don’t start with the data.  You start with the people and the process. And you use the business and technical requirements you discover along the way to define your data.  If the data isn’t ready at hand, you find out where you can enter it, or from where you can retrieve it, and you tackle it accordingly.  But you can’t let data define your limitations and your scope.
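To make that concrete, here is a minimal sketch in Python of letting the process drive the data definition. It is only an illustration: the `Step` class, the `derive_data_requirements` function, and the loan-approval steps are all hypothetical names invented for this example, not part of any BPM product. Each step declares what it needs and what it produces, and the data requirements fall out of walking the process.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One activity in a process, with the data it needs and produces."""
    name: str
    needs: set[str] = field(default_factory=set)
    produces: set[str] = field(default_factory=set)


def derive_data_requirements(steps: list[Step]) -> dict[str, set[str]]:
    """Walk the steps in order and collect the data the process actually uses.

    Returns every field the process touches, plus the fields that no earlier
    step produces, which must be entered or retrieved from somewhere else.
    """
    available: set[str] = set()
    external: set[str] = set()
    for step in steps:
        # Anything a step needs that is not yet available must come from outside.
        external |= step.needs - available
        available |= step.needs | step.produces
    return {"all_fields": available, "external_fields": external}


# Hypothetical loan-approval process, invented purely for illustration.
loan_process = [
    Step("Submit application", needs={"applicant_name", "amount"}, produces={"application_id"}),
    Step("Check credit", needs={"applicant_name", "credit_score"}, produces={"risk_rating"}),
    Step("Approve or reject", needs={"amount", "risk_rating"}, produces={"decision"}),
]

requirements = derive_data_requirements(loan_process)
print(sorted(requirements["external_fields"]))
```

The external fields are the ones you have to go find an entry screen or an integration for. Nothing ends up in the data definition unless some step in the process asked for it.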

Those who’ve worked with Lombardi in the mid-2000s or BP3 since 2007 will recognize the echoes of our approach in this post. There’s a reason our approach is more productive. The bold-face tagline should have read:

Start with business process first, then build data definitions/integrations.

Seems like such a small distinction, but it will make a huge difference in your business process outcomes.



  • Emiel Kelly

    Hey Scott,

    Not so sure I agree with you, but that might be because I look differently at terms like data and process. A process is not just the workflow and the people (the blocks with arrows in a swim lane). It’s also the data, tools, the steering, etc. needed to make the process perform.

    But we have to be aware that data exists on several levels:

    – Data used in execution to deliver the end product (names, numbers, etc)
    – Data used to manage the cases (time in process, number of cases)
    – Data used to improve the process (throughput times, complaints)

    In this case we talk about type 1 I think.

    In administrative processes it’s mostly all about data processing. And I agree with you that in many processes too much or the wrong data is collected. But maybe that is because processes are designed from the start. ‘We get a request, let’s gather all the data we can find, it might be useful in the future.’

    So why not start at the end? See for example

    Why worry about the trip if you don’t know the destination?

    So, What is the (data)product that the customer wants? And the process is just the means to deliver that. And you can decide then how to collect or create all this data. In 15 steps by 15 people? In one step? You can then decide also what kind of tools you might need.

    I don’t think you can see workflow, people, data, supporting tools as separate things. They all make the (performing) process.

    Maybe it’s a question of definition, but in my opinion it’s not so strange to start with the wanted result of a process. And that doesn’t have to be a complete data model or architecture.

    Just, the awareness that a process that delivers useless things is not worth designing.

    • Emiel, you’re far too reasonable 🙂

      Of course, a first principle in process design is to define the outputs or, more generally, “outcomes”. But that’s not the data design being contemplated in the original article I linked to. They’re talking about designing all the data you will need for your processes before you touch the process design – that it is a precursor for process design – and that’s something I disagree with. That’s like laying the foundation before you decide what kind of house you want to build.

      I equate discovering the process with discovering the requirements of the process (and yes, that starts with understanding the destination). Destination is part of process design (so is origination!). But it is a mistake to think that the destination is only data, right?

      Destination would include all kinds of outcomes – customer success, revenue and margin impact, meeting previously agreed-to SLAs, etc. Data is just one of many things that feeds into good process. If you start with data, you’re going to miss the perspective of all the rest…

      Incidentally, that “well, we might need all kinds of data, let’s collect it” approach is exactly what happens when you design the data first… it is much rarer when you focus on process first. When you do data first, you think “oh, they might collect this, we should design the ultimate data repository for all this stuff.”

  • Data always has to have a context of use; this is the very nature of the notion of Big Data. Big, yes, but so what? What are you going to do with that data? How will it be defined in a given context? Where are the boundaries? Without fail, every single BPM project I have ever been on that started with “we need to figure out the data first” (against strong advice) ended up with massive churn, and that is no overstatement! Starting with a process context, then understanding the business domain and thus its supporting data, has increased velocity of delivery. Even Appian delivers this way, as far as I know from previous discussions with them… so it is a tad confusing as to what the author of this original post was trying to surface.