Interesting Read on Hadoop Math

  • January 20, 2010
  • Scott
  • 2 Comments

Ok ok, what could possibly be interesting about Hadoop-based systems and Mathematics?

Well, it sounds fancier to say Hadoop-based system… but really this basic math applies to any batch-oriented system and those of us who have been writing batch processing solutions now-and-then for the last 20 years should at least be intuitively aware of the math presented in this article, if not consciously thinking about it at design time.

The key equation:

Runtime = Overhead / (1 – {Time to process one hour of data})

Or, stated differently:

Runtime = Overhead + {Time to process one hour of data} * {Hours of data}

Where hours of data and runtime are equal.  These equations help explain why a perfectly healthy batch processing system can suddenly fall tragically behind – if the time to process an hour’s worth of new information is greater than an hour, you have a problem – and the problem will just keep getting worse until you:

  • Improve the runtime of the algorithm
  • Apply more resources to your server / cluster.
  • Filter the incoming data better (if possible) to improve your signal to noise ratio and thereby eliminate unnecessary data processing

In the BPM and CEP worlds, often that third bullet is a key element to improving performance – it doesn’t require more hardware – it just requires you to move your filter “upstream” from your BPM infrastructure to your EAI infrastructure or your ESB infrastructure… or from your EAI/ESB infrastructure to the source of the noise… Some would say this is squeezing the balloon, moving the bottleneck elsewhere, but actually, filtering better up stream may make those systems more efficient as well (if generating the payload and calling out to a webservice or ESB requires cycles, then not doing it as a result of an efficient filter may well reduce processing cycles… )

At any rate, its a good read.  Andrew Paier, figured you especially would get a kick out of this article, given our experience back in 2003…

,
Related Posts
  • July 20, 2017
  • Scott
  • 0 Comments

Your BPM initiative has produced great ROI over the years, but your executive team is more focused on a hot ne...

  • July 18, 2017
  • Ariana
  • 0 Comments

In this video Andrew Paier explains when you might want to allows users to complete digital process automation...

  • July 13, 2017
  • Scott
  • 0 Comments

Today IBM made its announcement about partnering with Automation Anywhere to create waves in the RPA Space. (Y...