Posted by: Jyoti Trehan, July 9th, 2014

Back in the 1960s and '70s, batch jobs running on IBM mainframes were the first manifestation of computerizing manual processes. Early computers were very costly, and time-sharing them was the only way to make them economically viable. IBM mastered the art of batch processing in those heydays; utility billing, operations data, bank statements, and payroll processing were identified as strong candidates for computerization by way of batch processing.

With the advent of cheaper computers, also known as personal computers, a rethink was triggered amongst corporations that were routinely footing a fat monthly rental bill for an IBM mainframe, a VAX/VMS system, or an HP mainframe. Unix and Windows turned out to be a natural fit for many scenarios where the use of mainframes could be avoided.

However, the original business case of processing millions of records in a span of a few minutes and printing the results did not go away or change much. Hence, mainframes continued to rule the roost wherever massive data processing was the business need.

Then, in the early 2000s, came Google, which showed the world that technologies like Bigtable could hold a copy of all the web content in the world and perform tagging, indexing, and analysis on it. This was a 'wow' moment for everyone who had been sold on the idea that relational databases and regular file systems were the only ways to store data, that living with their limitations was simply a fact of life, and that everything built had to use these technologies as its underlying storage mechanism.

There is another facet of bulk or batch processing that surfaced when these Big Data frameworks started emerging. Batch processes in the old days were usually fed with data produced by data entry operators. Modern massive-processing systems do not suffer from this limitation: they can capture data directly from card readers, sensors, and digital cameras, and they can perform analysis and calculate aggregates up front, as soon as a message or piece of information streams into the system. This largely eliminates the need for an elaborate batch processing architecture.
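The idea of calculating aggregates as information streams in can be sketched in a few lines of plain Python. This is an illustrative toy, not any particular framework's API; the sensor readings are made up.

```python
# Minimal sketch: maintain running aggregates as each reading arrives,
# instead of accumulating records on disk for a nightly batch job.

class RunningAggregate:
    """Incrementally tracks count, sum, min, and max over a stream of numbers."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def update(self, value):
        # O(1) work per message: earlier data never needs to be revisited.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


# Simulated stream of sensor readings arriving one at a time.
agg = RunningAggregate()
for reading in [21.5, 22.0, 19.8, 23.1]:
    agg.update(reading)

print(agg.count, round(agg.average, 2), agg.minimum, agg.maximum)
```

The point is that the numbers are always current: there is no nightly window in which yesterday's data is being crunched while today's waits.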

What we have today is various flavors of Big Data storage mechanisms that cater to varied batch processing requirements.

  1. We have wide-column stores like HBase (built on Hadoop) and Cassandra, where the focus shifts from rows of data (filtering or slicing them) to columns that can be tagged or stored along with aggregate information.
  2. There are graph databases that are well suited to situations where links between pieces of information need to be analyzed and leveraged to accomplish a task.
  3. Document databases like MongoDB have their own place in the Big Data spectrum. As the name suggests, these are suited for storing, indexing, and tagging documents.
  4. MPP databases like Netezza, Vertica, and Greenplum are geared towards massively parallel processing (MPP). When the amount of number crunching matters more than the sheer volume of data to be crunched, MPP databases can help.
  5. There are frameworks like Storm (a stream processor) and Kafka (a distributed message log) that inherently support real-time processing, rather than running batches frequently to keep the numbers updated.
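To make the row-versus-column distinction in item 1 concrete, here is a toy sketch in plain Python. No real database is involved; the two layouts and the sample account data are purely illustrative.

```python
# Toy illustration of row-oriented versus column-oriented storage.

# Row-oriented: each record is stored together, as a page in a
# traditional RDBMS would hold it.
rows = [
    {"account": "A1", "city": "Delhi",  "balance": 1200},
    {"account": "A2", "city": "Mumbai", "balance": 3400},
    {"account": "A3", "city": "Delhi",  "balance": 560},
]

# Column-oriented: each column is stored contiguously, so an aggregate
# over one column touches only that column's data.
columns = {
    "account": ["A1", "A2", "A3"],
    "city":    ["Delhi", "Mumbai", "Delhi"],
    "balance": [1200, 3400, 560],
}

# Aggregating from rows forces a walk over every full record ...
total_from_rows = sum(r["balance"] for r in rows)

# ... while the columnar layout reads just the "balance" column.
total_from_columns = sum(columns["balance"])

print(total_from_rows, total_from_columns)
```

The results are identical; the difference is how much data must be scanned to produce them, which is why analytic and aggregate-heavy workloads gravitate to column-oriented stores.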

In light of all these technology advances, it is about time enterprises started thinking out of the box before throwing yet another batch process into the mix, further complicating the IT infrastructure and introducing unnecessary delays in data processing.

What is your enterprise's batch processing strategy? Do you have too many nightly jobs to monitor through to a successful run? Is the budget for keeping the lights on all night worth a revisit? Are your customers waiting a little too long for their account statements to become available? Maybe it is time to take another look, consolidate batch jobs, and eliminate the ones that are batch only because of the technology constraints at the time they were built.

Take charge of your enterprise app strategy today! Get started by downloading the eBook below –
