Death of batch – long live real-time
So why is it taking so long to die?
Remember batch processing? All your vital business reports and reconciliations ran overnight when everyone had gone home; and finished, with a bit of luck, just before everyone arrived back for work in the morning.
Well, it's been clear for 20 years that the day of batch processing is over:
- People work flexibly, 24x7, these days – a "batch slot" isn't available, and if you mix up batch and online processing you can corrupt databases.
- Global ecommerce is where we're all going; and when we stop work here someone is starting work on the far side of the world – and not everyone stops work on Sundays, either.
- Volumes are increasing and the hard processing-time limit implied by the "batch window" will be a pain in the proverbial. At very least, it may get you involved with expensive over-provisioning, to make sure you can cope with month or year-end (implying capacity lying idle. Or, perhaps, you'll have to get involved in the effort required to build a "capacity on demand" grid architecture).
But batch processing is taking an awful long time to die! If you've built a batch-oriented architecture, no matter what the long-term benefits of modernisation, change is expensive and risky in the short-term, and people are ruled by short-term market factors these days.
Batch processing also seems to make sense for data warehousing. You want business intelligence (BI) based on end-of-day or end-of-month consolidated positions, and that sounds like batch processing to populate the warehouse.
But this is misleading. Your BI must be based on appropriate latencies – there is nothing pleasant about trying to deal with an immediate stock crisis or some such if you have to wait until tomorrow until you can get the information you need to make decisions with.
If you build a real-time system, you can easily arrange for end-of-month consolidations if you need them; but getting real-time information out of a batch-oriented system can be tricky. You can do it, computers can do anything, but it is likely to be expensive and unreliable.
Charles Nichols, founder and CEO of SeeWhy software, advises that real-time BI capabilities should be built into the business process at design time – as what is usually called "non functional requirements" (if BI is as useful as it is hyped, these will, in fact, be very "functional" requirements).
This makes a lot of sense to me, even if Charles's enthusiasm carries him away a bit – not all existing BI systems are batch oriented and what he calls his "vision for the new BI 2.0" sounds not dissimilar to Information Builders' "BI for the masses" mantra, which is also very much about BI-enabling operational systems.
Nevertheless, SeeWhy's event driven view of BI is a bit out of the ordinary and if you do want to build real-time BI into your application, SeeWhy's free community edition is available here. So, you can build an event-driven BI-enabled proof-of-concept relatively easily – and cheaply.
Competitors to SeeWhy in the real-time space include Applix. By coincidence, just before I met Charles, Martin Richmond-Coggan, VP EMEA at Applix, told me about a possible use of its TM1 product as, in effect, a real-time cache for operational information feedback, embedded in a business system.
Martin worries about scalability and performance – which TM1 deals with by processing multi-dimensional data in-memory, in (if you need it) a 64 bit address space, perhaps copying the data out to a relational database for persistence in the background (if that's how you want to design it).
So there are several ways to skin this particular cat. But bear in mind four things:
- You should design operational business information feedback into the application architecture from the start, not bolt it on afterwards, especially not after the application has gone live.
- Not all BI applications need real-time information – but there's a spectrum between historical point-in-time information and zero-latency information feedback. It's not black and white.
- Zero latency is an impossible goal anyway, because of (at bottom) speed-of-light limitations. This affects geographically distributed systems (and aren't they all) in practice.
- It is easier to add point-in-time capabilities to a real-time architecture than vice-versa. ®
Sponsored: Hyper-scale data management