Number-crunching in the Cloud

Wave BY:BY to old-school data analytics

24th June 2011

Back in the mid-nineties every PC in your organisation potentially contained software that could destroy your company overnight. Not a virus, nor a Trojan: it was called the spreadsheet.

The spreadsheet was – and still is – broken by design. The vital raw data it crunches may be exposed to view, concealed behind the cells or parachuting in from somewhere outside the spreadsheet.

Mixed up in the same visible or invisible cells is the business logic used in its calculations. And unless rigorously locked down, each copy of every spreadsheet in circulation can be individually "improved" by its user, intentionally or by accident.

But the output from the spreadsheet, on the screen for the local worker, or in a paper report sent up to the boardroom, looked really nice, really authoritative. Which, of course, was the problem.

Today's tsunami of data input, and the changing requirement to get output directly to the decision makers in the field, has largely put paid to that kind of spreadsheet. Tom Nolle of Cimi Corp remembers the days when vital decision making depended on "ten thousand spreadsheets within the worker population".

Today we try to pull that all together to get a handle on data integrity. The subject is one of Nolle's specialities. Cimi Corp is a strategic consulting company that assesses trends and tries to build a picture of the future of telecommunications, media, and technology (TMT).

The 'future' of business analytics

In a web-based teleconference under the aegis of Internet Evolution, Nolle maps out the changing landscape of business analytics. "There was a time when business intelligence meant sending the information to the boardroom," he says. "Today it means sending the information to everyone in the company that has to make a decision. And they need the data when they make the decision – not some time in anticipation, and not when it's too late."

What data? There's all the traditional stuff your in-house data warehouses have been collecting for decades – raw material and output figures, customer satisfaction scores, employee churn per region, cost of sales... This data is abstracted, digested and projected in ways that probably have to evolve rapidly as your business changes. But now there's new data to add to the mix: valuable demographic and other market information coming in over the internet.

Dave Suedkamp, head of everything for IBM's market research services, chips in: "Facebook, Twitter, news feeds and other social media, message boards, forums..." This new inundation from the Cloud can't be ignored by businesses trying to make a buck in the 21st century, he says. To make sense of the world you need to digest it all.

Ishan Sehgal tweets, in and out of his job as program director of software as a service for predictive analytics at SPSS, a company IBM acquired in 2009. "The amount of social data out there is increasing beyond measurement," he tells the web conference. "The overall amount of data currently stored in the world is estimated to exceed one zettabyte." Que? Count the number of grains of sand in a thousand worlds like ours, and there's your zettabyte.

If you're wondering about the weight of IBM input here, it's worth noting the Internet Evolution website is "sponsored" by the company. But there's not too much in the way of marketing hype in this particular conference. Inevitably a fog of abstract jargon hangs over the occasion, but there are some possible insights to be gained.

Big Money from 'Big Data'

A McKinsey study last month claims that this Cloud downpour, "Big Data" in the jargon, presents a huge opportunity for businesses. And as we know, every opportunity is a problem. But one that, as Suedkamp suggests, can be solved – or at least ameliorated – by the Cloud itself.

"Cloud computing is clearly enabling companies to reinvent the way they do business," he claims. "From an IT perspective the Cloud delivers services faster, because of automation and standardisation. It enables integration across Clouds as well as enterprises with pre-built templates. And increases efficiencies with virtualised consolidated scaleable resources." And Clouds can be used to crunch all this "big data".

Clouds come in different flavours, from the purely private Cloud run internally using the enterprise's software on its own hardware, through a variety of third-party-hosted but still private Clouds, all the way across to the public Cloud, accessed over the general internet and renting out its services for commercial use by the hour on a credit card.

Sehgal judges the wholly public Cloud "...appropriate perhaps for some analytic apps. But the downside is security and difficulty of integration with the enterprise's existing services."

IBM currently favours a modified version of the public Cloud, "Shared Cloud Services", essentially a stack of standardised SaaS solutions that are rented to enterprises connected by VPNs over the internet.

Suedkamp powerpoints this with some Forrester-derived slides in which the opex-versus-capex argument tops the list of SaaS adoption drivers at 72 per cent, with lower overall TCO coming in at 68 per cent and, in third place, speed of implementation and deployment at 54 per cent.

On-prem may be cheaper

Then by way of balance there's more Forrester data about the possible downsides of SaaS. The percentages here are much smaller, though. Security concerns are number one at 48 per cent, and oddly TCO is in there too, coming third behind the challenges of integrating SaaS into existing applications (39 per cent). Apparently 34 per cent of those surveyed, in Suedkamp's words, "would argue that the total cost of ownership may be higher than an on-prem solution".

Who was being surveyed? These data may come from the Forrester report "SaaS Adoption 2010: Buyers See More Options But Must Balance TCO, Security, And Integration" in which "Forrester recently interviewed more than 1,000 enterprise software decision-makers to find out their investment strategy for 2010". But Suedkamp doesn't say, and it would cost you $500 to be sure of that.

Which may set you thinking. It did me. We're sitting here being talked through slides – how familiar is that? – with precise figures attached to somewhat less precise concepts, derived from reports, refined from summaries of raw data boiled down by machines, somewhere in the Cloud perhaps. It's analytics, being used to make us feel like informed decision-makers. And in this case to sell us analytics.

What do we really know about the provenance of this kind of data? And of the methodology used to crunch it? Dare we hope that what's going on behind the scenes – whether it's Forrester predicting trends, or our own business feeding information to our decision-makers – is far more rigorously audited than very similar processes on much the same machines that only very recently rolled up millions of sub-prime mortgages into slick collateralised debt obligations?

Or are we perhaps entering the age of the zettabyte spreadsheet? ®