Big Data tools cost too much, do too little

SHOCKING REVELATION: Fashionable technology is high maintenance

Strata 2013 Hadoop and NoSQL are the technologies of choice among the web cognoscenti, but one developer and technical author says some companies are adopting them too enthusiastically when good ol' SQL approaches would work just as well.

Ever since a team at Yahoo! did their turn at being Prometheus and brought Google-magic down to the rest of us via Hadoop, companies have been on a tear to put the technology into play. But the costs are high, the effort is great, and the advantage it grants you can be slight, Tim O'Brien said in a packed session at the O'Reilly Strata conference in Santa Clara on Wednesday.

"There is a feeling afoot that some of the technologies we've been talking about at a conference like this end up having a huge price tag," he said.

Citing huge human costs (you need to hire expensive in-demand people who know how to use Hadoop), pricey implementation (migrate your data into NoSQL or HDFS without it going wonky) and the possibility of unanticipated problems (you may not fully understand what you are using), O'Brien poured water on the fiery enthusiasm with which it's been adopted by the tech world and its dog.

Big data is a necessity at scale: if you're trying to listen to every transatlantic phone call, you need to use MapReduce ... if you need to search the entire internet in milliseconds, you need to use MapReduce; if you need to run the largest social network in the world, you need to use MapReduce. If you don't, you can probably scale with a database.
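For readers who haven't met the paradigm, the MapReduce pattern O'Brien invokes can be sketched in miniature. This is an illustrative toy in plain Python, not any particular framework's API: a map phase emits key/value pairs, and a reduce phase groups by key and aggregates — the classic word-count example.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle/reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["Hadoop scales out", "SQL scales up", "Hadoop and SQL"]
word_counts = reduce_phase(map_phase(docs))
```

The point of the real thing, of course, is that the map and reduce phases are distributed across thousands of machines — which is exactly the overhead O'Brien argues most shops don't need.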

The way companies have adopted the gamut of "big data" technologies, ranging from MongoDB to Hadoop and Impala, means their own stacks have become difficult to maintain and hard to understand, O'Brien said. "The things I'm being asked to support in production ... I couldn't even tell you how many databases they use."

For a few large-scale companies, "big data" products are a necessity. For others, they could be useful tools, but for some adopters, the use of these technologies could be "pushing solutions on problems where they may not be appropriate," he said.

If you've got 10TB or less of data upon which you want to run analyses, then you can still get by on Postgres or some other typical system, he said. But if you're expecting to be logging a PB of data then you need to make your way to Hadoop or something else soon. "Don't wait," he said.
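At that smaller scale, the kind of analysis in question is often a plain SQL aggregate. As a hedged sketch — using Python's built-in SQLite purely as a stand-in for Postgres, with a made-up schema — the "typical system" O'Brien has in mind handles this in one query:

```python
import sqlite3

# SQLite stands in for Postgres here; the events table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 400), (3, 50), (3, 50)],
)

# A typical analytical query: total traffic per user, biggest first.
rows = conn.execute(
    "SELECT user_id, SUM(bytes) AS total "
    "FROM events GROUP BY user_id ORDER BY total DESC"
).fetchall()
```

No cluster, no data migration, no specialist hires — which is the trade-off the talk is weighing.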

Eighty per cent of the market is driven by the tip of the tech pyramid, O'Brien said. "I'm not trying to say a [Hadoop-using] startup out there is doing it wrong, but I have worked on projects where I wish they'd use MySQL because they've only had a gigabyte of data."

Even Google, the progenitor of all of this technology via the vaunted BigTable and GFS papers, has itself moved away from the techniques taken up by the NoSQL and Hadoop communities with its recent "Spanner" database.

Spanner looks much more like a relational, SQL-style database than anything else, and where Google goes the world follows. This is already happening, with companies such as TransLattice re-implementing Spanner's structure and attracting much interest because of it.

Perhaps NoSQL and Hadoop have led some companies down a blind alley? The Register's database desk had many conversations at Strata on Wednesday during which companies bemoaned the diversity of the "big data" ecosystem and wished for consolidation to make life easier for end-users.

Companies and technologies have proliferated, as have marketing budgets, and perhaps, as O'Brien's talk outlines, this has gone too far and bitten some novice adopters. These technologies may be big, but they're only as clever as the company using them. ®

