Big Data tools cost too much, do too little

SHOCKING REVELATION: Fashionable technology is high maintenance

Hadoop and NoSQL are the technologies of choice among the web cognoscenti at Strata 2013, but one developer and technical author says they are being adopted too enthusiastically by some companies when good ol' SQL approaches could work just as well.

Ever since a team at Yahoo! took its turn as Prometheus and brought Google magic down to the rest of us via Hadoop, companies have been on a tear to put the technology into play. But the costs are high, the effort is great, and the advantage it grants can be slight, Tim O'Brien told a packed session at the O'Reilly Strata conference in Santa Clara on Wednesday.

"There is a feeling afoot that some of the technologies we've been talking about at a conference like this end up having a huge price tag," he said.

Citing huge human costs (you need to hire expensive, in-demand people who know how to use Hadoop), pricey implementations (migrating your data into NoSQL or HDFS without it going wonky), and the possibility of unanticipated problems (you may not fully understand what you are using), O'Brien poured water on the fiery enthusiasm with which the technology has been adopted by the tech world and its dog.

"Big data is a necessity at scale," he said. "If you're trying to listen to every transatlantic phone call, you need to use MapReduce ... if you need to search the entire internet in milliseconds, you need to use MapReduce; if you need to run the largest social network in the world, you need to use MapReduce. If you don't, you can probably scale with a database."
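To make that contrast concrete, here is a minimal sketch (not from O'Brien's talk) of the same aggregation written once in MapReduce style and once as the single SQL statement an ordinary database would run. The event records and column names are hypothetical.

    # The same per-user aggregation, MapReduce-style vs. plain SQL.
    from collections import defaultdict
    from itertools import chain

    # Hypothetical event records.
    events = [
        {"user": "alice", "bytes": 120},
        {"user": "bob", "bytes": 300},
        {"user": "alice", "bytes": 80},
    ]

    # Map phase: emit (key, value) pairs.
    def map_phase(event):
        yield event["user"], event["bytes"]

    # Reduce phase: sum values per key.
    def reduce_phase(pairs):
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value
        return dict(totals)

    print(reduce_phase(chain.from_iterable(map_phase(e) for e in events)))
    # {'alice': 200, 'bob': 300}

    # The equivalent on an ordinary SQL database, no cluster required:
    # SELECT user_name, SUM(bytes) FROM events GROUP BY user_name;

At toy scale the two are interchangeable; the MapReduce formulation only starts to pay for itself when the data no longer fits on a single well-provisioned database server.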

The way companies have adopted the gamut of "big data" technologies, ranging from MongoDB to Hadoop to Impala, means that their own stacks have become difficult to maintain and hard to understand, O'Brien said. "The things I'm being asked to support in production ... I couldn't even tell you how many databases they use."

For a few large-scale companies, "big data" products are a necessity. For others, they could be useful tools, but for some adopters, the use of these technologies could be "pushing solutions on problems where they may not be appropriate," he said.

If you've got 10TB or less of data on which you want to run analyses, then you can still get by on Postgres or some other conventional system, he said. But if you're expecting to log a petabyte of data, you need to make your way to Hadoop or something similar soon. "Don't wait," he said.
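For readers wondering what "getting by on Postgres" looks like in practice, here is a minimal, hypothetical sketch of an ad-hoc analysis run straight against a single PostgreSQL instance using psycopg2. The connection string, table, and column names are made up for illustration.

    # A typical ad-hoc analysis on a single PostgreSQL box: daily event
    # counts per customer over the last 30 days. No cluster, one query.
    import psycopg2

    conn = psycopg2.connect("dbname=analytics user=report")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT customer_id,
                   date_trunc('day', created_at) AS day,
                   count(*) AS total
            FROM events
            WHERE created_at >= now() - interval '30 days'
            GROUP BY customer_id, day
            ORDER BY day
        """)
        for customer_id, day, total in cur.fetchall():
            print(customer_id, day, total)
    conn.close()

Workloads like this stay fast well into the multi-terabyte range with sensible indexing and partitioning, which is the point O'Brien was making about not reaching for Hadoop prematurely.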

Eighty per cent of the market is driven by the tip of the tech pyramid, O'Brien said. "I'm not trying to say a [Hadoop-using] startup out there is doing it wrong, but I have worked on projects where I wish they'd used MySQL because they only had a gigabyte of data."

Even Google, the progenitor of all of this technology via the vaunted BigTable and GFS papers, has itself moved away from the techniques pioneered by the NoSQL and Hadoop communities with its recent "Spanner" database.

Spanner looks much more like a relational, SQL-style database than anything else, and where Google goes, the world follows. Other companies are already moving the same way, with the likes of TransLattice re-implementing Spanner's design and drawing plenty of interest for it.

Perhaps NoSQL and Hadoop have led some companies down a blind alley? The Register's database desk had many conversations at Strata on Wednesday during which companies bemoaned the diversity of the "big data" ecosystem and wished for consolidation to make life easier for end-users.

Companies and technologies have proliferated, as have marketing budgets, and perhaps, as O'Brien's talk outlines, this has gone too far and bitten some novice adopters. These technologies may be big, but they're only as clever as the company using them. ®
