IBM punts commercial Hadoop distro

Big Blue elephant in the data center

With so much of its future sales and growth staked on smart infrastructure and the data analytics that enables it, it comes as no surprise that IBM has taken a shine to the open source Hadoop big data crunching software that has found a loving home at the Apache Foundation. Today, IBM announced it has created a commercial version of Hadoop as well as some add-ons and - you guessed it - implementation services to make Hadoop more consumable for the Global 20000.

Not everyone is a Google, where the MapReduce distributed data cruncher and its related file system were created, or even a Yahoo, where Hadoop was nurtured to do what Google does - but in an open source, community-driven fashion. Hadoop is used at Yahoo!, Facebook, and Twitter, and it helps drive a portion of Microsoft's Bing search engine. But it is not widely understood in the corporations where IBM does its business.

Bernie Spang, director of product strategy for database software and systems at IBM, says that the company needs Hadoop to complete its data analytics hat trick. IBM has traditional data warehousing and predictive analytics in its InfoSphere, Cognos, and now SPSS products, which can extract data from transactional systems to help companies make better decisions. And it has the "System S" InfoSphere Streams system, which debuted as a prototype a year ago to mash up streaming data from text, video, and audio streams and mix it with databases to create something that is a bit more real-time than a data warehouse, helping governments and companies wade through mountains of data to make decisions (like trading options a hell of a lot faster than most systems can, as the prototype did).

Spang says that IBM needs to offer a product that does the "big data" crunching that the Googles of the world do because its own customers have loads of structured and unstructured data that can be sucked into a Hadoop file system and chewed on using MapReduce for a wider, finer-grained, and more long-term analysis than can be done with a data warehouse or stream system.
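For readers who have not met MapReduce, the chewing Spang describes boils down to three steps: map each record to key/value pairs, group the pairs by key, and reduce each group to a result. What follows is a minimal single-machine sketch in plain Python, not Hadoop's API and not anything from IBM's distro; the function names are made up for illustration, and a real Hadoop grid spreads these same steps across many machines and a distributed file system.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in order on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))
```

Calling word_count on a list of strings returns a dictionary of word totals. The point of the pattern is that the map and reduce steps are independent per record and per key, which is what lets a cluster run them in parallel.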

And that is why IBM is creating its own distro of Hadoop, which is called InfoSphere BigInsights. Spang called BigInsights an enterprise-ready version of the Apache Hadoop code that IBM will package up and install for customers who want to build their own Hadoop grids. IBM has done about a dozen Hadoop installations to build up experience setting up the code and systems, and now feels it has enough experience to offer commercial support and various services, including the Hadoop software but also services and expertise relating to how Hadoop can be used for risk management and analysis at financial firms or for all kinds of cross-linking in social networking and online entertainment applications. IBM will plan your Hadoop installation for you, set it up, and even monitor it for you. Just get out that checkbook.

IBM could have just done the easy thing and partnered with Cloudera, which back in March 2009 launched a commercialized version of the Hadoop Distributed File System, the MapReduce parallelization and data-crunching algorithm to chew on Webby data, and the Hive client library associated with Hadoop. But Big Data is important enough that IBM feels compelled to offer its own distro.

While IBM is now a competitor to Cloudera, Big Blue says it will participate with the members of the Apache Hadoop community, singling out Cloudera and Karmasphere, which has created a graphical tool for debugging Hadoop apps, by name.

Cloudera welcomes IBM's arrival. "I am excited to see more organizations like IBM get behind the Apache Hadoop project," said Cloudera's Doug Cutting, the man who founded Hadoop. "IBM has been working for some time on Hadoop-related projects for its internal use such as BigSheets and I am looking forward to their investment in the core open source platform development as well.

"At Cloudera we've seen incredible Hadoop uptake in mainstream enterprises which has been reflected in the growth of our own business. I see no end to the number of applications of this new technology. IBM's entry means more open source contributors will help expand the horizons for Hadoop around the world."

The InfoSphere BigInsights distro will have some home-grown IBM software as well, including a technology preview of something called BigSheets that Spang says is basically a spreadsheet front-end running in a Web browser that is used for consolidating and visualizing the chewed data coming out of Hadoop, which can be terabytes or petabytes of Web pages and other kinds of unstructured data.

As an example of how BigSheets can interface with Hadoop, IBM is working with the British Library to archive and preserve 5 TB of Web pages culled from sites with the .co.uk domain. The BigSheets interface will let researchers, academics, and students chew on this data and search it in more sophisticated ways than is possible using a search engine.

IBM is not divulging its prices for the BigInsights Hadoop distro or what the various installation and support services cost. The BigInsights distro is available today. It is not clear when BigSheets will move from technology preview to production, but you can find out more about the software here. Spang said that IBM has other tools to make Hadoop do more tricks, but it is a fair guess that these will cost more than peanuts. ®
