IBM punts commercial Hadoop distro

Big Blue elephant in the data center

With so much of its future sales and growth staked on smart infrastructure and the data analytics that enables it, it comes as no surprise that IBM has taken a shine to the open source Hadoop big data crunching software that has found a loving home at the Apache Foundation. Today, IBM announced it has created a commercial version of Hadoop as well as some add-ons and - you guessed it - implementation services to make Hadoop more consumable for the Global 2000.

Not everyone is a Google, where the MapReduce distributed data cruncher and its related file system were created, or even a Yahoo, where Hadoop was nurtured to do what Google does - but in an open source, community-driven fashion. Hadoop is used at Yahoo!, Facebook, and Twitter, and it helps drive a portion of Microsoft's Bing search engine. But it is not widely understood in the corporations where IBM does its business.

Bernie Spang, director of product strategy for database software and systems at IBM, says that the company needs Hadoop to complete its data analytics hat trick. IBM has traditional data warehousing and predictive analytics in its InfoSphere, Cognos, and now SPSS products, which can extract data from transactional systems to help companies make better decisions. And it has the "System S" InfoSphere Streams system, which debuted as a prototype a year ago to mash up streaming data from text, video, and audio streams and mix it with databases to create something that is a bit more real-time than a data warehouse, helping governments and companies wade through mountains of data to make decisions (such as trading options a hell of a lot faster than most systems can, as the prototype did).

Spang says that IBM needs to offer a product that does the "big data" crunching that the Googles of the world do because its own customers have loads of structured and unstructured data that can be sucked into a Hadoop file system and chewed on using MapReduce for a wider, finer-grained, and more long-term analysis than can be done with a data warehouse or stream system.

And that is why IBM is creating its own distro of Hadoop, which is called InfoSphere BigInsights. Spang called BigInsights an enterprise-ready version of the Apache Hadoop code that IBM will package up and install for customers who want to build their own Hadoop grids. IBM has done about a dozen Hadoop installations to build up experience setting up the code and systems, and now feels it has enough experience to offer commercial support: not just the Hadoop software, but also services and expertise relating to how Hadoop can be used for risk management and analysis at financial firms or for all kinds of cross-linking in social networking and online entertainment applications. IBM will plan your Hadoop installation for you, set it up, and even monitor it for you. Just get out that checkbook.

IBM could have just done the easy thing and partnered with Cloudera, which back in March 2009 launched a commercialized version of the Hadoop Distributed File System, the MapReduce parallelization and data-crunching algorithm to chew on Webby data, and the Hive client library associated with Hadoop. But Big Data is important enough that IBM feels compelled to offer its own distro.

While IBM is now a competitor to Cloudera, Big Blue says it will participate with the members of the Apache Hadoop community, singling out Cloudera and Karmasphere, which has created a graphical tool for debugging Hadoop apps, by name.

Cloudera welcomes IBM's arrival. "I am excited to see more organizations like IBM get behind the Apache Hadoop project," said Cloudera's Doug Cutting, the man who founded Hadoop. "IBM has been working for some time on Hadoop-related projects for its internal use such as BigSheets and I am looking forward to their investment in the core open source platform development as well.

"At Cloudera we've seen incredible Hadoop uptake in mainstream enterprises which has been reflected in the growth of our own business. I see no end to the number of applications of this new technology. IBM's entry means more open source contributors will help expand the horizons for Hadoop around the world."

The InfoSphere BigInsights distro will have some home-grown IBM software as well, including a technology preview of something called BigSheets that Spang says is basically a spreadsheet front-end running in a Web browser that is used for consolidating and visualizing the chewed data coming out of Hadoop, which can be terabytes or petabytes of Web pages and other kinds of unstructured data.

As an example of how BigSheets can interface with Hadoop, IBM is working with the British Library to archive and preserve 5 TB of Web pages culled from sites with the .co.uk domain. The BigSheets interface will let researchers, academics, and students chew on this data and search it in more sophisticated ways than is possible using a search engine.

IBM is not divulging its prices for the BigInsights Hadoop distro or what the various installation and support services cost. The BigInsights distro is available today. It is not clear when BigSheets will move from technology preview to production, but you can find out more about the software here. Spang said that IBM has other tools to make Hadoop do more tricks, but it is a fair guess that these will cost more than peanuts. ®
