IBM punts commercial Hadoop distro

Big Blue elephant in the data center

With so much of its future sales and growth staked on smart infrastructure and the data analytics that enables it, it comes as no surprise that IBM has taken a shine to the open source Hadoop big data crunching software that has found a loving home at the Apache Foundation. Today, IBM announced it has created a commercial version of Hadoop as well as some add-ons and - you guessed it - implementation services to make Hadoop more consumable for the Global 20000.

Not everyone is a Google, where the MapReduce distributed data cruncher and its related file system were created, or even a Yahoo, where Hadoop was nurtured to do what Google does - but in an open source, community-driven fashion. Hadoop is used at Yahoo! and Facebook and Twitter, and it helps drive a portion of Microsoft's Bing search engine. But it is not widely understood in the corporations where IBM does its business.

Bernie Spang, director of product strategy for database software and systems at IBM, says that the company needs Hadoop to complete its data analytics hat trick. IBM has traditional data warehousing and predictive analytics in its InfoSphere, Cognos, and now SPSS products, which can extract data from transactional systems to help companies make better decisions. And it has the "System S" InfoSphere Streams system, which debuted as a prototype a year ago to mash up streaming data from text, video, and audio streams and mix it with databases to create something that is a bit more real-time than a data warehouse, helping governments and companies wade through mountains of data to make decisions (like trading options a hell of a lot faster than most systems can, as the prototype did).

Spang says that IBM needs to offer a product that does the "big data" crunching that the Googles of the world do because its own customers have loads of structured and unstructured data that can be sucked into a Hadoop file system and chewed on using MapReduce for a wider, finer-grained, and more long-term analysis than can be done with a data warehouse or stream system.
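For anyone who has not watched MapReduce chew through a pile of data, here is a minimal single-machine sketch of the idea in Python. It is not Hadoop's API and not IBM's code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real Hadoop grid would spread each phase across many nodes reading from the distributed file system.

```python
from collections import defaultdict


def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Blue"]))
```

The point of splitting the job this way is that the map and reduce steps only ever look at one record or one key at a time, which is what lets a framework like Hadoop farm them out across a whole grid of machines.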

And that is why IBM is creating its own distro of Hadoop, which is called InfoSphere BigInsights. Spang called BigInsights an enterprise-ready version of the Apache Hadoop code that IBM will package up and install for customers who want to build their own Hadoop grids. IBM has done about a dozen Hadoop installations to build up experience setting up the code and systems, and now feels it knows enough to offer commercial support and various services: not just the Hadoop software, but also expertise in how Hadoop can be used for risk management and analysis at financial firms or for all kinds of cross-linking in social networking and online entertainment applications. IBM will plan your Hadoop installation for you, set it up, and even monitor it for you. Just get out that checkbook.

IBM could have just done the easy thing and partnered with Cloudera, which back in March 2009 launched a commercialized version of the Hadoop Distributed File System, the MapReduce parallelization and data-crunching algorithm to chew on Webby data, and the Hive client library associated with Hadoop. But Big Data is important enough that IBM feels compelled to offer its own distro.

While IBM is now a competitor to Cloudera, Big Blue says it will participate with the members of the Apache Hadoop community, singling out Cloudera and Karmasphere, which has created a graphical tool for debugging Hadoop apps, by name.

Cloudera welcomes IBM's arrival. "I am excited to see more organizations like IBM get behind the Apache Hadoop project," said Cloudera's Doug Cutting, the man who founded Hadoop. "IBM has been working for some time on Hadoop-related projects for its internal use such as BigSheets and I am looking forward to their investment in the core open source platform development as well.

"At Cloudera we've seen incredible Hadoop uptake in mainstream enterprises which has been reflected in the growth of our own business. I see no end to the number of applications of this new technology. IBM's entry means more open source contributors will help expand the horizons for Hadoop around the world."

The InfoSphere BigInsights distro will have some home-grown IBM software as well, including a technology preview of something called BigSheets that Spang says is basically a spreadsheet front-end running in a Web browser that is used for consolidating and visualizing the chewed data coming out of Hadoop, which can be terabytes or petabytes of Web pages and other kinds of unstructured data.

As an example of how BigSheets can interface with Hadoop, IBM is working with the British Library to archive and preserve 5 TB of Web pages culled from sites with the .co.uk domain. The BigSheets interface will let researchers, academics, and students chew on this data and search it in more sophisticated ways than is possible using a search engine.

IBM is not divulging its prices for the BigInsights Hadoop distro or what the various installation and support services cost. The BigInsights distro is available today. It is not clear when BigSheets will move from technology preview to production, but you can find out more about the software here. Spang said that IBM has other tools to make Hadoop do more tricks, but it is a fair guess that these will cost more than peanuts. ®
