Cloudera Hadoop plugs trunk into Netezza iron
Stuffed elephant meets data warehousing
Cloudera – the commercial Hadoop outfit – has teamed with data analytics maven Netezza to build a connector between its stuffed elephant distro and Netezza's TwinFin data warehousing appliances.
Due at the end of the third quarter, the connector will allow users to move data from a Netezza appliance to the Cloudera Distribution for Hadoop (CDH) – and vice versa.
Based on research papers describing Google’s proprietary infrastructure, the open source Hadoop is a way of crunching massive amounts of data across a network of distributed machines. Named after the yellow stuffed elephant belonging to the son of project founder Doug Cutting, the platform underpins net services offered by everyone from Yahoo! and Facebook and Twitter to Microsoft. Yes, Microsoft.
Meanwhile, Netezza's TwinFin blade servers offer a customized PostgreSQL database. Like other data warehouses, they are built for running ad hoc SQL queries against epic data sets.
"One thing we have seen at Cloudera is substantial existing use of Netezza's product in our big enterprise accounts," Cloudera CEO Mike Olson tells The Reg. "And they're looking at Hadoop as a complement to the existing Netezza use. We view ourselves as another piece of the puzzle, solving a different problem: complex data and hard core exhaustive analytics, more exotic algorithms running over complex data at scale."
But Olson also stresses that after data is crunched by Hadoop, users will be able to move it back to the Netezza appliance for additional exploration. "Enterprises want to take structured data – customer and transaction data – and combine it with all the unstructured data coming off their websites...that might not fit into a tabular schema well."
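For readers who want to picture that combination, here is a rough sketch in plain Python. It is not Hadoop or Netezza code, and it says nothing about how the forthcoming connector works. The log format, the customer table, and every function name are assumptions invented for illustration. A map step pulls a user ID and an action out of each raw log line, a reduce step tallies actions per user, and a final join attaches the result to structured customer rows of the sort a warehouse would hold.

```python
from collections import Counter, defaultdict

# Hypothetical web log lines in the form "<timestamp> <user_id> <action>".
WEB_LOG = [
    "2010-07-01T10:00:00 u1 post_status",
    "2010-07-01T10:01:00 u2 view_profile",
    "2010-07-01T10:02:00 u1 send_message",
    "2010-07-01T10:03:00 u1 post_status",
]

# Hypothetical structured rows standing in for warehouse customer data.
CUSTOMERS = {
    "u1": {"name": "Alice", "plan": "premium"},
    "u2": {"name": "Bob", "plan": "free"},
}


def map_line(line):
    """Map step: turn one raw log line into a (user_id, action) pair."""
    _timestamp, user_id, action = line.split()
    return user_id, action


def reduce_by_user(pairs):
    """Reduce step: count how often each user performed each action."""
    counts = defaultdict(Counter)
    for user_id, action in pairs:
        counts[user_id][action] += 1
    return counts


def join_with_customers(behavior, customers):
    """Join step: attach per-user action counts to the customer record."""
    return {
        user_id: {**customers.get(user_id, {}), "actions": dict(actions)}
        for user_id, actions in behavior.items()
    }


behavior = reduce_by_user(map_line(line) for line in WEB_LOG)
profiles = join_with_customers(behavior, CUSTOMERS)
print(profiles)
```

In a real deployment the map and reduce steps would run in parallel across a Hadoop cluster and the join would happen wherever the customer data lives. The sketch only shows the shape of the work.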
Hadoop, for instance, might be used to crunch data relating to user behavior on a website. "What we call Web 2.0 sites have users that move around on their site, post status updates, interact with other individuals. All of that activity is captured in web logs that can't easily be digested using existing relational systems. Hadoop can look at all that activity, identify individual users, digest their behavior, and begin to make predictions about behavior," Olson continues.
"But these companies want to combine what these users do with who they are, but that information is often in a system like Netezza's."
"A lot of people assumed that Hadoop was innately competitive with Netezza and other players," says Olson. "But that's not what we're seeing. We're seeing an appetite for this new technology [Hadoop] to solve problems with data – but it has to work well with existing and expanding investments in [data warehousing]."
Last month, Cloudera teamed with Oracle-tools shop Quest Software to build a Hadoop connector for Oracle. It's due in Q3 as well. ®