Cloudera Hadoop plugs trunk into Netezza iron

Stuffed elephant meets data warehousing

Cloudera – the commercial Hadoop outfit – has teamed with data analytics maven Netezza to build a connector between its stuffed elephant distro and Netezza's TwinFin data warehousing appliances.

Due at the end of the third quarter, the connector will allow users to move data from a Netezza appliance to the Cloudera Distribution for Hadoop (CDH) – and vice versa.

Based on research papers describing Google’s proprietary infrastructure, the open source Hadoop is a way of crunching massive amounts of data across a network of distributed machines. Named after the yellow stuffed elephant belonging to the son of project founder Doug Cutting, the platform underpins net services offered by everyone from Yahoo! and Facebook and Twitter to Microsoft. Yes, Microsoft.

Meanwhile, Netezza's TwinFin blade servers offer a customized PostgreSQL database. Like other data warehouses, they're built for running ad hoc SQL queries against epic data sets.

"One thing we have seen at Cloudera is substantial existing use of Netezza's product in our big enterprise accounts," Cloudera CEO Mike Olson tells The Reg. "And they're looking at Hadoop as a complement to the existing Netezza use. We view ourselves as another piece of the puzzle, solving a different problem: complex data and hard core exhaustive analytics, more exotic algorithms running over complex data at scale."

But Olson also stresses that after data is crunched by Hadoop, users will be able to move it back to the Netezza appliance for additional exploration. "Enterprises want to take structured data – customer and transaction data – and combine it with all the unstructured data coming off their websites...that might not fit into a tabular schema well."

Hadoop, for instance, might be used to crunch data relating to user behavior on a website. "What we call Web 2.0 sites have users that move around on their site, post status updates, interact with other individuals. All of that activity is captured in web logs that can't easily be digested using existing relational systems. Hadoop can look at all that activity, identify individual users, digest their behavior, and begin to make predictions about behavior," Olson continues.

"But these companies want to combine what these users do with who they are, but that information is often in a system like Netezza's."

"A lot of people assumed that Hadoop was innately competitive with Netezza and other players," says Olson. "But that's not what we're seeing. We're seeing an appetite for this new technology [Hadoop] to solve problems with data – but it has to work well with existing and expanding investments in [data warehousing]."

Last month, Cloudera teamed with Oracle-tools shop Quest Software to build a Hadoop connector for Oracle. It's due in Q3 as well. ®
