Yahoo! defies Facebook with Hadoop SQL dupe

Open-source disharmony in stuffed elephant land

Hadoop Summit Much to the chagrin of Facebook, Yahoo! is developing its own SQL-like language for Hadoop, the open-source distributed data-crunching platform that's well on its way to conquering the planet.

Facebook has already developed and open-sourced its own Hadoop SQL, known as Hive. But Yahoo! says it needs a Hive alternative that's better suited to moving its own back-end data onto Hadoop.

"We looked at [Hive], and for the type of problems we're solving, it didn't work quite as well for us," Shelton Shugar, Yahoo! senior vice president for engineering, cloud computing, and data infrastructure, told The Reg today at the annual Hadoop Summit in Santa Clara, California.

"One of the internal platforms we're moving to Hadoop uses a version of SQL, and in order to make the migration a bit easier, it made more sense to build our own [Hadoop SQL]."

Facebook is bemused. "We've tried to convince them to use [Hive]," Facebook engineering manager Ashish Thusoo told us. "I don't know why they're doing this."

Hadoop mimics Google's MapReduce framework, which maps data-crunching tasks across distributed machines, splitting them into tiny sub-tasks, before reducing the results into one master calculation. You can write straight to the framework in Java, but Hive and other languages let you code at a higher level. Less experienced developers can build apps in a fraction of the time - and with a fraction of the code.
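To make the map-then-reduce flow concrete, here is a minimal single-process sketch in Python. It is a toy illustration of the idea, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and on a real cluster each input chunk would be handled on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    """Run the toy pipeline end to end over a list of text chunks."""
    pairs = []
    for chunk in chunks:  # each chunk stands in for one distributed sub-task
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count(["pig hive pig", "hive hadoop"]) counts each word across both chunks. In a SQL-like layer such as Hive, the same job could plausibly shrink to one statement along the lines of SELECT word, COUNT(*) FROM words GROUP BY word, where the words table is an assumption made up for this example.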

Yahoo! has already built and open-sourced a Hadoop language known affectionately as Pig, which sits somewhere between low-level MapReduce code and the much higher level of Hive. Now, it wants to provide its internal developers with a Hive-like option - but not Hive itself.

Hadoop already offers a second SQL-like language, known as CloudBase, but this option never quite made it in the real world. Meanwhile, Hive is widely used by Facebook itself and countless other outfits, including ad-obsessed outfits like AdMob and Adknowledge. At Facebook, it's used to crunch data for everything from the site's Google Trends-like Lexicon tool to, yes, its ad placement system.

"We've been doing this for about a year and a half or two years, and we're open source," Facebook software engineer Joydeep Sen Sarma told us. "We're much further out in terms of developing a classic developing environment. They're playing catch-up. It will take them time to support all our functionality that we have built-in."

Basically, Yahoo! is taking SQL and placing it on top of Pig. And on some level, the Facebookers understand why Yahoo! would do so, but in the end, they vote for Hive harmony. "Hive solves exactly the problems they are trying to solve," Sen Sarma said. "And it's being used by large customers of SQL software...This was a brilliant opportunity to collaborate, and we would have embraced it."

Yes, Pig predates Hive. So, in developing its Hadoop SQL, Facebook took its own path as well. But, Thusoo told us, it was trying to do something very different from Pig. For one thing, Pig operates at a lower level. Plus, when first developed, it couldn't handle scripts written in other languages. Hive was designed for embedded scripts from the beginning.

"Pig is both an imperative and a declarative language," Thusoo told us. "But our philosophy was: If you're going to do declarative, why not use SQL? And why not let people embed scripts in the language of their choice in the SQL?" Since then, Pig has embraced such scripting.
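As a rough illustration of what embedding a script in the SQL can look like: Hive's TRANSFORM ... USING clause streams rows to an external program as tab-separated lines and reads tab-separated lines back. The Python sketch below shows the row-handling half of such a script. The column layout (a user id followed by free text) and the function name are assumptions made up for this example, not anything Facebook described.

```python
def transform(lines):
    """Turn 'user_id<TAB>text' rows into 'user_id<TAB>word_count' rows.

    The two-column layout is hypothetical; a streaming script simply
    receives whichever columns the query passes to it.
    """
    for line in lines:
        user_id, _, text = line.rstrip("\n").partition("\t")
        yield f"{user_id}\t{len(text.split())}"

# In an actual streaming script, the rows would come from standard input:
#
#     import sys
#     for row in transform(sys.stdin):
#         print(row)
```

A query would then invoke the script with something like SELECT TRANSFORM (user_id, status) USING 'python count_words.py' AS (user_id, n_words) FROM updates, where the script name, table, and columns are all hypothetical.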

Yahoo! has not open-sourced its Hadoop SQL, but plans to. "It will somewhat suit our needs for migration, but it's relevant to anyone - so folks will have a choice," Yahoo!'s Shugar said.

No, it doesn't have a name. But in classic Hadoop fashion, it will undoubtedly evoke some sort of fauna. Famously, Hadoop is named for a yellow stuffed elephant. ®
