Yahoo! defies Facebook with Hadoop SQL dupe

Open-source disharmony in stuffed elephant land

Hadoop Summit Much to the chagrin of Facebook, Yahoo! is developing its own SQL-like language for Hadoop, the open-source distributed data-crunching platform that's well on its way to conquering the planet.

Facebook has already developed and open-sourced its own Hadoop SQL, known as Hive. But Yahoo! says it needs a Hive alternative that's better suited to moving its own back-end data onto Hadoop.

"We looked at [Hive], and for the type of problems we're solving, it didn't work quite as well for us," Shelton Shugar, Yahoo! senior vice president for engineering, cloud computing, and data infrastructure, told The Reg today at the annual Hadoop Summit in Santa Clara, California.

"One of the internal platforms we're moving to Hadoop uses a version of SQL, and in order to make the migration a bit easier, it made more sense to build our own [Hadoop SQL]."

Facebook is bemused. "We've tried to convince them to use [Hive]," Facebook engineering manager Ashish Thusoo told us. "I don't know why they're doing this."

Hadoop mimics Google's MapReduce framework, which maps data-crunching tasks across distributed machines, splitting them into tiny sub-tasks, before reducing the results into one master calculation. You can write straight to the framework in Java, but Hive and other languages let you code at a higher level. Less experienced developers can build apps in a fraction of the time - and with a fraction of the code.
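
As a rough illustration of the map-then-reduce pattern described above, here is a minimal single-machine sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would spread the map and reduce steps across many machines rather than run them in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive hadoop"]))
```

A higher-level language such as Hive or Pig lets a developer express the same count as a short query or script instead of hand-written map and reduce functions.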

Yahoo! has already built and open-sourced a Hadoop language known affectionately as Pig, which sits somewhere between low-level MapReduce code and the much higher level of Hive. Now, it wants to provide its internal developers with a Hive-like option - but not Hive itself.

Hadoop already offers a second SQL-like language, known as CloudBase, but this option never quite made it in the real world. Meanwhile, Hive is widely used by Facebook itself and countless other outfits, including ad-obsessed operations like AdMob and Adknowledge. At Facebook, it's used to crunch data for everything from the site's Google Trends-like Lexicon tool to, yes, its ad placement system.

"We've been doing this for about a year and a half or two years, and we're open source," Facebook software engineer Joydeep Sen Sarma told us. "We're much further out in terms of developing a classic developing environment. They're playing catch-up. It will take them time to support all our functionality that we have built-in."

Basically, Yahoo! is taking SQL and placing it on top of Pig. And on some level, the Facebookers understand why Yahoo! would do so, but in the end, they vote for Hive harmony. "Hive solves exactly the problems they are trying to solve," Sen Sarma said. "And it's being used by large customers of SQL software...This was a brilliant opportunity to collaborate, and we would have embraced it."

Yes, Pig predates Hive. So, in developing its Hadoop SQL, Facebook took its own path as well. But, Thusoo told us, it was trying to do something very different from Pig. For one thing, Pig operates at a lower level. Plus, when first developed, it couldn't handle scripts written in other languages. Hive was designed for embedded scripts from the beginning.

"Pig is both an imperative and a declarative language," Thusoo told us. "But our philosophy was: If you're going to do declarative, why not use SQL? And why not let people embed scripts in the language of their choice in the SQL?" Since then, Pig has embraced such scripting.
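
To give a sense of what an embedded script might look like, the sketch below is a small Python row filter of the kind a SQL-on-Hadoop query could stream rows through. The tab-separated input format, the two-column layout, and the function name `transform_rows` are assumptions made for illustration, not details taken from Facebook or Yahoo!.

```python
def transform_rows(lines):
    """Yield one tab-separated output row per well-formed input row.

    Assumed input: "user_id<TAB>page_url" per line (hypothetical columns).
    Output: "user_id<TAB>host", where host is the host part of the URL.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows rather than fail the whole job
        user_id, url = fields
        # Drop the scheme if present, then keep everything before the first slash.
        host = url.split("://", 1)[-1].split("/", 1)[0]
        yield f"{user_id}\t{host}"


if __name__ == "__main__":
    # In a streaming job the rows would arrive on standard input;
    # a hard-coded sample keeps this sketch self-contained.
    sample = ["42\thttp://example.com/a/b\n", "malformed row\n"]
    for row in transform_rows(sample):
        print(row)
```

The appeal of this approach is that the query stays declarative while the per-row logic lives in whatever language the developer prefers.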

Yahoo! has not open-sourced its Hadoop SQL, but plans to. "It will somewhat suit our needs for migration, but it's relevant to anyone - so folks will have a choice," Yahoo!'s Shugar said.

No, it doesn't have a name. But in classic Hadoop fashion, it will undoubtedly evoke some sort of fauna. Famously, Hadoop is named for a yellow stuffed elephant. ®
