Feeds

Yahoo! defies Facebook with Hadoop SQL dupe

Open-source disharmony in stuffed elephant land

Boost IT visibility and business value

Hadoop Summit Much to the chagrin of Facebook, Yahoo! is developing its own SQL-like language for Hadoop, the open-source distributed data-crunching platform that's well on its way to conquering the planet.

Facebook has already developed and open-sourced its own Hadoop SQL, known as Hive. But Yahoo! says it needs a Hive alternative that's better suited to moving its own back-end data onto Hadoop.

"We looked at [Hive], and for the type of problems we're solving, it didn't work quite as well for us," Yahoo! senior vice president for engineering, cloud computing, and data infrastructure told The Reg today at the annual Hadoop Summit in Santa Clara, California.

"One of the internal platforms we're moving to Hadoop uses a version of SQL, and in order to make the migration bit easier. It made more sense to build our own [Hadoop SQL]."

Facebook is bemused. "We've tried to convince them to use [Hive]," Facebook engineering manager Ashish Thusoo told us. "I don't know why they're doing this."

Hadoop mimics Google's MapReduce framework, which maps data-crunching tasks across distributed machines, splitting them into tiny sub-tasks, before reducing the results into one master calculation. You can write straight to the framework in Java, but Hive and other languages let you code at a higher level. Less experienced developers can build apps in a fraction of the time - and with a fraction of the code.

Yahoo! has already built and open-sourced a Hadoop language known affectionately as Pig, which sits somewhere between low-level MapReduce code and the much higher level of Hive. Now, it wants to provide its internal developers with a Hive-like option - but not Hive itself.

Hadoop already offers a second SQL-like language, known as CloudBase, but this option never quite made it in the real world. Meanwile, Hive is widely used by Facebook itself and countless other outfits, including ad-obsessed outfits like AdMob and Adknowledge. At Facebook, it's used to crunch data for everything from the site's Google Trends-like Lexicon tool to, yes, its ad placement system.

"We've been doing this for about a year and a half or two years, and we're open source," Facebook software engineer Joydeep Sen Sarma told us. "We're much further out in terms of developing a classic developing environment. They're playing catch-up. It will take them time to support all our functionality that we have built-in."

Basically, Yahoo! is taking SQL and placing it on top of Pig. And on some level, the Facebookers understand why Yahoo! would do so, but in the end, they vote for Hive harmony. "Hive solves exactly the problems they are trying to solve," Sen Sarma said. "And it's being used by large customers of SQL software...This was a brilliant opportunity to collaborate, and we would have embraced it."

Yes, Pig predates Hive. So, in developing its Hadoop SQL, Facebook took its own path as well. But, Thusoo told us, it was trying to do something very different from Pig. For one thing, Pig operates at a lower-level. Plus, when first developed, it couldn't handle scripts written in other languages. Hive was designed from embedded scripts from the beginning.

"Pig is both an imperative and a declarative language," Thusoo told us. "But our philosophy was: If you're going to do declarative, why not use SQL? And why not let people embed scripts in the language of their choice in the SQL?" Since then, Pig has embraced such scripting.

Yahoo! has not open-sourced its Hadoop SQL, but plans to. "It will somewhat suit our needs for migration, but it's relevant to anyone - so folks will have a choice," Yahoo!'s Shugar said.

No, it doesn't have a name. But in classic Hadoop fashion, it will undoubtedly evoke some sort of fauna. Famously, Hadoop is named for a yellow stuffed elephant. ®

The essential guide to IT transformation

More from The Register

next story
Munich considers dumping Linux for ... GULP ... Windows!
Give a penguinista a hug, the Outlook's not good for open source's poster child
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Intel's Raspberry Pi rival Galileo can now run Windows
Behold the Internet of Things. Wintel Things
Microsoft cries UNINSTALL in the wake of Blue Screens of Death™
Cache crash causes contained choloric calamity
Eat up Martha! Microsoft slings handwriting recog into OneNote on Android
Freehand input on non-Windows kit for the first time
Time to move away from Windows 7 ... whoa, whoa, who said anything about Windows 8?
Start migrating now to avoid another XPocalypse – Gartner
You'll find Yoda at the back of every IT conference
The piss always taking is he. Bastard the.
prev story

Whitepapers

5 things you didn’t know about cloud backup
IT departments are embracing cloud backup, but there’s a lot you need to know before choosing a service provider. Learn all the critical things you need to know.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.
Rethinking backup and recovery in the modern data center
Combining intelligence, operational analytics, and automation to enable efficient, data-driven IT organizations using the HP ABR approach.
Next gen security for virtualised datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.