Yahoo! defies Facebook with Hadoop SQL dupe

Open-source disharmony in stuffed elephant land

Hadoop Summit Much to the chagrin of Facebook, Yahoo! is developing its own SQL-like language for Hadoop, the open-source distributed data-crunching platform that's well on its way to conquering the planet.

Facebook has already developed and open-sourced its own Hadoop SQL, known as Hive. But Yahoo! says it needs a Hive alternative that's better suited to moving its own back-end data onto Hadoop.

"We looked at [Hive], and for the type of problems we're solving, it didn't work quite as well for us," Shelton Shugar, Yahoo! senior vice president for engineering, cloud computing, and data infrastructure, told The Reg today at the annual Hadoop Summit in Santa Clara, California.

"One of the internal platforms we're moving to Hadoop uses a version of SQL, and in order to make the migration a bit easier, it made more sense to build our own [Hadoop SQL]."

Facebook is bemused. "We've tried to convince them to use [Hive]," Facebook engineering manager Ashish Thusoo told us. "I don't know why they're doing this."

Hadoop mimics Google's MapReduce framework, which maps data-crunching tasks across distributed machines, splitting them into tiny sub-tasks, before reducing the results into one master calculation. You can write straight to the framework in Java, but Hive and other languages let you code at a higher level. Less experienced developers can build apps in a fraction of the time - and with a fraction of the code.
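To make the map-then-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-counting task are assumptions chosen purely for illustration, and each input string stands in for the slice of data one machine would handle.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run the toy pipeline: map each chunk, group by word, reduce each group."""
    pairs = []
    for chunk in chunks:  # hypothetical stand-in for work spread across machines
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive hadoop pig"]))
```

In a SQL-like layer, the same count would typically collapse into something close to a single GROUP BY query, which is the "fraction of the code" argument in a nutshell.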

Yahoo! has already built and open-sourced a Hadoop language known affectionately as Pig, which sits somewhere between low-level MapReduce code and the much higher level of Hive. Now, it wants to provide its internal developers with a Hive-like option - but not Hive itself.

Hadoop already offers a second SQL-like language, known as CloudBase, but this option never quite made it in the real world. Meanwhile, Hive is widely used by Facebook itself and countless other outfits, including ad-obsessed companies like AdMob and Adknowledge. At Facebook, it's used to crunch data for everything from the site's Google Trends-like Lexicon tool to, yes, its ad placement system.

"We've been doing this for about a year and a half or two years, and we're open source," Facebook software engineer Joydeep Sen Sarma told us. "We're much further out in terms of developing a classic developing environment. They're playing catch-up. It will take them time to support all our functionality that we have built-in."

Basically, Yahoo! is taking SQL and placing it on top of Pig. And on some level, the Facebookers understand why Yahoo! would do so, but in the end, they vote for Hive harmony. "Hive solves exactly the problems they are trying to solve," Sen Sarma said. "And it's being used by large customers of SQL software...This was a brilliant opportunity to collaborate, and we would have embraced it."

Yes, Pig predates Hive. So, in developing its Hadoop SQL, Facebook took its own path as well. But, Thusoo told us, it was trying to do something very different from Pig. For one thing, Pig operates at a lower level. Plus, when first developed, it couldn't handle scripts written in other languages. Hive was designed for embedded scripts from the beginning.

"Pig is both an imperative and a declarative language," Thusoo told us. "But our philosophy was: If you're going to do declarative, why not use SQL? And why not let people embed scripts in the language of their choice in the SQL?" Since then, Pig has embraced such scripting.

Yahoo! has not open-sourced its Hadoop SQL, but plans to. "It will somewhat suit our needs for migration, but it's relevant to anyone - so folks will have a choice," Yahoo!'s Shugar said.

No, it doesn't have a name. But in classic Hadoop fashion, it will undoubtedly evoke some sort of fauna. Famously, Hadoop is named for a yellow stuffed elephant. ®
