Yahoo! breeds Pig that talks elephant

Swine talk of a different kind

If there's one place on earth where swine talk is still met with open arms, it's Yahoo!.

Yahoo! is gradually moving its data-heavy web services onto Hadoop - that Google-inspired open-source platform for crunching epic amounts of information across a sea of distributed machines. And to grease the move, the company has developed its own Hadoop programming language. In typical Hadoop fashion, it's known as Pig.

Hadoop mimics Google's MapReduce framework, which maps data-crunching tasks across distributed machines, splitting them into tiny sub-tasks, before reducing the results into one master calculation. You can write straight to the framework in Java, but Pig aims to put MapReduce coding at a higher level.
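To make the map-then-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's Java API and distributes nothing; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen purely for illustration. It only shows the shape of the model: emit key/value pairs, group them by key, then reduce each group to a result.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over every record and collect the (key, value) pairs it emits."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Illustrative mapper: emit (word, 1) for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(word, counts):
    """Illustrative reducer: add up the 1s emitted for a word."""
    return sum(counts)


def word_count(lines):
    """Chain the three phases into a word count over a list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

Calling `word_count` on a list of lines returns a dictionary mapping each lower-cased word to its total. On a real cluster the map and reduce calls would run on different machines, with the framework handling the grouping step in between.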

"There was a lot of hype around [Hadoop] MapReduce and it gained a lot of traction, probably because it's a very simple low-level model," says Chris Olston, part of the Yahoo! research team that originated Pig. "But at the same time, people were writing higher-level functions over and over again."

Hadoop MapReduce, for instance, has no "join" operation - a staple of data programming - and Pig makes amends.
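As a rough illustration of what that means for the programmer, the Python sketch below hand-rolls an inner join in the map/group/reduce style, the kind of code people found themselves rewriting. It runs on one machine, the two-column `users` and `visits` inputs are invented for the example, and `reduce_side_join` is a hypothetical name, not part of Hadoop or Pig.

```python
from collections import defaultdict


def reduce_side_join(users, visits):
    """Inner-join (user_id, name) rows with (user_id, url) rows on user_id."""
    # Map step: key every row on the join field and tag it with its source table.
    tagged = [(user_id, ("user", name)) for user_id, name in users]
    tagged += [(user_id, ("visit", url)) for user_id, url in visits]

    # Shuffle step: bring all rows that share a key together.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)

    # Reduce step: within each key, pair every user row with every visit row.
    joined = []
    for user_id, values in groups.items():
        names = [v for tag, v in values if tag == "user"]
        urls = [v for tag, v in values if tag == "visit"]
        for name in names:
            for url in urls:
                joined.append((user_id, name, url))
    return joined
```

Keys that appear in only one of the two inputs produce no output rows, which is what makes this an inner join. The tagging, grouping and pairing are all the developer's responsibility here; a higher-level language folds them into a built-in operator.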

Hadoop founder Doug Cutting describes Pig as "SQL for MapReduce." But that description might be better applied to Hive, a high-level open-source MapReduce language first developed at Facebook. Pig sits somewhere between Hive and the low-level code of MapReduce.

"Hive is closer to SQL syntax. Pig aims for something that's more of an explicit data flow syntax" Olston tells The Reg. "We wanted to get to something where the common operations like 'join' are built-in - so you just have to write a one-line command to do a 'join' - but at the same time, it retains the explicit data flow aspect of MapReduce. It's in the sweet spot between the two."

In the end, this still puts Hadoop coding in the hands of those who may not be hardcore developers. "You have to be able to write scripts," says Olston. "But you don't have to be a full-fledged programmer."

Pig began life as an Apache Incubator project in the fall of 2007, and in October of 2008 it was accepted as an official Hadoop sub-project. About 30 per cent of all Yahoo! Hadoop jobs are now Pig jobs, and according to Ajay Anand, director of product management for grid computing at Yahoo!, new developers joining the company's Hadoop migration typically choose to ride the Pig. "It's much easier to get going," he says.

According to a PowerPoint presentation from Olga Natkovich, who manages the Pig development team, the typical Pig program is about 1/20th as long as an equivalent MapReduce creation - and requires about 1/16th of the development time.

Doug Cutting - the man behind the Lucene search library and the Nutch web crawler - first developed Hadoop after Google kindly published a pair of research papers on MapReduce and its proprietary Google File System (GFS). He envisioned the project as the underpinning for the open-source Nutch crawler, but Yahoo! soon took an interest and he's now on the company payroll.

Most notably, Hadoop runs Yahoo! Search Webmap, which provides the world’s second most popular search engine with a database of all known web pages – complete with all the metadata needed to understand them. According to Yahoo! grid guru Eric Baldeschwieler, the app draws its web map 33 per cent faster than the company's previous system.

But Hadoop also underpins various Yahoo! content and advertising services. On the content side, for instance, it now powers the real-time automated algorithms that select news stories for the Yahoo! home page.

Cutting named Hadoop after his son's yellow stuffed elephant, and animal references tend to pop up in the names of sub-projects. Thus the Pig. Version 0.2.0 was released last month and is available for download. It needs a Hadoop installation to run on. ®
