
Google's MapReduce patent - no threat to stuffed elephants

Hadoop will keep its head


In mid-January, Google won a patent for MapReduce, the distributed data crunching platform that underpins its globe-spanning online infrastructure. And that means there's at least a question mark hanging over Hadoop, the much-hyped open source platform that helps drive Yahoo!, Facebook, Microsoft's Bing, and an ever-expanding array of other web services and back-end business applications.

Hadoop is based in part on a MapReduce research paper Google published in 2004, about six months after it applied for the patent.

The Mountain View Chocolate Factory doesn't officially comment on specific patents in its portfolio. "Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops," the company recently told GigaOM, in response to questions about its MapReduce patent. "We feel that our behavior to date has been in line with our corporate values and priorities."

But the general assumption is that Google wouldn't use its patent against Hadoop or any other software that takes a lead from MapReduce, including databaseware from the likes of Aster Data Systems or Teradata. This is certainly the view of Cloudera, the all-star Silicon Valley startup that recently commercialized Hadoop in Red Hat-like fashion.

"I don't speak for Google. But Google has lots of patents, and it basically has no track record of using those patents offensively, either involving licensing or pursuing people for infringement," Cloudera chief executive Mike Olson tells The Reg, before pointing out that Google is a member of the Open Invention Network, a patent pool that grants use licenses for patented technology in an effort to promote Linux.

"All of this convinces us that this is a strategic move from Google and not something that is aimed at the head of any Hadoop adopter or satellite company - Cloudera included."

Olson adds that Cloudera has "excellent ties" back to the Mountain View search giant and that he and his backers were well aware of Google's patent before Cloudera was founded. "We - and our investors - talked about it in detail and at length, and without a qualm, we went ahead and founded the company."

The salient Google link is Cloudera vice president Christophe Bisciglia - the former Google engineer whom Mountain View famously dispatched to the University of Washington to teach a course on what it likes to call Big Data, i.e. net-scale distributed computing. Bisciglia's curriculum actually made use of Hadoop, and he stresses that the open source platform has become an important teaching tool for Google.

"In the past, it took three to six months to get hires up to speed with how to work with [Google] technology," Bisciglia has told The Reg. "But if schools are teaching this as part of the standard undergraduate curriculum, Google saved that three to six months - multiplied by thousands of engineers."

Google hired about half the students who took Bisciglia's first class.

But even if Google did change tack, if it suddenly went on the offensive with that MapReduce patent, you wonder how successful it would be. As Yahoo! vice president of labs and research Ron Brachman points out, the basic concepts behind MapReduce are far from revolutionary. "To my mind, having grown up as a computer scientist in the 70s and taking courses on what was then thought of as parallel processing, there were techniques around that felt very similar to [MapReduce's] type of parallelism," Brachman tells The Reg.

The patent describes a "system and method for efficient large-scale data processing," and this involves "map" and "reduce" functions that have indeed been a part of parallel programming since Brachman's school days.

In essence, Google's platform "maps" data-crunching tasks across a collection of distributed machines, splitting them into tiny sub-tasks, before "reducing" the results into one master calculation. As the patent abstract puts it, one or more map modules read input data, apply an operation to "produce intermediate data values," and distribute these values "across multiple processors in the parallel processing environment." One or more reduce modules then retrieve the intermediate data and apply a new operation to provide the ultimate output.
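For the curious, the map-then-reduce pattern can be sketched in a few lines of Python. This is only an illustration of the general idea, not Google's or Hadoop's code: the word-count task, the sample sentences, and the function names (map_phase, shuffle, reduce_phase, word_count) are all assumptions made for the example, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit an intermediate (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, the job a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document could be mapped on a different machine;
    # here the map calls simply run one after another in a single process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input, chosen only to show the shape of the result.
    docs = ["the elephant is yellow", "the elephant is stuffed"]
    print(word_count(docs))
```

Because each map call touches only its own document and each reduce call touches only one key, both phases can in principle be farmed out to separate machines, which is the property that makes the approach attractive at web scale.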

In any event, Hadoop mirrors this general setup, as Google described it in a research paper published in December 2004. The platform was originally developed by Nutch founder Doug Cutting, who needed a distributed data crunching platform for his open source web crawler, and after he open sourced it at Apache, the platform - named for his son's yellow stuffed elephant - soon spread to some of the web's biggest names.

Yahoo! uses it to generate, among other things, the Yahoo! Search Webmap, which provides the index for its search engine. And it underpins Powerset, the so-called semantic search engine that was purchased by Microsoft and now drives portions of Bing.

Meanwhile, Cloudera is helping to deploy the platform on clusters used by countless other companies, including Rackspace, Netflix, LinkedIn, Samsung, and eHarmony. Rackspace, for one, is using a Hadoop cluster to crunch log data from its hosting infrastructure and serve up reports to support reps. The platform can be applied to almost any breed of Big Data - and not so big data.

"We really don't like the term 'Big Data,'" Olson says. "To use Hadoop, you don't need to have petabytes of data. You don't even need terabytes. When customers hear a word like 'Big Data', they think 'It must be a Google thing.' But it's not."

It's not - no matter what's on file at the US patent office. And we're quite sure that Google would agree. ®
