Google's MapReduce patent - no threat to stuffed elephants
Hadoop will keep its head
In mid-January, Google won a patent for MapReduce, the distributed data crunching platform that underpins its globe-spanning online infrastructure. And that means there's at least a question mark hanging over Hadoop, the much-hyped open source platform that helps drive Yahoo!, Facebook, Microsoft's Bing, and an ever-expanding array of other web services and back-end business applications.
Hadoop is based in part on a MapReduce research paper Google published in 2004, about six months after it applied for the patent.
The Mountain View Chocolate Factory doesn't officially comment on specific patents in its portfolio. "Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops," the company recently told GigaOM, in response to questions about its MapReduce patent. "We feel that our behavior to date has been in line with our corporate values and priorities."
But the general assumption is that Google wouldn't use its patent against Hadoop or any other software that takes a lead from MapReduce, including databaseware from the likes of Aster Data Systems or Teradata. This is certainly the view of Cloudera, the all-star Silicon Valley startup that recently commercialized Hadoop in Red Hat-like fashion.
"I don't speak for Google. But Google has lots of patents, and it basically has no track record of using those patents offensively, either involving licensing or pursuing people for infringement," Cloudera chief executive Mike Olson tells The Reg, before pointing out that Google is a member of the Open Invention Network, a patent pool that grants use licenses for patented technology in an effort to promote Linux.
"All of this convinces us that this is a strategic move from Google and not something that is aimed at the head of any Hadoop adopter or satellite company - Cloudera included."
Olson adds that Cloudera has "excellent ties" back to the Mountain View search giant and that he and his backers were well aware of Google's patent before Cloudera was founded. "We - and our investors - talked about it in detail and at length, and without a qualm, we went ahead and founded the company."
The salient Google link is Cloudera vice president Christophe Bisciglia - the former Google engineer whom Mountain View famously dispatched to the University of Washington to teach a course on what it likes to call Big Data, i.e. net-scale distributed computing. Bisciglia's curriculum actually made use of Hadoop, and he stresses that the open source platform has become an important teaching tool for Google.
"In the past, it took three to six months to get hires up to speed with how to work with [Google] technology," Bisciglia has told The Reg. "But if schools are teaching this as part of the standard undergraduate curriculum, Google saved that three to six months - multiplied by thousands of engineers."
Google hired about half the students who took Bisciglia's first class.
But even if Google did change tack, if it suddenly went on the offensive with that MapReduce patent, you wonder how successful it would be. As Yahoo! vice president of labs and research Ron Brachman points out, the basic concepts behind MapReduce are far from revolutionary. "To my mind, having grown up as a computer scientist in the 70s and taking courses on what was then thought of as parallel processing, there were techniques around that felt very similar to [MapReduce's] type of parallelism," Brachman tells The Reg.
The patent - which you can see here - describes a "system and method for efficient large-scale data processing," and this involves "map" and "reduce" functions that have indeed been a part of parallel programming since Brachman's school days.
In essence, Google's platform "maps" data-crunching tasks across a collection of distributed machines, splitting them into tiny sub-tasks, before "reducing" the results into one master calculation. As the patent abstract puts it, one or more map modules read input data, apply an operation to "produce intermediate data values," and distribute these values "across multiple processors in the parallel processing environment." One or more reduce modules then retrieve the intermediate data and apply a new operation to provide the ultimate output.
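To make the map-then-reduce flow concrete, here is a minimal single-machine sketch in Python that counts words. It is an illustration only, not Google's or Hadoop's code: the function names (`map_words`, `partition`, `reduce_counts`, `word_count`) and the toy hashing scheme are assumptions made for this example, and the "distributed machines" are simulated with in-memory buckets.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit one intermediate (word, 1) pair per word."""
    return [(word.lower(), 1) for word in document.split()]


def partition(pairs, num_reducers):
    """Distribute intermediate pairs into one bucket per (simulated) reducer.

    Every occurrence of a given key lands in the same bucket, so each
    reducer sees all the values for the keys it owns.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Toy stand-in for a hash function (an assumption of this sketch).
        index = sum(ord(ch) for ch in key) % num_reducers
        buckets[index][key].append(value)
    return buckets


def reduce_counts(bucket):
    """Reduce step: collapse the list of values for each key into a total."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(documents, num_reducers=2):
    """Run map over every document, partition, then reduce each bucket."""
    intermediate = []
    for document in documents:
        intermediate.extend(map_words(document))

    totals = {}
    for bucket in partition(intermediate, num_reducers):
        totals.update(reduce_counts(bucket))
    return totals


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

In a real deployment the map calls and the reduce calls would each run in parallel on separate machines; the sequential loops here only stand in for that.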
In any event, Hadoop mirrors this general setup, as Google described it in a research paper published in December 2004. The platform was originally developed by Nutch founder Doug Cutting, who needed a distributed data crunching platform for his open source web crawler, and after he open sourced it at Apache, the platform - named for his son's yellow stuffed elephant - soon spread to some of the web's biggest names.
Yahoo! uses it to generate, among other things, the Yahoo! Search Webmap, which provides the index for its search engine. And it underpins Powerset, the so-called semantic search engine that was purchased by Microsoft and now drives portions of Bing.
Meanwhile, Cloudera is helping to deploy the platform on clusters used by countless other companies, including Rackspace, Netflix, LinkedIn, Samsung, and eHarmony. Rackspace, for one, is using a Hadoop cluster to crunch log data from its hosting infrastructure and serve up reports to support reps. The platform can be applied to almost any breed of Big Data - and not so big data.
"We really don't like the term 'Big Data,'" Olson says. "To use Hadoop, you don't need to have petabytes of data. You don't even need terabytes. When customers hear a word like 'Big Data', they think 'It must be a Google thing.' But it's not," says Olson.
It's not - no matter what's on file at the US patent office. And we're quite sure that Google would agree. ®
Software Patents are Nukes - This is a Cold War
Oh dear! The patent system in the US is a joke!
Shame it's not going to change any time soon, when the USPTO is laughing its way to the bank every day while handing out patents like candy bars to anyone with a mere idea.
Clearly, the USPTO is a government agency intent on making a quick buck, even if it harms the economy and kills the very thing it's supposed to encourage - creative innovation - while causing a massive problem for businesses and the courts to deal with, which has the happy side-effect of keeping a whole bunch of IP lawyers in business who duly return the favour by feeding the input valve with more so-called "patent" applications. It's an economy unto itself.
You've got to love the crass audacity of legitimised corruption in Yankeeland. It doesn't even try to hide itself or divert attention, it's just out there in broad daylight hitting the economy, especially small businesses, with a baseball bat. Software patents aren't assets, they're weapons.. the Patent Office runs around supplying arms to any company with the right cash, perpetuating the insipid "them against us" dog eat dog mentality, then standing by counting its profit and watching the fallout.
Software patents are like nukes, everybody is infringing each other's, all the big boys have an arsenal pointing at each other.. Linux & open-source have the coalition led by IBM pooling and sharing their arsenal.. no one can move.. It's the equivalent of a cold war we're in.
Google patents divide and conquer
I'm pretty sure I learned about divide and conquer algorithms in college, 10 years before this patent was filed.
Hadoop is based on a scientific paper released by Google and not specifically on the Google code.
Granted, Google filed for a patent prior to releasing the paper, but once they released the paper, one could argue that Google themselves invalidated their own patent. That is, anyone who writes code based on the paper which Google published would not be in violation of the patent.
IANAL, but when you consider what can and cannot be patented, by releasing a scientific document detailing the process, Google pretty much waived the right to enforce the patent.
Interestingly enough, I think the roots go back to how to copy a BIOS without infringing on copyright agreements...