Feeds

Google blesses Hadoop with MapReduce patent license

Safety for stuffed elephants

Choosing a cloud hosting partner with confidence

Three months after securing a patent for MapReduce - the distributed number-crunching platform that underpins its world-spanning infrastructure - Google has granted a license to Apache Hadoop, easing infringement concerns hovering over the MapReduce-mimicking open source project.

Apache legal counsel Lawrence Rosen announced the news with an email to the Apache board late last week, and his note was soon posted to a public mailing list.

Rosen did not immediately respond to a request for comment. Nor did Google.

In mid-January, Google won a US patent for "a system and method for efficient large-scale data processing." The patent - which you can see here - describes a means of splitting data-crunching tasks into tiny sub-tasks and mapping them across distributed machines, before reducing the results into one master calculation.

Google uses this MapReduce setup to crunch data across the massive distributed infrastructure buttressing its online services, and though the platform is proprietary, the company published a research paper describing its basic setup in December 2004.

This paper - along with a sister paper describing the company's GFS distributed file system - became the basis for Hadoop. The platform was originally developed by Doug Cutting to back his Nutch open source web crawler, and it was eventually open sourced at Apache. Famously, it's named for a yellow stuffed elephant that belongs to Cutting's son.

When Google won its patent, the general assumption among the Hadoop community was that it posed no threat to the open source project. "Google has lots of patents, and it basically has no track record of using those patents offensively, either involving licensing or pursuing people for infringement," said Mike Olson, chief executive of Cloudera, a company that has commercialized Hadoop in Red Hat-like fashion.

Then he pointed out that Google is a member of the Open Invention Network patent pool, which grants licenses for patented technology in an effort to promote Linux. "All of this convinces us that this is a strategic move from Google and not something that is aimed at the head of any Hadoop adopter or satellite company - Cloudera included."

What's more, Google has long used Hadoop as a way of exposing potential hires to its "Big Data" ways. And even if the company did take legal action, you have to wonder how well its patent would hold up. The map and reduce functions described by Google have been a part of parallel programming for decades.

But now, Mountain View has officially eased fears of legal action. Rosen writes in the email: "Several weeks ago I sought clarification from Google about its recent patent 7,650,331 ["System and method for efficient large-scale data processing"] that may be infringed by implementation of the Apache Hadoop and Apache MapReduce projects. I just received word from Google's general counsel that 'we have granted a license for Hadoop, terms of which are specified in the CLA [contributor licensing agreement].'"

It's unclear what the terms are. But they seem to have passed muster. "I am very pleased to reassure the Apache community about Google's continued generosity and commitment to ASF and open source", Rosen continued.

This is good news not only the likes of Cloudera, but for many of the industry's biggest names as well. The open source Hadoop now underpins everything from Facebook and Yahoo! to, believe it or not, portions of Microsoft Bing. ®

Providing a secure and efficient Helpdesk

More from The Register

next story
Preview redux: Microsoft ships new Windows 10 build with 7,000 changes
Latest bleeding-edge bits borrow Action Center from Windows Phone
Google opens Inbox – email for people too thick to handle email
Print this article out and give it to someone tech-y if you get stuck
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
UNIX greybeards threaten Debian fork over systemd plan
'Veteran Unix Admins' fear desktop emphasis is betraying open source
Entity Framework goes 'code first' as Microsoft pulls visual design tool
Visual Studio database diagramming's out the window
Google+ goes TITSUP. But WHO knew? How long? Anyone ... Hello ...
Wobbly Gmail, Contacts, Calendar on the other hand ...
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Redmond top man Satya Nadella: 'Microsoft LOVES Linux'
Open-source 'love' fairly runneth over at cloud event
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.