Microsoft's new search - Built on open-source

With Kumo, you get cancer

When Microsoft purchased Hotmail in December of 1997 for an estimated $400m, it ran on FreeBSD. But Redmond ripped out the open source OS and replaced it with Windows 2000. Or at least, it tried to.

More than a decade on, Microsoft still harbors some sort of deep-seated urge to destroy the free software movement it once compared to cancer. But unmitigated open-source antipathy has given way to a kind of free software schizophrenia. In need of extra licensing dollars, Microsoft may sue a Dutch GPS maker over its use of Linux. But in its ongoing struggle to catch the un-catchable Google, Redmond has no problem reversing its Hotmail-era attitudes.

In July of last year, Microsoft acquired Powerset, a San Francisco startup intent on bringing natural language processing to web search. And like the original Hotmail, the startup's semantic search engine leans heavily on open source code.

Some of the company's core technologies are proprietary, including the XLE ranking algorithms it licenses from the Palo Alto Research Center (PARC). But outside of that core, as Powerset product manager Mark Johnson once put it, the company uses open-source code wherever possible.

Most notably, Powerset generates its search index via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Based on Google's MapReduce distributed computing platform and GFS file system, Hadoop was originally developed by open-source maven Doug Cutting, now on the Yahoo! payroll. But it was Powerset that originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable.

When Microsoft acquired the company, Powersetters Michael Stack and Jim Kellerman took a hiatus from their full-time HBase contributions. But by October, Redmond had cleared the pair to resume their open coding. And that's what we'd call giving yourself cancer. "While Microsoft has supported open source in the past," a company mouthpiece tells us, "this is the first time that Microsoft has continued to support open source with an acquired company."

By all accounts, Powerset will drive Microsoft's latest, ill-fated attempt to unseat the Google search monopoly. In March, a tweet from Powerset co-founder Barney Pell set the blogosphere a-burbling about the impending relaunch of Microsoft Live Search, and days later, screenshots of an internal beta - dubbed Kumo - rose to the surface of the web.

When Kumo launches, in early June, it will be one of the few "shipping" Microsoft products to include open-source code.

In an email to The Reg, Microsoft points out that several other product teams have a hand in free software, including the Windows HPC and System Center teams. But the System Center team has yet to actually ship any open source code, and though the HPC team has, this code was developed inside Microsoft and then offered up to the community.

In recent years, Microsoft has enjoyed hearing itself talk in vague terms about its commitment to open source. "Microsoft believes contribution and co-development are natural progressions of participating in open source communities," the company burbled to us over email. "A variety of Microsoft product teams and business groups are moving towards increasing contribution and co-development. The opportunity is in understanding the rules and practices of the particular project’s community to participate or contribute in a positive way."

But with Kumo, it can't help but go whole-hog. Yes, a search engine can't be confused with a shrink-wrapped application or downloadable software. But remember the Hotmail switcheroo.

Regardless, it's a telling moment when Microsoft contributes to an open-source project with such a high profile. After years of hostility towards Free Software Foundation (FSF) licensing, Redmond has contributed patches to the ADOdb database abstraction library for PHP, and the company likes to boast that to date, it has initiated more than 300 open-source projects.

But the Apache-licensed Hadoop - with its ability to process epic amounts of data on commodity hardware - underpins not only Yahoo! but Facebook. And it's the bastard child of the Google Chocolate Factory.

Perhaps Microsoft is changing after all. Or perhaps Ballmer's Google chase has reached the point of desperation. ®

Clarification: This story has been clarified to show that Powerset may not technically be the first shipping product to include open source code. As the story now reads, Microsoft HPC has shipped open source code, but this code was developed inside Microsoft and then offered up to the community.
