Microsoft's new search - Built on open-source

With Kumo, you get cancer

When Microsoft purchased Hotmail in December of 1997 for an estimated $400m, it ran on FreeBSD. But Redmond ripped out the open source OS and replaced it with Windows 2000. Or at least, it tried to.

More than a decade on, Microsoft still harbors some sort of deep-seated urge to destroy the free software movement it once compared to cancer. But unmitigated open-source antipathy has given way to a kind of free software schizophrenia. In need of extra licensing dollars, Microsoft may sue a Dutch GPS maker over its use of Linux. But in its ongoing struggle to catch the un-catchable Google, Redmond has no problem reversing its Hotmail-era attitudes.

In July of last year, Microsoft acquired Powerset, a San Francisco startup intent on bringing natural language processing to web search. And like the original Hotmail, the startup's semantic search engine leans heavily on open source code.

Some of the company's core technologies are proprietary, including the XLE natural language processing technology it licenses from the Palo Alto Research Center (PARC). But outside of that core, as Powerset product manager Mark Johnson once put it, the company uses open-source code wherever possible.

Most notably, Powerset generates its search index via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Based on Google's MapReduce distributed computing platform and GFS file system, Hadoop was originally developed by open-source maven Doug Cutting, now on the Yahoo! payroll. But it was Powerset that originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable.

When Microsoft acquired the company, Powersetters Michael Stack and Jim Kellerman took a hiatus from their full-time HBase contributions. But by October, Redmond had cleared the pair to resume their open coding. And that's what we'd call giving yourself cancer. "While Microsoft has supported open source in the past," a company mouthpiece tells us, "this is the first time that Microsoft has continued to support open source with an acquired company."

By all accounts, Powerset will drive Microsoft's latest, ill-fated attempt to unseat the Google search monopoly. In March, a Tweet from Powerset co-founder Barney Pell set the blogosphere a-burbling about the impending relaunch of Microsoft Live Search, and days later, screenshots of an internal beta - dubbed Kumo - rose to the surface of the web.

When Kumo launches, in early June, it will be one of the few "shipping" Microsoft products to include open-source code.

In an email to The Reg, Microsoft points out that several other product teams have a hand in free software, including the Windows HPC and System Center teams. But the System Center team has yet to actually ship any open-source code, and though the HPC team has, this code was developed inside Microsoft and then offered up to the community.

In recent years, Microsoft has enjoyed hearing itself talk in vague terms about its commitment to open source. "Microsoft believes contribution and co-development are natural progressions of participating in open source communities," the company burbled to us over email. "A variety of Microsoft product teams and business groups are moving towards increasing contribution and co-development. The opportunity is in understanding the rules and practices of the particular project’s community to participate or contribute in a positive way."

But with Kumo, it can't help but go whole-hog. Yes, a search engine can't be confused with a shrink-wrapped application or downloadable software. But remember the Hotmail switcheroo.

Regardless, it's a telling moment when Microsoft contributes to an open-source project with such a high profile. After years of hostility towards Free Software Foundation (FSF) licensing, Redmond has contributed patches to the ADOdb database abstraction library for PHP, and the company likes to boast that to date, it has initiated more than 300 open-source projects.

But the Apache-licensed Hadoop - with its ability to process epic amounts of data on commodity hardware - underpins not only Yahoo! but Facebook. And it's the bastard child of the Google Chocolate Factory.

Perhaps Microsoft is changing after all. Or perhaps Ballmer's Google chase has reached the point of desperation. ®

Clarification: This story has been clarified to show that Powerset may not technically be the first shipping product to include open source code. As the story now reads, Microsoft HPC has shipped open source code, but this code was developed inside Microsoft and then offered up to the community.
