Microsoft's new search - Built on open-source

With Kumo, you get cancer

When Microsoft purchased Hotmail in December of 1997 for an estimated $400m, it ran on FreeBSD. But Redmond ripped out the open source OS and replaced it with Windows 2000. Or at least, it tried to.

More than a decade on, Microsoft still harbors some sort of deep-seated urge to destroy the free software movement it once compared to cancer. But unmitigated open-source antipathy has given way to a kind of free software schizophrenia. In need of extra licensing dollars, Microsoft will happily sue a Dutch GPS maker over its use of Linux. But in its ongoing struggle to catch the uncatchable Google, Redmond has no problem reversing its Hotmail-era attitudes.

In July of last year, Microsoft acquired Powerset, a San Francisco startup intent on bringing natural language processing to web search. And like the original Hotmail, the startup's semantic search engine leans heavily on open source code.

Some of the company's core technologies are proprietary, including the XLE natural language parsing technology it licenses from the Palo Alto Research Center (PARC). But outside of that core, as Powerset product manager Mark Johnson once put it, the company uses open-source code wherever possible.

Most notably, Powerset generates its search index via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Based on Google's MapReduce distributed computing framework and its GFS file system, Hadoop was originally developed by open-source maven Doug Cutting, now on the Yahoo! payroll. But it was Powerset that originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable.

When Microsoft acquired the company, Powersetters Michael Stack and Jim Kellerman took a hiatus from their full-time HBase contributions. But by October, Redmond had cleared the pair to resume their open coding. And that's what we'd call giving yourself cancer. "While Microsoft has supported open source in the past," a company mouthpiece tells us, "this is the first time that Microsoft has continued to support open source with an acquired company."

By all accounts, Powerset will drive Microsoft's latest, ill-fated attempt to unseat the Google search monopoly. In March, a tweet from Powerset co-founder Barney Pell set the blogosphere a-burbling about the impending relaunch of Microsoft Live Search, and days later, screenshots of an internal beta, dubbed Kumo, rose to the surface of the web.

When Kumo launches, in early June, it will be one of the few "shipping" Microsoft products to include open-source code.

In an email to The Reg, Microsoft points out that several other product teams have a hand in free software, including the Windows HPC and System Center teams. But the System Center team has yet to actually ship any open source code, and though the HPC team has, this code was developed inside Microsoft and then offered up to the community.

In recent years, Microsoft has enjoyed hearing itself talk in vague terms about its commitment to open source. "Microsoft believes contribution and co-development are natural progressions of participating in open source communities," the company burbled to us over email. "A variety of Microsoft product teams and business groups are moving towards increasing contribution and co-development. The opportunity is in understanding the rules and practices of the particular project’s community to participate or contribute in a positive way."

But with Kumo, it can't help but go whole-hog. Yes, a search engine isn't a shrink-wrapped application or downloadable software. But remember the Hotmail switcheroo.

Regardless, it's a telling moment when Microsoft contributes to such a high-profile open-source project. After years of hostility towards Free Software Foundation (FSF) licensing, Redmond has contributed patches to the ADOdb database abstraction library for PHP, and the company likes to boast that to date, it has initiated more than 300 open-source projects.

But the Apache-licensed Hadoop - with its ability to process epic amounts of data on commodity hardware - underpins not only Yahoo! but Facebook. And it's the bastard child of the Google Chocolate Factory.

Perhaps Microsoft is changing after all. Or perhaps Ballmer's Google chase has reached the point of desperation. ®

Clarification: This story has been clarified to show that Powerset may not technically be the first shipping product to include open source code. As the story now reads, Microsoft HPC has shipped open source code, but this code was developed inside Microsoft and then offered up to the community.
