Microsoft's new search - Built on open-source

With Kumo, you get cancer

When Microsoft purchased Hotmail in December of 1997 for an estimated $400m, it ran on FreeBSD. But Redmond ripped out the open source OS and replaced it with Windows 2000. Or at least, it tried to.

More than a decade on, Microsoft still harbors some sort of deep-seated urge to destroy the free software movement it once compared to cancer. But unmitigated open-source antipathy has given way to a kind of free software schizophrenia. In need of extra licensing dollars, Microsoft may sue a Dutch GPS maker over its use of Linux. But in its ongoing struggle to catch the un-catchable Google, Redmond has no problem reversing its Hotmail-era attitudes.

In July of last year, Microsoft acquired Powerset, a San Francisco startup intent on bringing natural language processing to web search. And like the original Hotmail, the startup's semantic search engine leans heavily on open source code.

Some of the company's core technologies are proprietary, including the XLE ranking algorithms it licenses from the Palo Alto Research Center (PARC). But outside of that core, as Powerset product manager Mark Johnson once put it, the company uses open-source code wherever possible.

Most notably, Powerset generates its search index via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Based on Google's MapReduce distributed computing platform and GFS file system, Hadoop was originally developed by open-source maven Doug Cutting, now on the Yahoo! payroll. But it was Powerset that originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable.

When Microsoft acquired the company, Powersetters Michael Stack and Jim Kellerman took a hiatus from their full-time HBase contributions. But by October, Redmond had cleared the pair to resume their open coding. And that's what we'd call giving yourself cancer. "While Microsoft has supported open source in the past," a company mouthpiece tells us, "this is the first time that Microsoft has continued to support open source with an acquired company."

By all accounts, Powerset will drive Microsoft's latest, ill-fated attempt to unseat the Google search monopoly. In March, a Tweet from Powerset co-founder Barney Pell set the blogosphere a-burbling about the impending relaunch of Microsoft Live Search, and days later, screenshots of an internal beta - dubbed Kumo - rose to the surface of the web.

When Kumo launches, in early June, it will be one of the few "shipping" Microsoft products to include open-source code.

In an email to The Reg, Microsoft points out that several other product teams have a hand in free software, including the Windows HPC and System Center teams. But the System Center team has yet to actually ship any open source code, and though the HPC team has, this code was developed inside Microsoft and then offered up to the community.

In recent years, Microsoft has enjoyed hearing itself talk in vague terms about its commitment to open source. "Microsoft believes contribution and co-development are natural progressions of participating in open source communities," the company burbled to us over email. "A variety of Microsoft product teams and business groups are moving towards increasing contribution and co-development. The opportunity is in understanding the rules and practices of the particular project’s community to participate or contribute in a positive way."

But with Kumo, it can't help but go whole-hog. Yes, a search engine can't be confused with a shrink-wrapped application or downloadable software. But remember the Hotmail switcheroo.

Regardless, it's a telling moment when Microsoft contributes to an open-source project with such a high profile. After years of hostility towards Free Software Foundation (FSF) licensing, Redmond has contributed patches to the ADOdb database abstraction library for PHP, and the company likes to boast that to date, it has initiated more than 300 open-source projects.

But the Apache-licensed Hadoop - with its ability to process epic amounts of data on commodity hardware - underpins not only Yahoo! but Facebook. And it's the bastard child of the Google Chocolate Factory.

Perhaps Microsoft is changing after all. Or perhaps Ballmer's Google chase has reached the point of desperation. ®

Clarification: This story has been clarified to show that Powerset may not technically be the first shipping product to include open source code. As the story now reads, Microsoft HPC has shipped open source code, but this code was developed inside Microsoft and then offered up to the community.
