Microsoft's new search - Built on open-source

With Kumo, you get cancer

When Microsoft purchased Hotmail in December of 1997 for an estimated $400m, it ran on FreeBSD. But Redmond ripped out the open source OS and replaced it with Windows 2000. Or at least, it tried to.

More than a decade on, Microsoft still harbors some sort of deep-seated urge to destroy the free software movement it once compared to cancer. But unmitigated open-source antipathy has given way to a kind of free software schizophrenia. In need of extra licensing dollars, Microsoft may sue a Dutch GPS maker over its use of Linux. But in its ongoing struggle to catch the un-catchable Google, Redmond has no problem reversing its Hotmail-era attitudes.

In July of last year, Microsoft acquired Powerset, a San Francisco startup intent on bringing natural language processing to web search. And like the original Hotmail, the startup's semantic search engine leans heavily on open source code.

Some of the company's core technologies are proprietary, including the XLE natural language processing technology it licenses from the Palo Alto Research Center (PARC). But outside of that core, as Powerset product manager Mark Johnson once put it, the company uses open-source code wherever possible.

Most notably, Powerset generates its search index via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Based on Google's MapReduce distributed computing platform and the Google File System (GFS), Hadoop was originally developed by open-source maven Doug Cutting, now on the Yahoo! payroll. But it was Powerset that originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable.

When Microsoft acquired the company, Powersetters Michael Stack and Jim Kellerman took a hiatus from their full-time HBase contributions. But by October, Redmond had cleared the pair to resume their open coding. And that's what we'd call giving yourself cancer. "While Microsoft has supported open source in the past," a company mouthpiece tells us, "this is the first time that Microsoft has continued to support open source with an acquired company."

By all accounts, Powerset will drive Microsoft's latest, ill-fated attempt to unseat the Google search monopoly. In March, a Tweet from Powerset co-founder Barney Pell set the blogosphere a-burbling about the impending relaunch of Microsoft Live Search, and days later, screenshots of an internal beta - dubbed Kumo - rose to the surface of the web.

When Kumo launches, in early June, it will be one of the few "shipping" Microsoft products to include open-source code.

In an email to The Reg, Microsoft points out that several other product teams have a hand in free software, including the Windows HPC and System Center teams. But the System Center team has yet to actually ship any open-source code, and though the HPC team has, this code was developed inside Microsoft and then offered up to the community.

In recent years, Microsoft has enjoyed hearing itself talk in vague terms about its commitment to open source. "Microsoft believes contribution and co-development are natural progressions of participating in open source communities," the company burbled to us over email. "A variety of Microsoft product teams and business groups are moving towards increasing contribution and co-development. The opportunity is in understanding the rules and practices of the particular project’s community to participate or contribute in a positive way."

But with Kumo, it can't help but go whole-hog. Yes, a search engine can't be confused with a shrink-wrapped application or downloadable software. But remember the Hotmail switcheroo.

Regardless, it's a telling moment when Microsoft contributes to an open-source project with such a high profile. After years of hostility towards Free Software Foundation (FSF) licensing, Redmond has contributed patches to the ADOdb database abstraction library for PHP, and the company likes to boast that to date, it has initiated more than 300 open-source projects.

But the Apache-licensed Hadoop - with its ability to process epic amounts of data on commodity hardware - underpins not only Yahoo! but Facebook. And it's the bastard child of the Google Chocolate Factory.

Perhaps Microsoft is changing after all. Or perhaps Ballmer's Google chase has reached the point of desperation. ®

Clarification: This story has been clarified to show that Powerset may not technically be the first shipping product to include open source code. As the story now reads, Microsoft HPC has shipped open source code, but this code was developed inside Microsoft and then offered up to the community.
