Microsoft Bing rides open source to semantic search

Powerset on the side

As it turns out, Powerset's open-source-happy semantic talents are only a small part of Bing, Microsoft's freshly minted "decision engine" (that is, search engine).

Microsoft acquired Powerset last July in a reported $100m deal, and after a conspicuous Tweet from Powerset co-founder Barney Pell, many assumed that the semantic search outfit would play a major role in Redmond's latest attempt to catch the uncatchable Google.

According to a blog post from Scott Prevost, general manager of Microsoft's Powerset division, the division has tweaked Microsoft's primary search engine in certain "subtle" ways. But its main contribution is a secondary engine that searches nothing but Wikipedia. In essence, Microsoft has taken Powerset's existing Wikitool and latched it to the Bing torso.

"The Powerset division has contributed to Bing in both subtle and more conspicuous ways. While the subtle contributions are important, they are much harder to showcase. This post will focus on how the features that our users have come to love on Powerset.com have evolved and have been integrated into Bing," Prevost says, before detailing Bing's "Reference" tab.

As we reported yesterday, the Reference tab reproduces Wikipedia articles in their entirety. When you search on, say, Albert Einstein, the tab will appear on the left hand side of the page, and if you click on it, you're taken to a reproduction of Einstein's Wikipedia entry (licensed at no cost from the "free encyclopedia anyone can edit").

Yes, Microsoft has solidified Wikipedia's place as the web's number-one source of truthiness.

But from that Reference tab you can also tap into Powerset's semantic Wikisearch, which the company originally unfurled in May of last year, before the Microsoft acquisition. This vertical search engine is designed to accept natural-language queries, such as "Was Einstein married?" - though that's not immediately obvious from Bing's layout. In a video attached to Prevost's blog post, Powerset founder Lorenzo Thione acknowledges that some of Bing's Powerset tools are "a little bit hidden. Over time, we'll definitely work on making it more accessible and visible to users."

In the same video, Senior Program Manager Mark Johnson says that in a few cases, Microsoft has hooked Powerset's natural-language platform into some of Bing's other search verticals, including the "Business" tab. But Thione calls these "pilot tests."

"There are a subset of queries where you use a more natural-language oriented syntax or you ask questions, similar to what Powerset.com used to support, we will get you answers right there on the page and a link back to the Reference vertical," he says.

Despite its limited role in the new search engine, Powerset's Bingification is a Microsoft milestone. Powerset's platform leans heavily on open-source code. Most notably, its search index is generated via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Powerset originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable, and two of its employees, Michael Stack and Jim Kellerman, are full-time HBase committers.
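For readers unfamiliar with the map/reduce pattern that Hadoop implements, here is a minimal sketch of how an inverted search index can be built that way. It is plain Python rather than Hadoop code, and it is not Powerset's actual pipeline, which the blog post does not describe. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample documents are made up for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in a document."""
    for word in text.lower().split():
        # Crude normalisation: trim common punctuation from both ends.
        yield word.strip(".,?!\"'"), doc_id


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        if term:  # skip tokens that were punctuation only
            grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse grouped values into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in-process over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample documents, not real Wikipedia text.
    sample = {
        "a": "Einstein was married.",
        "b": "Einstein published papers.",
    }
    index = build_index(sample)
    print(index["einstein"])
    print(index["married"])
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the framework handling the grouping in between; the sketch runs all three stages in a single process.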

According to Sam Ramji, Microsoft's senior director of platform strategy and the man who oversees the company's open source thinking, "This is the first time we have acquired a company with committers to a key open source project who have been able to continue to commit to that project in their old capacity as part of their new role."

And thanks to its integration with Powerset's platform, Bing is one of the few Microsoft "shipping" products to actually incorporate open-source code. Ramji points out that from the early to late 90s, Microsoft's Windows TCP/IP stack included BSD code, and today Windows HPC includes code developed at Microsoft that was then offered up to Argonne National Lab (ANL) for open-sourcing. But since the arrival of Windows Vista, Bing is certainly the most high-profile Microsoft product to go the open-source route.

Ramji calls it part of Microsoft's "strategic shift and cultural change" towards the open-source world. And it's certainly nice to see. But on another level, it's rather amusing that the company that once called Linux a cancer and spent untold millions on Encarta is now resting its search engine on Hadoop and Wikipedia. ®
