Microsoft Bing rides open source to semantic search

Powerset on the side

As it turns out, Powerset's open-source-happy semantic talents are only a small part of Bing, Microsoft's freshly minted "decision engine" (read: search engine).

Microsoft acquired Powerset last July in a reported $100m deal, and after a conspicuous Tweet from Powerset co-founder Barney Pell, many assumed that the semantic search outfit would play a major role in Redmond's latest attempt to catch the uncatchable Google.

According to a blog post from Scott Prevost, general manager of Microsoft's Powerset division, the division has tweaked Microsoft's primary search engine in certain "subtle" ways. But its main contribution is a secondary engine that searches nothing but Wikipedia. In essence, Microsoft has taken Powerset's existing Wikipedia tool and latched it onto the Bing torso.

"The Powerset division has contributed to Bing in both subtle and more conspicuous ways. While the subtle contributions are important, they are much harder to showcase. This post will focus on how the features that our users have come to love on Powerset.com have evolved and have been integrated into Bing," Prevost says, before detailing Bing's "Reference" tab.

As we reported yesterday, the Reference tab reproduces Wikipedia articles in their entirety. When you search on, say, Albert Einstein, the tab will appear on the left-hand side of the page, and if you click on it, you're taken to a reproduction of Einstein's Wikipedia entry (licensed at no cost from the "free encyclopedia anyone can edit").

Yes, Microsoft has solidified Wikipedia's place as the web's number-one source of truthiness.

But from that Reference tab you can also tap into Powerset's semantic Wikipedia search, which the company originally unfurled in May of last year, before the Microsoft acquisition. This vertical search engine is designed to accept natural-language queries, such as "Was Einstein married?", though that's not immediately obvious from Bing's layout. In a video attached to Prevost's blog post, Powerset co-founder Lorenzo Thione acknowledges that some of Bing's Powerset tools are "a little bit hidden. Over time, we'll definitely work on making it more accessible and visible to users."

In the same video, Senior Program Manager Mark Johnson says that in a few cases, Microsoft has hooked Powerset's natural-language platform into some of Bing's other search verticals, including the "Business" tab. But Thione calls these "pilot tests."

"There are a subset of queries where you use a more natural-language oriented syntax or you ask questions, similar to what Powerset.com used to support, we will get you answers right there on the page and a link back to the Reference vertical," he says.

Despite its limited role in the new search engine, Powerset's Bingification is a Microsoft milestone. Powerset's platform leans heavily on open-source code. Most notably, its search index is generated via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Powerset originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable, and two of its employees, Michael Stack and Jim Kellerman, are full-time HBase committers.

According to Sam Ramji, Microsoft's senior director of platform strategy and the man who oversees the company's open source thinking, "This is the first time we have acquired a company with committers to a key open source project who have been able to continue to commit to that project in their old capacity as part of their new role."

And thanks to its integration with Powerset's platform, Bing is one of the few Microsoft "shipping" products to actually incorporate open-source code. Ramji points out that from the early to late '90s, Microsoft's Windows TCP/IP stack included BSD code, and today Windows HPC includes code developed at Microsoft that was then offered up to Argonne National Lab (ANL) for open-sourcing. But Bing is certainly the most high-profile Microsoft product since Windows Vista to go the open-source route.

Ramji calls it part of Microsoft's "strategic shift and cultural change" towards the open-source world. And it's certainly nice to see. But on another level, it's rather amusing that the company that once called Linux a cancer and spent untold millions on Encarta is now resting its search engine on Hadoop and Wikipedia. ®