Microsoft Bing rides open source to semantic search

Powerset on the side

As it turns out, Powerset's open-source-happy semantic talents are only a small part of Bing, Microsoft's freshly-minted "decision engine" search engine.

Microsoft acquired Powerset last July in a reported $100m deal, and after a conspicuous Tweet from Powerset co-founder Barney Pell, many assumed that the semantic search outfit would play a major role in Redmond's latest attempt to catch the uncatchable Google.

According to a blog post from Scott Prevost, general manager of Microsoft's Powerset division, the division has tweaked Microsoft's primary search engine in certain "subtle" ways. But its main contribution is a secondary engine that searches nothing but Wikipedia. In essence, Microsoft has taken Powerset's existing Wikitool and latched it to the Bing torso.

"The Powerset division has contributed to Bing in both subtle and more conspicuous ways. While the subtle contributions are important, they are much harder to showcase. This post will focus on how the features that our users have come to love on Powerset.com have evolved and have been integrated into Bing," Prevost says, before detailing Bing's "Reference" tab.

As we reported yesterday, the Reference tab reproduces Wikipedia articles in their entirety. When you search on, say, Albert Einstein, the tab will appear on the left-hand side of the page, and if you click on it, you're taken to a reproduction of Einstein's Wikipedia entry (licensed at no cost from the "free encyclopedia anyone can edit").

Yes, Microsoft has solidified Wikipedia's place as the web's number-one source of truthiness.

But from that Reference tab you can also tap into Powerset's semantic Wikisearch, which the company originally unfurled in May of last year, before the Microsoft acquisition. This vertical search engine is designed to accept natural-language queries, such as "Was Einstein married?" - though that's not immediately obvious from Bing's layout. In a video attached to Prevost's blog post, Powerset founder Lorenzo Thione acknowledges that some of Bing's Powerset tools are "a little bit hidden. Over time, we'll definitely work on making it more accessible and visible to users."

In the same video, Senior Program Manager Mark Johnson says that in a few cases, Microsoft has hooked Powerset's natural-language platform into some of Bing's other search verticals, including the "Business" tab. But Thione calls these "pilot tests."

"There are a subset of queries where you use a more natural-language oriented syntax or you ask questions, similar to what Powerset.com used to support, we will get you answers right there on the page and a link back to the Reference vertical," he says.

Despite its limited role in the new search engine, Powerset's Bingification is a Microsoft milestone. Powerset's platform leans heavily on open-source code. Most notably, its search index is generated via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Powerset originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable, and two of its employees, Michael Stack and Jim Kellerman, are full-time HBase committers.
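For readers wondering what "generating a search index" on a Hadoop-style platform looks like in principle, here is a minimal sketch in plain Python. It is not Powerset's pipeline and it uses no Hadoop or HBase APIs: the function names (`map_phase`, `reduce_phase`, `build_index`) and the two toy documents are made up for illustration. It only shows the map-then-reduce pattern that such platforms distribute across many machines, applied to building an inverted index.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit (term, doc_id) pairs for one document, as a mapper would."""
    for word in text.lower().split():
        term = word.strip(".,?!\"'()")  # crude punctuation trim, illustration only
        if term:
            yield term, doc_id


def reduce_phase(pairs):
    """Group document ids by term, standing in for the shuffle and reduce steps."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_index(docs):
    """Run the map step over every document, then reduce into one index."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


# Hypothetical toy corpus; a real index would cover millions of pages.
docs = {
    "einstein": "Albert Einstein married Mileva Maric.",
    "bohr": "Niels Bohr debated Einstein.",
}
index = build_index(docs)
print(index["einstein"])
```

On a real cluster the map step would run in parallel over chunks of the corpus and the grouping would happen across machines; the single-process version above only keeps the shape of the computation.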

According to Sam Ramji, Microsoft's senior director of platform strategy and the man who oversees the company's open source thinking, "This is the first time we have acquired a company with committers to a key open source project who have been able to continue to commit to that project in their old capacity as part of their new role."

And thanks to its integration with Powerset's platform, Bing is one of the few Microsoft "shipping" products to actually incorporate open-source code. Ramji points out that from the early to late 90s, Microsoft's Windows TCP/IP stack included BSD code, and today Windows HPC includes code developed at Microsoft that was then offered up to Argonne National Lab (ANL) for open-sourcing. But since the arrival of Windows Vista, Bing is certainly the most high-profile Microsoft product to go the open-source route.

Ramji calls it part of Microsoft's "strategic shift and cultural change" towards the open-source world. And it's certainly nice to see. But on another level, it's rather amusing that the company that once called Linux a cancer and spent untold millions on Encarta is now resting its search engine on Hadoop and Wikipedia. ®
