Microsoft Bing rides open source to semantic search

Powerset on the side

As it turns out, Powerset's open-source-happy semantic talents are only a small part of Bing, Microsoft's freshly minted "decision engine" search engine.

Microsoft acquired Powerset last July in a reported $100m deal, and after a conspicuous Tweet from Powerset co-founder Barney Pell, many assumed that the semantic search outfit would play a major role in Redmond's latest attempt to catch the uncatchable Google.

According to a blog post from Scott Prevost, general manager of Microsoft's Powerset division, the division has tweaked Microsoft's primary search engine in certain "subtle" ways. But its main contribution is a secondary engine that searches nothing but Wikipedia. In essence, Microsoft has taken Powerset's existing Wikitool and latched it to the Bing torso.

"The Powerset division has contributed to Bing in both subtle and more conspicuous ways. While the subtle contributions are important, they are much harder to showcase. This post will focus on how the features that our users have come to love on Powerset.com have evolved and have been integrated into Bing," Prevost says, before detailing Bing's "Reference" tab.

As we reported yesterday, the Reference tab reproduces Wikipedia articles in their entirety. When you search on, say, Albert Einstein, the tab will appear on the left-hand side of the page, and if you click on it, you're taken to a reproduction of Einstein's Wikipedia entry (licensed at no cost from the "free encyclopedia anyone can edit").

Yes, Microsoft has solidified Wikipedia's place as the web's number-one source of truthiness.

But from that Reference tab you can also tap into Powerset's semantic Wikisearch, which the company originally unfurled in May of last year, before the Microsoft acquisition. This vertical search engine is designed to accept natural-language queries, such as "Was Einstein married?" - though that's not immediately obvious from Bing's layout. In a video attached to Prevost's blog post, Powerset founder Lorenzo Thione acknowledges that some of Bing's Powerset tools are "a little bit hidden. Over time, we'll definitely work on making it more accessible and visible to users."

In the same video, Senior Program Manager Mark Johnson says that in a few cases, Microsoft has hooked Powerset's natural-language platform into some of Bing's other search verticals, including the "Business" tab. But Thione calls these "pilot tests."

"There are a subset of queries where you use a more natural-language oriented syntax or you ask questions, similar to what Powerset.com used to support, we will get you answers right there on the page and a link back to the Reference vertical," he says.

Despite its limited role in the new search engine, Powerset's Bingification is a Microsoft milestone. Powerset's platform leans heavily on open-source code. Most notably, its search index is generated via Hadoop, the same open-source distributed computing platform that juices Yahoo!'s search engine. Powerset originated Hadoop's HBase project, an effort to mimic Google's famous distributed storage system, BigTable, and two of its employees, Michael Stack and Jim Kellerman, are full-time HBase committers.

According to Sam Ramji, Microsoft's senior director of platform strategy and the man who oversees the company's open source thinking, "This is the first time we have acquired a company with committers to a key open source project who have been able to continue to commit to that project in their old capacity as part of their new role."

And thanks to its integration with Powerset's platform, Bing is one of the few Microsoft "shipping" products to actually incorporate open-source code. Ramji points out that from the early to late 90s, Microsoft's Windows TCP/IP stack included BSD code, and today Windows HPC includes code developed at Microsoft that was then offered up to Argonne National Lab (ANL) for open-sourcing. But since the arrival of Windows Vista, Bing is certainly the most high-profile Microsoft product to go the open-source route.

Ramji calls it part of Microsoft's "strategic shift and cultural change" towards the open-source world. And it's certainly nice to see. But on another level, it's rather amusing that the company that once called Linux a cancer and spent untold millions on Encarta is now resting its search engine on Hadoop and Wikipedia. ®
