Facebook rides Unicorn to graph search nirvana

Data-guzzling index tech unfazed by billions of likes

Facebook has given details on "Unicorn," the technology that makes its needle-in-a-haystack query engine Graph Search possible.

The company revealed Unicorn in a post to its Facebook engineering blog on Wednesday.

Unicorn is an inverted index that can theoretically handle queries with "hundreds" of operands (aka, the things wot go into a query), Sriram Sankar, Facebook's lead for search quality and ranking, told The Register.

"The Unicorn infrastructure strength is searching for entities based on attributes about these entities you can get. It is very different from a SQL database," Sankar said. "Unicorn can be used to look up any table, like a hashtable can – but that's not a perfect use of Unicorn. You'd use Unicorn when you have a bunch of keys."

Without Unicorn, Facebook's "screw you, Google & Microsoft" Graph Search tech would be achingly slow.

Anything that can be done in memcache can also be done in Unicorn, Sankar said, though in most use cases memcache would have greater performance.

Unicorn can index "nodes" – anything with a Facebook ID, like a place, person, photo, post – and "edges," which are the actions that can be performed to link these objects, like check-ins, friendships, and tags.

"One way to think of this is if the graph were represented by language, the nodes would be the nouns and the edges would be the verbs. Every user, page, place, photo, post, etc. are nodes in this graph. Edges between nodes represent friendships, check-ins, tags, relationships, ownership, attributes, etc," a trio of Facebook engineers wrote in a blog post on Wednesday.

Typical Facebook searches let you search by the metadata associated with things, like the name of a tagged person. The Unicorn index makes it possible to search across objects and edges, so you can stalk – sorry, find – people nearby who like The Register, for example.
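
To make the idea concrete, here is a minimal sketch of an inverted index that answers "people nearby who like The Register" by intersecting two posting lists. It is an illustration only, not Facebook's implementation: the InvertedIndex class, the "edge_type:target_id" term format, the edge names and all the ID numbers are assumptions made up for this example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps a term to the set of node IDs that carry it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_edge(self, edge_type, source_id, target_id):
        # Index the source node under a term built from the edge type and
        # its target (hypothetical term format, chosen for this sketch).
        self.postings[f"{edge_type}:{target_id}"].add(source_id)

    def search_and(self, *terms):
        # Intersect the posting lists, smallest first, so the working set
        # never grows beyond the shortest list.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        if not lists:
            return set()
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return result


# Made-up IDs: page 900 stands in for The Register, place 500 for "nearby".
index = InvertedIndex()
index.add_edge("likes", 1, 900)
index.add_edge("likes", 2, 900)
index.add_edge("lives_in", 2, 500)
index.add_edge("lives_in", 3, 500)

nearby_fans = index.search_and("likes:900", "lives_in:500")
print(nearby_fans)  # only user 2 has both edges
```

Each extra operand in such a query is one more posting list to intersect, which is why an index of this shape copes with multi-operand queries where a plain key-value lookup would not.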

What makes the Unicorn index impressive is it allows Joe Public to query a dataset that consists of billions of different points, and it can do this across multiple nodes without taking a year to spit out the data.

Once a week, Facebook feeds the Unicorn with a giant batch of new information, and the index ingests smaller bits of data throughout the week. Some 2.5 billion new pieces of content are added to Facebook every day, along with more than 2.7 billion "likes," adding up to tens of billions of new data points and relationships per week.

When Unicorn indexes an object, say a new user with the name Vulture Lohan, it stores Vulture with one ID number, Lohan with another, and Vulture Lohan with its own special number.
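
A rough sketch of that naming scheme, under stated assumptions: the TermDictionary class, the name_terms helper, the lowercasing and the sequential numbering are all hypothetical choices for illustration, since the article does not say how Unicorn actually assigns its numbers.

```python
class TermDictionary:
    """Hands out one integer ID per distinct term (hypothetical scheme)."""

    def __init__(self):
        self.term_ids = {}

    def term_id(self, term):
        # Assign the next free integer the first time a term is seen;
        # later lookups of the same term return the same ID.
        return self.term_ids.setdefault(term, len(self.term_ids) + 1)


def name_terms(full_name):
    # Each word becomes a term, plus the full name as one extra term.
    words = full_name.lower().split()
    terms = list(words)
    if len(words) > 1:
        terms.append(" ".join(words))
    return terms


dictionary = TermDictionary()
ids = {term: dictionary.term_id(term) for term in name_terms("Vulture Lohan")}
print(ids)
```

Storing the full name as its own term means an exact-name search is a single lookup, while a search on either word alone still finds the user.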

When a user makes a graph search query, the list of results is winnowed down by static ranking according to relevance at each stage. Most users are likely to make queries that only require a couple of steps, but anything is possible if you can put up with the latency, Sankar said.

"Our infrastructure allows you to do any level of nesting [but] it grows linearly in latency," he said.

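The cost of nesting can be sketched as follows, assuming a toy model in which every level of a nested query needs one further round of index lookups that cannot start until the level inside it has finished. The postings table, the add_edge and evaluate functions, the query encoding and the IDs are all hypothetical, and ranking is left out entirely.

```python
from collections import defaultdict

# Toy index: term "edge_type:target_id" -> set of source node IDs.
postings = defaultdict(set)


def add_edge(edge_type, source_id, target_id):
    postings[f"{edge_type}:{target_id}"].add(source_id)


def evaluate(query):
    """Return (matching node IDs, number of sequential lookup rounds).

    A query is either a set of node IDs (no lookups needed) or a tuple
    (edge_type, inner_query), meaning "nodes with an edge_type edge to
    any node matched by inner_query".
    """
    if isinstance(query, (set, frozenset)):
        return set(query), 0
    edge_type, inner = query
    # The inner query has to finish before this level can begin.
    targets, rounds = evaluate(inner)
    matches = set()
    for target in targets:
        matches |= postings.get(f"{edge_type}:{target}", set())
    return matches, rounds + 1


# Users 2 and 3 are friends of user 1; user 4 is a friend of user 2.
add_edge("friend", 2, 1)
add_edge("friend", 3, 1)
add_edge("friend", 4, 2)

friends = evaluate(("friend", {1}))
friends_of_friends = evaluate(("friend", ("friend", {1})))
print(friends, friends_of_friends)
```

In this model the number of rounds equals the nesting depth, so if each round takes roughly the same time, total latency grows in step with the number of levels.
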
The kinds of systems Facebook is developing to deal with its vast kingdom of likes, tedious funny photos, and subtly veiled insults and/or brags posing as status updates may seem like overkill for other companies, but as the amount of data generated by people explodes, these technologies will become more and more relevant.

Big supermarket chains, streaming media companies, and logistics firms would like to get their hands on this kind of tech, we think, and we're fairly sure that Unicorn is the type of index that spooks have either already developed, or are busily trying to replicate.

It's also exactly the type of system that wannabe-social networks will have to build if they want to maintain feature parity with Facebook.

But it could be a while before Zuckerberg releases his Unicorn into the wild: the company has no immediate plans to make the technology open source, Sankar said, though it will publish an academic paper that gives more information on the tech "in due course." ®
