Feeds

Google's epic graph cruncher mimicked with open source

And then there was GoldenOrb

Combat fraud and increase customer satisfaction

Unlike Facebook or Yahoo!, Google is loath to open source its back-end software. For many, this is a sore point, as the search giant has built its famously distributed infrastructure atop countless open source tools fashioned outside the walls of the Googleplex. But Mountain View does give back in less-direct ways.

In some cases, Google will publish research papers describing one of the proprietary platforms driving its back end, and to a certain degree this allows outside developers to mimic these platforms with open source projects. Google papers on its GFS distributed file system and MapReduce distributed number-crunching platform, for example, gave rise to the open source Hadoop, and a paper on its BigTable distributed database sparked the open source HBase project.

Now, much the same thing has happened with Google Pregel, Mountain View's platform for processing enormous online graphs, such as a map of the web itself, or of a social network, graphing relationships between people. This week, a Texas startup known as Ravel unveiled an open source project based on Google's 2010 paper describing Pregel. Open sourced under an Apache license at GitHub, the project is dubbed GoldenOrb.

Zach Richardson

Zach Richardson

A few years back, while working on a PhD in computational mathematics, Ravel president and GoldenOrb lead architect Zach Richardson helped found a small company that basically helped other businesses processes large amounts of data, including "semantic web" data, which seeks to give machines a better "understanding" of text on the internet. They soon realized that to solve such problems required better tools.

"We were doing consulting at the intersection of the semantic web and Big Data," Richardson tells The Register. "Semantic web data is inherently stored in a graph, and when you get to very large data sizes, a lot of the traditional methodologies for processing or trying to understand that data no longer work. Or they don't scale. Or they take a completely unrealistic amount of time."

As luck would have it, Google published its Pregel paper. Pregel is a computational model that dovetails with Google's existing data-storage technologies, including the Google File System (GFS) and BigTable. In essence, data from GFS or BigTable is shuttled to Pregel, where the data is crunched. Presumably, Google Chubby – the company's distributed lock service – is used to manage access to data.

In the open source world, GFS, BigTable, and Chubby are mirrored by the Hadoop File System (HDFS), HBase, and Zookeeper. Naturally, Richardson and his fellow developers built GoldenOrb atop such open source platforms. Zookeeper handles data synchronization across distributed machines, and Hadoop's remote procedural call (RPC) passes message from node to node. Google's Pregel paper provides a high-level description of the Pregel programming model, but the rest was guesswork.

"Google provides the higher level concepts of when something needs to synchronize, what communications need to happen between servers, and what your programming model needs to look like to use it," Richardson says. "But how close we are to their implementation? It's very hard to guess."

Building an initial GoldenOrb platform took about seven months of "on and off" work. Richardson and his team has not even had a cursory discussion with Google about the platform.

In addition to developing the platform, Ravel will build applications that run atop it. "We're focused on building enterprise products that analyze data," Richardson says. "Graph problems are [almost infinite]. This includes social network analysis, a very popular topic, but the same algorithms might also be used in things like epidemiology research or pharmaceutical research."

According to Richardson, the platform is suited to situations in which you need random access to data while your algorithm is running. MapReduce is designed for batch processing. You take a large chuck on data, break it up into tiny pieces, and spread it across a cluster of machines for processing. GoldenOrb can run algorithms that grab particular pieces of information from distributed machines on the fly. "With MapReduce, if I'm doing a calculation on one machine and I happen to need information on another machines, there's no way to get it," he says. "GoldenOrb can share information across all machines as necessary to solve the problem."

Ravel employees about fifteen people, and according to Richardson, the company has already started building products on the open source platform. But he declined to discuss them. ®

SANS - Survey on application security programs

More from The Register

next story
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Oh no, Joe: WinPhone users already griping over 8.1 mega-update
Hang on. Which bit of Developer Preview don't you understand?
Microsoft lobs pre-release Windows Phone 8.1 at devs who dare
App makers can load it before anyone else, but if they do they're stuck with it
Half of Twitter's 'active users' are SILENT STALKERS
Nearly 50% have NEVER tweeted a word
Internet-of-stuff startup dumps NoSQL for ... SQL?
NoSQL taste great at first but lacks proper nutrients, says startup cloud whiz
IRS boss on XP migration: 'Classic fix the airplane while you're flying it attempt'
Plus: Condoleezza Rice at Dropbox 'maybe she can find ... weapons of mass destruction'
Ditch the sync, paddle in the Streem: Upstart offers syncless sharing
Upload, delete and carry on sharing afterwards?
New Facebook phone app allows you to stalk your mates
Nearby Friends feature goes live in a few weeks
Microsoft TIER SMEAR changes app prices whether devs ask or not
Some go up, some go down, Redmond goes silent
prev story

Whitepapers

Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
3 Big data security analytics techniques
Applying these Big Data security analytics techniques can help you make your business safer by detecting attacks early, before significant damage is done.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Top three mobile application threats
Learn about three of the top mobile application security threats facing businesses today and recommendations on how to mitigate the risk.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.