Google's epic graph cruncher mimicked with open source

And then there was GoldenOrb

The Power of One eBook: Top reasons to choose HP BladeSystem

Unlike Facebook or Yahoo!, Google is loath to open source its back-end software. For many, this is a sore point, as the search giant has built its famously distributed infrastructure atop countless open source tools fashioned outside the walls of the Googleplex. But Mountain View does give back in less-direct ways.

In some cases, Google will publish research papers describing one of the proprietary platforms driving its back end, and to a certain degree this allows outside developers to mimic these platforms with open source projects. Google papers on its GFS distributed file system and MapReduce distributed number-crunching platform, for example, gave rise to the open source Hadoop, and a paper on its BigTable distributed database sparked the open source HBase project.

Now, much the same thing has happened with Google Pregel, Mountain View's platform for processing enormous online graphs, such as a map of the web itself, or of a social network, graphing relationships between people. This week, a Texas startup known as Ravel unveiled an open source project based on Google's 2010 paper describing Pregel. Open sourced under an Apache license at GitHub, the project is dubbed GoldenOrb.

Zach Richardson

Zach Richardson

A few years back, while working on a PhD in computational mathematics, Ravel president and GoldenOrb lead architect Zach Richardson helped found a small company that basically helped other businesses processes large amounts of data, including "semantic web" data, which seeks to give machines a better "understanding" of text on the internet. They soon realized that to solve such problems required better tools.

"We were doing consulting at the intersection of the semantic web and Big Data," Richardson tells The Register. "Semantic web data is inherently stored in a graph, and when you get to very large data sizes, a lot of the traditional methodologies for processing or trying to understand that data no longer work. Or they don't scale. Or they take a completely unrealistic amount of time."

As luck would have it, Google published its Pregel paper. Pregel is a computational model that dovetails with Google's existing data-storage technologies, including the Google File System (GFS) and BigTable. In essence, data from GFS or BigTable is shuttled to Pregel, where the data is crunched. Presumably, Google Chubby – the company's distributed lock service – is used to manage access to data.

In the open source world, GFS, BigTable, and Chubby are mirrored by the Hadoop File System (HDFS), HBase, and Zookeeper. Naturally, Richardson and his fellow developers built GoldenOrb atop such open source platforms. Zookeeper handles data synchronization across distributed machines, and Hadoop's remote procedural call (RPC) passes message from node to node. Google's Pregel paper provides a high-level description of the Pregel programming model, but the rest was guesswork.

"Google provides the higher level concepts of when something needs to synchronize, what communications need to happen between servers, and what your programming model needs to look like to use it," Richardson says. "But how close we are to their implementation? It's very hard to guess."

Building an initial GoldenOrb platform took about seven months of "on and off" work. Richardson and his team has not even had a cursory discussion with Google about the platform.

In addition to developing the platform, Ravel will build applications that run atop it. "We're focused on building enterprise products that analyze data," Richardson says. "Graph problems are [almost infinite]. This includes social network analysis, a very popular topic, but the same algorithms might also be used in things like epidemiology research or pharmaceutical research."

According to Richardson, the platform is suited to situations in which you need random access to data while your algorithm is running. MapReduce is designed for batch processing. You take a large chuck on data, break it up into tiny pieces, and spread it across a cluster of machines for processing. GoldenOrb can run algorithms that grab particular pieces of information from distributed machines on the fly. "With MapReduce, if I'm doing a calculation on one machine and I happen to need information on another machines, there's no way to get it," he says. "GoldenOrb can share information across all machines as necessary to solve the problem."

Ravel employees about fifteen people, and according to Richardson, the company has already started building products on the open source platform. But he declined to discuss them. ®

Reducing security risks from open source software

More from The Register

next story
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
Mozilla fixes CRITICAL security holes in Firefox, urges v31 upgrade
Misc memory hazards 'could be exploited' - and guess what, one's a Javascript vuln
Put down that Oracle database patch: It could cost $23,000 per CPU
On-by-default INMEMORY tech a boon for developers ... as long as they can afford it
Google shows off new Chrome OS look
Athena springs full-grown from Chromium project's head
Apple: We'll unleash OS X Yosemite beta on the MASSES on 24 July
Starting today, regular fanbois will be guinea pigs, it tells Reg
HIDDEN packet sniffer spy tech in MILLIONS of iPhones, iPads – expert
Don't panic though – Apple's backdoor is not wide open to all, guru tells us
prev story


Top three mobile application threats
Prevent sensitive data leakage over insecure channels or stolen mobile devices.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.