Feeds

Google's epic graph cruncher mimicked with open source

And then there was GoldenOrb

Secure remote control for conventional and virtual desktops

Unlike Facebook or Yahoo!, Google is loath to open source its back-end software. For many, this is a sore point, as the search giant has built its famously distributed infrastructure atop countless open source tools fashioned outside the walls of the Googleplex. But Mountain View does give back in less-direct ways.

In some cases, Google will publish research papers describing one of the proprietary platforms driving its back end, and to a certain degree this allows outside developers to mimic these platforms with open source projects. Google papers on its GFS distributed file system and MapReduce distributed number-crunching platform, for example, gave rise to the open source Hadoop, and a paper on its BigTable distributed database sparked the open source HBase project.

Now, much the same thing has happened with Google Pregel, Mountain View's platform for processing enormous online graphs, such as a map of the web itself, or of a social network, graphing relationships between people. This week, a Texas startup known as Ravel unveiled an open source project based on Google's 2010 paper describing Pregel. Open sourced under an Apache license at GitHub, the project is dubbed GoldenOrb.

Zach Richardson

Zach Richardson

A few years back, while working on a PhD in computational mathematics, Ravel president and GoldenOrb lead architect Zach Richardson helped found a small company that basically helped other businesses processes large amounts of data, including "semantic web" data, which seeks to give machines a better "understanding" of text on the internet. They soon realized that to solve such problems required better tools.

"We were doing consulting at the intersection of the semantic web and Big Data," Richardson tells The Register. "Semantic web data is inherently stored in a graph, and when you get to very large data sizes, a lot of the traditional methodologies for processing or trying to understand that data no longer work. Or they don't scale. Or they take a completely unrealistic amount of time."

As luck would have it, Google published its Pregel paper. Pregel is a computational model that dovetails with Google's existing data-storage technologies, including the Google File System (GFS) and BigTable. In essence, data from GFS or BigTable is shuttled to Pregel, where the data is crunched. Presumably, Google Chubby – the company's distributed lock service – is used to manage access to data.

In the open source world, GFS, BigTable, and Chubby are mirrored by the Hadoop File System (HDFS), HBase, and Zookeeper. Naturally, Richardson and his fellow developers built GoldenOrb atop such open source platforms. Zookeeper handles data synchronization across distributed machines, and Hadoop's remote procedural call (RPC) passes message from node to node. Google's Pregel paper provides a high-level description of the Pregel programming model, but the rest was guesswork.

"Google provides the higher level concepts of when something needs to synchronize, what communications need to happen between servers, and what your programming model needs to look like to use it," Richardson says. "But how close we are to their implementation? It's very hard to guess."

Building an initial GoldenOrb platform took about seven months of "on and off" work. Richardson and his team has not even had a cursory discussion with Google about the platform.

In addition to developing the platform, Ravel will build applications that run atop it. "We're focused on building enterprise products that analyze data," Richardson says. "Graph problems are [almost infinite]. This includes social network analysis, a very popular topic, but the same algorithms might also be used in things like epidemiology research or pharmaceutical research."

According to Richardson, the platform is suited to situations in which you need random access to data while your algorithm is running. MapReduce is designed for batch processing. You take a large chuck on data, break it up into tiny pieces, and spread it across a cluster of machines for processing. GoldenOrb can run algorithms that grab particular pieces of information from distributed machines on the fly. "With MapReduce, if I'm doing a calculation on one machine and I happen to need information on another machines, there's no way to get it," he says. "GoldenOrb can share information across all machines as necessary to solve the problem."

Ravel employees about fifteen people, and according to Richardson, the company has already started building products on the open source platform. But he declined to discuss them. ®

Secure remote control for conventional and virtual desktops

More from The Register

next story
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
China hopes home-grown OS will oust Microsoft
Doesn't much like Apple or Google, either
Linux turns 23 and Linus Torvalds celebrates as only he can
No, not with swearing, but by controlling the release cycle
This is how I set about making a fortune with my own startup
Would you leave your well-paid job to chase your dream?
Microsoft cries UNINSTALL in the wake of Blue Screens of Death™
Cache crash causes contained choloric calamity
Eat up Martha! Microsoft slings handwriting recog into OneNote on Android
Freehand input on non-Windows kit for the first time
Linux kernel devs made to finger their dongles before contributing code
Two-factor auth enabled for Kernel.org repositories
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
5 things you didn’t know about cloud backup
IT departments are embracing cloud backup, but there’s a lot you need to know before choosing a service provider. Learn all the critical things you need to know.
Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.
Top 8 considerations to enable and simplify mobility
In this whitepaper learn how to successfully add mobile capabilities simply and cost effectively.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?