Hey, Presto! Facebook spills petabyte-munching SQL brain sauce online

Zuck on that, Hortonworks and Cloudera

Wed 6 Nov 2013 // 18:00 UTC

Facebook has come through on its promise to publish its Hive-beating "Presto" analysis software as open source.

The code was made available by the social network today under the Apache v2 license, giving developers access to an ANSI-SQL compatible data query and analysis engine that is faster than Apache Hive, and competes with Cloudera's Impala and Hortonworks's Stinger technologies.

Facebook uses the tool for graph analytics, machine learning, and short-turnaround queries. The system's CPU performance is four to seven times better than that of the Hive batch Hadoop cruncher, and it returns query results eight to ten times faster.

Though it is designed to process data meant for Hive (or general Hadoop), it has "pluggable backends" that let it ingest info from other sources.

Facebook reckons the system could be relevant for people with 750GB or more of data needing analysis.

"It has also allowed us to provide a uniform SQL interface over multiple data backends such as HDFS, Hbase, Scribe, and an internal in-memory data store," a spokesperson told El Reg via email.

Unlike Hive, the Presto system does not depend on an underlying MapReduce compute framework, which Facebook says has led to improved scheduling. This has helped it work as a quick-turnaround system for interactive queries, rather than the batch processing jobs Hive is designed for.

The Java-based system works by parsing an ANSI-SQL query into a distributed query plan. It then spins up dedicated workers for multiple slices of data, which it pulls from the underlying Hadoop Distributed File System (HDFS). Each worker runs a process that contains custom bytecode designed to increase execution efficiency. Data is stored and processed in-memory, and pipelined across the network between stages.
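
The slice-and-merge flow described above can be illustrated with a toy Python sketch. This is an assumption-laden stand-in, not Presto's code: the sample rows, `make_splits`, `partial_sum`, and `run_query` are all hypothetical, threads take the place of cluster workers, and there is no SQL parser or bytecode generation. It computes the equivalent of `SELECT country, SUM(clicks) ... GROUP BY country`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" of (country, clicks) rows standing in for HDFS data.
ROWS = [("US", 3), ("UK", 1), ("US", 2), ("DE", 5), ("UK", 4), ("DE", 1)]


def make_splits(rows, n):
    """Cut the table into n slices, one per worker."""
    return [rows[i::n] for i in range(n)]


def partial_sum(split):
    """Worker stage: aggregate one slice entirely in memory."""
    totals = Counter()
    for country, clicks in split:
        totals[country] += clicks
    return totals


def run_query(rows, workers=3):
    """Coordinator: fan slices out to workers, then merge the partial results."""
    splits = make_splits(rows, workers)
    final = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Partial results are handed to the final stage without touching disk.
        for partial in pool.map(partial_sum, splits):
            final.update(partial)
    return dict(final)


result = run_query(ROWS)
print(result)
```

Because each worker only ever sees its own slice, the per-country totals are the same however many workers are used; only the final merge step needs to see every partial result.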

'Still a somewhat manual process to unpack and install'

"Through careful use of memory and data structures, Presto avoids typical issues of Java code related to memory allocation and garbage collection. (In a later post, we will share some tips and tricks for writing high-performance Java system code and the lessons learned while," the company wrote in a blog post announcing the publication of Presto as open source.

Facebook put Presto into production in early 2013, and the system now has over 1,000 users performing 30,000 queries that handle at least a petabyte of data per day, the company said. This is up from the 850 users and 27,000 daily queries the company claimed in June when it first told El Reg about Presto. Since then, Facebook's data warehouse has ballooned from 250PB to 300PB in size, and Presto is being used to query all of it.
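
To put those June-to-November figures side by side, the short Python sketch below works out the percentage growth for each. The numbers are the ones quoted above; the `pct_growth` helper and the dictionary keys are illustrative names only.

```python
def pct_growth(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# (June figure, November figure) as quoted in the article.
figures = {
    "users": (850, 1_000),
    "daily_queries": (27_000, 30_000),
    "warehouse_pb": (250, 300),
}

growth = {name: round(pct_growth(old, new), 1)
          for name, (old, new) in figures.items()}
print(growth)
```

On those numbers the user base grew by roughly 18 per cent and daily queries by about 11 per cent, while the warehouse itself grew by 20 per cent.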

One developer who has used the software told us that the technology is usable, and that "it is still a somewhat manual process to unpack and install, but I was able to do so within 12 minutes on a few boxes, but this is the sort of thing that runs on clusters of tens, hundreds, or thousands."

"Presto works better at Facebook scale and for our use cases," a spokesperson told El Reg via email. Other web companies have had a chance to play with it as well, and we were sent canned quotes from companies like Airbnb and Dropbox.

"It's an order of magnitude faster than Hive in most of our use cases," Airbnb data scientist Chris Gutierrez said. "It reads directly from HDFS, so unlike [Amazon Web Services] Redshift, there isn't a lot of ETL [extract, transform, and load] before you can use it. It just works." ®
