Google open sources MapReduce compression

In the name of speed

Google has open sourced the compression library used across its backend infrastructure, including MapReduce, its distributed number-crunching platform, and BigTable, its distributed database.

Available at Google Code under an Apache 2.0 license, the library is called Snappy, but Google says this is the same library that was previously referred to as Zippy in some public presentations. As the names imply, the library's primary aim is speed. "It does not aim for maximum compression, or compatibility with any other compression library," Google says. "Instead, it aims for very high speeds and reasonable compression."

Compared to the fastest mode of the popular zlib compression library, Google says, the C++-based Snappy is roughly an order of magnitude faster in most cases, but the compressed files are between 20 and 100 per cent larger. Running in 64-bit mode on a single core of a 2.26GHz "Westmere" Intel Core i7 processor, according to the company, Snappy compresses at roughly 250MB/sec and decompresses at 500MB/sec.
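For readers who want a baseline on their own hardware, the zlib side of that comparison can be approximated with Python's standard zlib module, where level 1 corresponds to zlib's fastest mode. This is a rough sketch, not Google's benchmark: the `measure` helper and the sample data are made up for illustration, and the numbers will vary with the machine and the input.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Return (compression ratio, compression throughput in MB/sec)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    # Sanity check: the compressed data must decompress to the original.
    assert zlib.decompress(packed) == data
    ratio = len(data) / len(packed)
    mb_per_sec = len(data) / 1e6 / elapsed
    return ratio, mb_per_sec


if __name__ == "__main__":
    # Illustrative input only; real text compresses far less well than this.
    sample = b"Snappy aims for very high speeds and reasonable compression. " * 50000
    for level in (1, 9):  # 1 = fastest mode, 9 = best compression
        ratio, speed = measure(sample, level)
        print(f"zlib level {level}: {ratio:.1f}x at {speed:.0f} MB/sec")
```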

Google says that the typical compression ratios are about 1.5x to 1.7x for plain text and about 2x to 4x for HTML. zlib in its fastest mode gives you 2.6x to 2.8x for plain text and 3x to 7x for HTML. "So if you want to save space, or want to compress once and decompress lots of times, use zlib (or bzip2, or…). But if you just want to cut down on your I/O, be it network or disk I/O, Snappy might be for you," says Google engineer Steinar Gunderson.
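Gunderson's trade-off can be put into rough numbers. The sketch below uses a deliberately simple model: compress everything first, then send it, ignoring decompression and any overlap between the two steps. The 250MB/sec figure and the ratios are the ones quoted above; the 25MB/sec zlib speed is only an assumption derived from the "order of magnitude" claim, and the 50MB/sec link and the `transfer_seconds` helper are hypothetical.

```python
def transfer_seconds(size_mb, link_mb_per_s, ratio=1.0, compress_mb_per_s=None):
    """Seconds to compress (optionally) and then send size_mb over a link.

    Simplified model: compression and transmission happen one after the
    other, and decompression on the receiving side is not counted.
    """
    cpu = 0.0 if compress_mb_per_s is None else size_mb / compress_mb_per_s
    wire = (size_mb / ratio) / link_mb_per_s
    return cpu + wire


SIZE_MB = 1000   # hypothetical payload of plain text
LINK = 50        # hypothetical link speed in MB/sec

raw = transfer_seconds(SIZE_MB, LINK)
# Snappy: ~250MB/sec, ~1.6x on plain text (midpoint of Google's 1.5x-1.7x).
snappy = transfer_seconds(SIZE_MB, LINK, ratio=1.6, compress_mb_per_s=250)
# zlib fastest mode: ~2.7x on plain text; 25MB/sec is an ASSUMED speed.
zlib_fast = transfer_seconds(SIZE_MB, LINK, ratio=2.7, compress_mb_per_s=25)

print(f"uncompressed: {raw:.1f}s, Snappy: {snappy:.1f}s, zlib: {zlib_fast:.1f}s")
```

Under these assumptions Snappy beats sending the data raw, while zlib's better ratio does not make up for its CPU time. On a much slower link, or when the data is compressed once and read many times, the balance shifts back toward zlib, which is the point Gunderson makes.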

According to Gunderson, Snappy removes the "entropy reduction" step that characterizes zlib and other LZ-style compression libraries. "Most LZ-style compressors (including zlib) consist of two parts: A matching algorithm (recognizing repetitions from data earlier in the stream, as well as things like 'abcabcabcabc') and then an entropy reduction step (almost invariably Huffman or some version of arithmetic encoding)," he says. "Snappy skips the entropy reduction and instead uses a fixed, hand-tuned packing format."
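To make the two-part structure concrete, here is a toy compressor that does only the first part: it finds repetitions earlier in the stream and writes them out in a fixed, byte-aligned format with no Huffman or arithmetic coding step. This is not Snappy's actual format or algorithm. The tag bytes, field widths, and the `toy_compress` and `toy_decompress` names are invented for illustration.

```python
MIN_MATCH = 4       # shortest repetition worth encoding as a copy
MAX_LEN = 255       # lengths are stored in a single byte
MAX_OFFSET = 65535  # offsets are stored in two bytes


def _emit_literals(out: bytearray, chunk: bytes) -> None:
    """Write raw bytes as records of: tag 0, length byte, then the bytes."""
    for start in range(0, len(chunk), MAX_LEN):
        piece = chunk[start:start + MAX_LEN]
        out.append(0)
        out.append(len(piece))
        out += piece


def toy_compress(data: bytes) -> bytes:
    out = bytearray()
    last_seen = {}  # 4-byte sequence -> most recent position where it began
    i = 0
    lit_start = 0   # start of the pending run of unmatched bytes
    while i + MIN_MATCH <= len(data):
        key = data[i:i + MIN_MATCH]
        cand = last_seen.get(key)
        last_seen[key] = i
        if cand is not None and i - cand <= MAX_OFFSET:
            # Extend the match as far as it goes (it may overlap position i).
            length = MIN_MATCH
            while (length < MAX_LEN and i + length < len(data)
                   and data[cand + length] == data[i + length]):
                length += 1
            _emit_literals(out, data[lit_start:i])
            # Copy record: tag 1, length byte, 2-byte little-endian offset.
            out.append(1)
            out.append(length)
            out += (i - cand).to_bytes(2, "little")
            i += length
            lit_start = i
        else:
            i += 1
    _emit_literals(out, data[lit_start:])
    return bytes(out)


def toy_decompress(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        tag, length = blob[i], blob[i + 1]
        i += 2
        if tag == 0:
            out += blob[i:i + length]
            i += length
        else:
            offset = int.from_bytes(blob[i:i + 2], "little")
            i += 2
            # Byte-by-byte so that overlapping copies repeat correctly.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabc" * 10
    packed = toy_compress(sample)
    assert toy_decompress(packed) == sample
    print(len(sample), "bytes ->", len(packed), "bytes")
```

Because every field sits on a byte boundary, the decoder needs no bit-level work or code tables. That keeps it cheap on the CPU, but the output is larger than what an entropy coder would produce from the same matches.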

This format, Gunderson says, affords "much less" CPU usage, and he says that Google has spent years fine-tuning it. Virtually all of Google's online services run atop a uniform distributed infrastructure based on the proprietary Google File System (GFS), MapReduce, BigTable, and other platforms. These have been mimicked in the open source world by the Apache Hadoop project. ®