Google open sources MapReduce compression

In the name of speed

Google has open sourced the compression library used across its backend infrastructure, including MapReduce, its distributed number-crunching platform, and BigTable, its distributed database.

Available at Google Code under an Apache 2.0 license, the library is called Snappy, but Google says this is the same library that was previously referred to as Zippy in some public presentations. As the names imply, the library's primary aim is speed. "It does not aim for maximum compression, or compatibility with any other compression library," Google says. "Instead, it aims for very high speeds and reasonable compression."

Compared to the fastest mode of the popular zlib compression library, Google says, the C++-based Snappy is roughly an order of magnitude faster in most cases, but the compressed files are between 20 and 100 per cent larger. Running in 64-bit mode on a single core of a 2.26GHz "Westmere" Intel Core i7 processor, according to the company, Snappy compresses at roughly 250MB/sec and decompresses at 500MB/sec.

Google says that Snappy's typical compression ratios are about 1.5x to 1.7x for plain text and about 2x to 4x for HTML. By comparison, zlib in its fastest mode gives you 2.6x to 2.8x for plain text and 3x to 7x for HTML. "So if you want to save space, or want to compress once and decompress lots of times, use zlib (or bzip2, or…). But if you just want to cut down on your I/O, be it network or disk I/O, Snappy might be for you," says Google engineer Steinar Gunderson.
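For readers who want to see what a "compression ratio" means in practice, here is a minimal sketch using Python's standard zlib module, where level 1 corresponds to zlib's fastest mode. The sample input and the helper name compression_ratio are made up for illustration, and ratios on real data will differ from both this toy input and the figures Google quotes.

```python
import zlib


def compression_ratio(data: bytes, level: int) -> float:
    """Return original size divided by compressed size for zlib at `level`."""
    compressed = zlib.compress(data, level)
    # Round-trip check: decompression must reproduce the input exactly.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical, highly repetitive sample; real ratios depend on the data.
sample = b"<li><a href='/story'>Google open sources Snappy</a></li>\n" * 200

fastest = compression_ratio(sample, 1)   # zlib's fastest mode
densest = compression_ratio(sample, 9)   # zlib's slowest, densest mode
print(f"level 1: {fastest:.1f}x, level 9: {densest:.1f}x")
```

A ratio of 2x means the compressed output is half the size of the input, so a Snappy file at 1.5x is noticeably larger than a zlib file at 2.7x for the same text.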

According to Gunderson, Snappy removes the "entropy reduction" step that characterizes zlib and other LZ-style compression libraries. "Most LZ-style compressors (including zlib) consist of two parts: A matching algorithm (recognizing repetitions from data earlier in the stream, as well as things like 'abcabcabcabc') and then an entropy reduction step (almost invariably Huffman or some version of arithmetic encoding)," he says. "Snappy skips the entropy reduction and instead uses a fixed, hand-tuned packing format."
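Gunderson's description can be illustrated with a toy compressor that performs only the matching step and writes its output in a fixed byte layout, with no Huffman or arithmetic coding afterwards. This is a sketch under invented assumptions: the tag values, the four-byte minimum match, and the function names are hypothetical, and the layout is not Snappy's actual format.

```python
LITERAL, COPY = 0, 1   # hypothetical one-byte tags, not Snappy's real tags
MIN_MATCH = 4          # shortest repetition worth encoding as a copy
MAX_LEN = 255          # lengths are stored in a single byte


def _flush(out: bytearray, literals: bytearray) -> None:
    """Write pending literal bytes as [LITERAL, length, bytes...] chunks."""
    for start in range(0, len(literals), MAX_LEN):
        chunk = literals[start:start + MAX_LEN]
        out.append(LITERAL)
        out.append(len(chunk))
        out.extend(chunk)
    literals.clear()


def compress(data: bytes) -> bytes:
    out, literals = bytearray(), bytearray()
    table = {}  # 4-byte sequence -> most recent position where it was seen
    i, n = 0, len(data)
    while i < n:
        key = data[i:i + MIN_MATCH]
        candidate = None
        if len(key) == MIN_MATCH:
            candidate = table.get(key)
            table[key] = i
        if candidate is not None and i - candidate <= 0xFFFF:
            # Extend the match; it may overlap itself ('abcabcabcabc').
            length = MIN_MATCH
            while (i + length < n and length < MAX_LEN
                   and data[candidate + length] == data[i + length]):
                length += 1
            _flush(out, literals)
            out.append(COPY)
            out.append(length)
            out.extend((i - candidate).to_bytes(2, "big"))  # back-offset
            i += length
        else:
            literals.append(data[i])
            i += 1
    _flush(out, literals)
    return bytes(out)


def decompress(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        tag, length = blob[i], blob[i + 1]
        i += 2
        if tag == LITERAL:
            out.extend(blob[i:i + length])
            i += length
        else:
            offset = int.from_bytes(blob[i:i + 2], "big")
            i += 2
            start = len(out) - offset
            for k in range(length):  # byte by byte so overlaps work
                out.append(out[start + k])
    return bytes(out)


packed = compress(b"abcabcabcabc")
print(len(packed), decompress(packed))
```

Because every token is a whole number of bytes, the decoder never has to unpack bit-level codes, which is the kind of CPU saving the fixed format is meant to buy at the cost of a larger output.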

This format, Gunderson says, affords "much less" CPU usage, and he says that Google has spent years fine-tuning it. Virtually all of Google's online services run atop a uniform distributed infrastructure based on the proprietary Google File System (GFS), MapReduce, BigTable, and other platforms. These have been mimicked in the open source world by the Apache Hadoop project. ®
