The Register® — Biting the hand that feeds IT

Ocarina makes waves with lossless image compression

Heeeeey Ocarina! Aaaha!

Interview Ocarina, the deduplication startup, is making waves through its partnerships with storage vendors, thanks to its lossless image compression technology. Yet the Ocarina founders were not wedded to image deduplication when they started the company. How did it come about?

The way Murli Thirumale, Ocarina's CEO, tells it, the three founders had three ideas for a startup which they tested with potential customers and with consultants in a proof of concept exercise. They asked which of the three would have long-lasting and true value in the eyes of customers. The one dealing with ever-growing data storage received overwhelming customer support.

Much of this growth was due to rich media: images and videos. These file types (JPEGs, TIFFs, MPEGs and so forth) were previously thought to be uncompressible if there was to be no loss of image or video resolution. Ocarina's chief technology officer and co-founder Goutham Rao came up with ways of doing this, of compressing the uncompressible. Thirumale says: "He didn't know you're not supposed to be able to do that."

Image and video files list picture elements and their characteristics. An Ocarina technology brief states: "Visual information is typically complex and includes large numbers of values to represent pixels, chrominance, luminance and other information for both recreating an image for the human eye to see, and storing information about the image for computer programs to be able to manipulate."

You can compress such files by getting rid of pixels, but this means losing image quality. What Rao invented was a way of recoding image and video files to store the same information in fewer bytes, with no lost pixels, with "bit-for-bit losslessness". This is Ocarina's secret technology and it's not revealing much about it.

DCT

The technology brief says Ocarina's technology: "extracts the full rich image data from an existing image file in to a Discrete Cosine Matrix (DCT space), correlates related image information like chrominance and luminance boundaries around like areas in an image, and then applies Ocarina’s patented image optimization compressor to the grouped areas. The ECO process is able to compress already-compressed JPEGs up to 40 percent, and sets of scaled images - common on web sites - up to 80 percent. Results on medical and life sciences image formats range from 40 percent to 70 percent, and results on grouped studies are better than on individual images."

Intriguingly, the DCT concept is often used in signal and image processing for lossy compression. Intriguingly again, an academic work on DCT has been written by a Dr K. R. Rao and P. Yip, entitled "Discrete Cosine Transform: Algorithms, Advantages, Applications" (Academic Press, Boston, 1990). Dr Rao is credited with being the co-inventor of the Discrete Cosine Transform but he is not related to Ocarina's Goutham Rao.

It looks as if Goutham Rao has found a way to use a method of encoding data for use in lossy compression algorithms to the opposite end.
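
Ocarina has not published its method, so the following is only a minimal sketch of the general idea that DCT data can be recoded without further loss. It assumes a hypothetical 8x8 block and an arbitrary quantisation step of 16, and uses zlib as a stand-in for whatever compressor Ocarina actually applies. The point it shows: the lossy step (quantisation) has already happened when a JPEG was made, and any reversible recoding of the resulting integers gives back exactly the same coefficients.

```python
import math
import struct
import zlib

N = 8  # JPEG works on 8x8 blocks

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block (orthonormal scaling)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """The lossy step that a JPEG encoder has ALREADY performed (step is an assumption)."""
    return [round(coeffs[u][v] / step) for u in range(N) for v in range(N)]

def pack(ints):
    """Reversible recoding: 16-bit integers fed to a general-purpose compressor."""
    return zlib.compress(struct.pack(f"<{len(ints)}h", *ints))

def unpack(blob, count):
    """Exact inverse of pack()."""
    return list(struct.unpack(f"<{count}h", zlib.decompress(blob)))

# Hypothetical 8x8 block: a smooth gradient, shifted to centre on zero as JPEG does
block = [[(x * 8 + y * 4) - 128 for y in range(N)] for x in range(N)]
quantized = quantize(dct_2d(block))
restored = unpack(pack(quantized), len(quantized))
assert restored == quantized  # every coefficient comes back unchanged
```

A better coder than the one JPEG uses would shrink the file without touching these integers, which is one plausible reading of "bit-for-bit losslessness" in DCT space.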

Producing an Ocarina-encoded image or video file is much, much harder than displaying it. Production is carried out by using an Ocarina hardware appliance, an Optimizer, which comes in two models, a 2400 and a 3400. The 2400 has two quad-core Xeon 5400 processors, 16GB of RAM and two 500GB SATA disk drives.

The 3400 has the same Xeon processors but paired with 32GB of RAM and four 15,000rpm SAS drives. There is heavyweight processing going on here, with Ocarina quoting a maximum throughput of 2TB per 24-hour day. It can be down to 1TB a day if pure JPEGs are involved. Thirumale said: "We are CPU-bound in our optimization," and "We are in the process of benchmarking both the Nehalems and also processors with more cores."
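
To put the quoted daily figures in more familiar units, here is a rough conversion. It assumes decimal terabytes and megabytes and a steady 24-hour run; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def mb_per_second(tb_per_day):
    """Convert terabytes per day to megabytes per second (decimal units assumed)."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6

for tb in (2, 1):
    print(f"{tb}TB/day is about {mb_per_second(tb):.1f}MB/s")
```

That works out to roughly 23MB/s at best and under 12MB/s for pure JPEGs, which fits Thirumale's description of a CPU-bound process.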

He says that Ocarina does things to ensure it does not overwhelm the storage filer. For example, optimisation can be scheduled for off peak times, and it can be throttled back if the filer is really busy.

During optimisation an existing image or video file is read in by an Optimizer and recoded, using the DCT mathematical method, and processed with other techniques to produce a smaller output file.

This can be read by Ocarina's ECOreader, a piece of software which sits in-line between the storage and the application needing to access the file. It can be deployed on web servers, application servers, proxy appliances, or in some cases, directly on file servers.

Each Ocarina-encoded file is self-contained and holds all the data and metadata required by any ECOreader to access it and send it on to the requesting application in real time.

Other Ocarina compression techniques

Thirumale says Ocarina's Optimizer looks inside files that can contain various objects, such as images in Word documents, PowerPoint decks and PDFs. Once it finds these it can compress them. Also, once the Optimizer has a DCT version of the data, it can compare this with already processed images and use any correlations to improve the optimisation. Ocarina doesn't specify how it does this but it sounds like a form of deduplication.

The Optimizer also looks for sets of images sharing common information, such as a sequence of CT scans. The common data is dealt with once, single-instanced in effect, and stored only once. Ocarina's brief describes this as "an example of deduplication applied at the visual information level, rather than at the block storage level."
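
Ocarina does not say how it single-instances the common data. As a rough illustration only, the sketch below stores the first item of a series whole and every later item as a compressed difference from it. It works on raw bytes with zlib, not on decoded visual information as Ocarina describes, and the "slices" are made-up buffers, not real CT scans.

```python
import random
import zlib

def xor_bytes(a, b):
    """Byte-wise difference of two equal-length buffers; identical bytes become 0."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_series(slices):
    """Store the first slice whole and the rest as compressed differences from it."""
    base = slices[0]
    deltas = [zlib.compress(xor_bytes(base, s)) for s in slices[1:]]
    return zlib.compress(base), deltas

def decode_series(packed_base, deltas):
    """Rebuild every slice exactly from the shared base and its differences."""
    base = zlib.decompress(packed_base)
    return [base] + [xor_bytes(base, zlib.decompress(d)) for d in deltas]

# Hypothetical series: five 4KB slices that differ from the first by one byte each
rng = random.Random(0)
first = bytes(rng.randrange(256) for _ in range(4096))
slices = [first]
for i in range(1, 5):
    changed = bytearray(first)
    changed[i * 100] ^= 0xFF
    slices.append(bytes(changed))

packed_base, deltas = encode_series(slices)
shared = len(packed_base) + sum(len(d) for d in deltas)
independent = sum(len(zlib.compress(s)) for s in slices)
print(f"shared base: {shared} bytes, compressed one by one: {independent} bytes")
assert decode_series(packed_base, deltas) == slices
```

Compressing each slice on its own cannot see that the slices resemble one another, which is why the grouped result comes out smaller here.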

The idea here is that sub-file-level deduplication cannot process such files or objects because it doesn't know they exist: it is not application data-aware and only sees raw blocks.

The Optimizer can deal with sets of scaled images by storing data only from the largest one and using it to recreate the smaller ones on the fly through the ECOreader. Small thumbnail images may also be stored inefficiently as separate files; Ocarina's example is 2KB of data stored in an 8KB block. The Optimizer can group such thumbnails together so that they fill up the minimum block size in a file server and use storage more efficiently.
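
The thumbnail arithmetic is easy to check. This sketch uses Ocarina's 2KB and 8KB figures, assumes 1,000 thumbnails purely for illustration, and ignores any index or metadata overhead that a real packing scheme would need.

```python
import math

BLOCK = 8 * 1024  # minimum file system block size from Ocarina's example

def blocks_separate(sizes):
    """Each file occupies whole blocks of its own."""
    return sum(math.ceil(size / BLOCK) for size in sizes)

def blocks_packed(sizes):
    """Files are concatenated first, so only the final block can be part empty."""
    return math.ceil(sum(sizes) / BLOCK)

thumbs = [2 * 1024] * 1000  # hypothetical: 1,000 thumbnails of 2KB each
print(blocks_separate(thumbs), "blocks stored separately")
print(blocks_packed(thumbs), "blocks when grouped")
```

Stored separately, each 2KB thumbnail wastes three quarters of its block, so grouping cuts the space used to a quarter in this example.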

Deployments

Thirumale says 61 percent of Ocarina deployments have been associated with an increase in disk purchases by the customer. This is because customers are using Ocarina-compressed files on disk to do work that was not possible in real time before. They'll have active data for use in creative work on videos or images and older data that had been consigned to tape. Restoring this for real-time editing work is not practical.

Talking of a movie studio customer, Thirumale said: "We were able to give them real-time access to this archive data." Having it stored on disk in an Ocarina-encoded format means they can use it in real time and thus the creatives are more productive.

He says: "It's about tape replacement. Tape's rightful place is in a deep archive," and talks of Ocarina enabling "cheap and deep" disk storage and of it being no threat to disk storage sales.

According to him, Ocarina's products are resold by BlueArc, and have been certified by Hitachi Data Systems, HP, and Isilon. Ocarina is working with DataDirect and pursuing certification with EMC, where the focus is on Celerra, and Ocarina is being integrated using EMC's file mover API. Thirumale said: "We could work very well with Atmos."

Cloud storage provider Nirvanix is also working with Ocarina. However there is no certification with IBM or NetApp.

There have been two Ocarina funding rounds, the last for $20m in February this year, with a total of $31m having been raised. Thirumale is especially pleased about the second round, which took place in the middle of the recession. The company must have been able to demonstrate good potential, but no customer numbers or revenue numbers have been made public.

Thirumale says: "There are several billion files under Ocarina optimisation (and) we're demonstrating great customer traction (with) customers in production and making repeat purchases. ... We're an add-on to your storage. We're not a rip-and-replace company. The applications won't recognise that we're there." ®

Latest Comments

Reporting FAIL

Ouch - I think I've seen more technically-correct articles in InformationWeek. Picking on the details of DCTs and the like is almost beside the point; this piece was lost well before it tried to offer specifics.

Just who, precisely, believed lossless compression of images was "impossible"? It's trivially true that for any given message longer than one bit, there's at least one encoding that compresses it, though it may expand all other messages. (Encode the target message as a single zero bit; encode all other messages as a one bit followed by the original data verbatim. Implementing the decoder is left as an exercise for the reader.)
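
For readers who would rather skip the exercise, the degenerate scheme described above can be sketched in a few lines. Bits are modelled as strings of "0" and "1" for readability, and the names are made up for illustration.

```python
def make_codec(target):
    """Build an encoder/decoder pair that favours exactly one target message."""
    def encode(msg):
        # The target shrinks to a single zero bit; everything else grows by one bit.
        return "0" if msg == target else "1" + msg

    def decode(code):
        # A lone "0" can only be the target; otherwise strip the leading "1".
        return target if code == "0" else code[1:]

    return encode, decode

encode, decode = make_codec("1011001110001111")
print(encode("1011001110001111"))  # the target message, now one bit long
print(encode("0101"))              # any other message, one bit longer
```

Every other message pays one extra bit for the privilege, which is the trade-off lossless compression always makes in some form.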

Even degenerate cases aside, it's clear that there will often be some redundancy in image and other files that can theoretically be exploited by lossless compressors; and when compressing a set of images, the probability of redundancy increases. This is exploitable in practice using good HMM-based entropy-encoders such as bzip2 and ppmd (the BWT used by bzip2 is effectively a simplified HMM), as one or two commentators have already noted.

Conversely, as other commentators have noted, you can't losslessly compress everything, thanks to the pigeonhole principle. Lossless compression is a question of mapping from the original set of messages to a new set, such that the ones you're interested in tend to be on the short end of the range. Ocarina may have found a practical way to improve that mapping somewhat for the messages their customers are interested in; but they haven't violated some mythical law of compression.
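
The pigeonhole argument comes down to counting, as this small sketch shows for 8-bit messages (the length is chosen arbitrarily).

```python
def messages_of_length(n):
    """Number of distinct bit strings exactly n bits long."""
    return 2 ** n

def messages_shorter_than(n):
    """Number of distinct bit strings of length 0 to n-1 (equals 2**n - 1)."""
    return sum(2 ** k for k in range(n))

n = 8
print(messages_of_length(n), "messages of", n, "bits")
print(messages_shorter_than(n), "possible shorter encodings")
```

There is always one fewer shorter string than there are messages, so no lossless scheme can shrink every message of a given length: at least one has to stay the same size or grow.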

Are they mixing up "patented" and "secret"?

If the process is patented, it can't be kept secret - you have to declare all the details in the patent. Well, that was the idea, anyway. I suppose if the US patent office allows patents that are so general that they could apply to anything, there would be room to declare enough basic details to enable them to sue anyone who began developing along the same lines, but still keep the vital details secret.

I thought that the JPEG2000 project had already come up with much more efficient compression algorithms for images, taking advantage of the big advances in processor power etc since JPEG compression first appeared. However, the creators of the technologies, quite understandably, wanted to be paid for their work. And since, as has been said, storage capacity has also made big advances in that time, few if any system or application developers felt a need to buy in.

Aaah, a trip down memory lane

Does anybody else remember the saga of "Adams Platform"?

http://www.google.com.au/search?q=adams+platform&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a

I wonder what ever happened there?
