Ocarina makes waves with lossless image compression

Heeeeey Ocarina! Aaaha!

Interview Ocarina, the deduplication startup, is making waves with its partnerships with storage vendors, due to its unique lossless image compression technology. Yet the Ocarina founders were not wedded to image deduplication when they started up the company. How did it come about?

The way Murli Thirumale, Ocarina's CEO, tells it, the three founders had three ideas for a startup which they tested with potential customers and with consultants in a proof of concept exercise. They asked which of the three would have long-lasting and true value in the eyes of customers. The one dealing with ever-growing data storage received overwhelming customer support.

Much of this growth was due to rich media: images and videos. These file types (JPEGs, TIFFs, MPEGs and so forth) were previously thought to be uncompressible without some loss of image or video resolution. Ocarina's chief technology officer and co-founder, Goutham Rao, came up with ways of doing this, of compressing the uncompressible. Thirumale says: "He didn't know you're not supposed to be able to do that."

Image and video files list picture elements and their characteristics. An Ocarina technology brief states: "Visual information is typically complex and includes large numbers of values to represent pixels, chrominance, luminance and other information for both recreating an image for the human eye to see, and storing information about the image for computer programs to be able to manipulate."

You can compress such files by getting rid of pixels, but this means losing image quality. What Rao invented was a way of recoding image and video files to store the same information in fewer bytes, with no lost pixels, with "bit-for-bit losslessness". This is Ocarina's secret technology and it's not revealing much about it.


The technology brief says Ocarina's technology: "extracts the full rich image data from an existing image file in to a Discrete Cosine Matrix (DCT space), correlates related image information like chrominance and luminance boundaries around like areas in an image, and then applies Ocarina’s patented image optimization compressor to the grouped areas. The ECO process is able to compress already-compressed JPEGs up to 40 percent, and sets of scaled images - common on web sites - up to 80 percent. Results on medical and life sciences image formats range from 40 percent to 70 percent, and results on grouped studies are better than on individual images."

Intriguingly, the DCT concept is often used in signal and image processing for lossy compression. Intriguingly again, an academic work on DCT has been written by a Dr K. R. Rao and P. Yip, entitled "Discrete Cosine Transform: Algorithms, Advantages, Applications" (Academic Press, Boston, 1990). Dr Rao is credited with being the co-inventor of the Discrete Cosine Transform but he is not related to Ocarina's Goutham Rao.

It looks as if Goutham Rao has found a way to take an encoding method normally used in lossy compression algorithms and turn it to the opposite end: lossless compression.
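To see why a DCT need not imply loss, note that the transform itself is invertible; in a codec such as JPEG, the loss comes from quantising the coefficients afterwards. The following is a minimal Python sketch of that general point, using a textbook orthonormal one-dimensional DCT. It is not Ocarina's algorithm, which the company has not disclosed; the function names, the eight sample values and the quantisation step of 10 are all illustrative assumptions.

```python
import math

def dct(samples):
    """Orthonormal DCT-II of a list of numbers (textbook definition)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples

# Hypothetical row of eight pixel values (an assumption for illustration).
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(pixels)

# Transform and inverse alone: the original values come back after rounding.
restored = [round(v) for v in idct(coeffs)]

# Coarse quantisation of the coefficients is where information is discarded.
STEP = 10  # illustrative step size, not taken from any real codec table
quantised = [round(c / STEP) * STEP for c in coeffs]
degraded = [round(v) for v in idct(quantised)]

print(restored == pixels)  # True: the transform itself loses nothing
print(degraded == pixels)  # False: quantisation changed the pixels
```

An existing JPEG already stores quantised coefficients, so any scheme that re-encodes those same coefficients more compactly, without altering them, adds no further loss. Whether that is what Ocarina does is not stated.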

Producing an Ocarina-encoded image or video file is much, much harder than displaying it. Production is carried out by using an Ocarina hardware appliance, an Optimizer, which comes in two models, a 2400 and a 3400. The 2400 has two quad-core Xeon 5400 processors, 16GB of RAM and two 500GB SATA disk drives.

The 3400 has the same Xeon processors but paired with 32GB of RAM and four 15,000rpm SAS drives. There is heavyweight processing going on here, with Ocarina quoting a maximum throughput of 2TB per 24-hour day. It can be down to 1TB a day if pure JPEGs are involved. Thirumale said: "We are CPU-bound in our optimization," and "We are in the process of benchmarking both the Nehalems and also processors with more cores."
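For a sense of scale, the quoted daily figures can be turned into a sustained data rate. This is a back-of-the-envelope sketch that assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a full 24 hours of continuous processing; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_mb_per_second(tb_per_day):
    """Convert TB/day to MB/s, assuming decimal units (1TB = 10**12 bytes)."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6

for tb in (2, 1):
    print(f"{tb}TB/day is about {sustained_mb_per_second(tb):.1f} MB/s")
```

On those assumptions the quoted rates work out to roughly 23MB/s at best and about 12MB/s for pure JPEGs, which is consistent with a process limited by CPU rather than by disk or network speed.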

He says that Ocarina does things to ensure it does not overwhelm the storage filer. For example, optimisation can be scheduled for off peak times, and it can be throttled back if the filer is really busy.
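A scheduling-plus-throttling policy of this kind might look something like the sketch below. Everything in it is hypothetical: the off-peak window, the load threshold, the reduced rate and the function name are assumptions chosen for illustration, not Ocarina settings.

```python
def optimiser_rate(hour, filer_load, off_peak_start=22, off_peak_end=6,
                   busy_threshold=0.8):
    """Return the fraction of full speed the optimiser should run at.

    hour       -- hour of day, 0 to 23
    filer_load -- filer utilisation between 0.0 and 1.0 (assumed metric)
    """
    # The off-peak window wraps around midnight (e.g. 22:00 to 06:00).
    in_window = hour >= off_peak_start or hour < off_peak_end
    if not in_window:
        return 0.0   # outside the scheduled window: do not run
    if filer_load >= busy_threshold:
        return 0.25  # filer is busy: throttle back
    return 1.0       # quiet filer in the window: run at full speed

print(optimiser_rate(14, 0.1))  # mid-afternoon, outside the window
print(optimiser_rate(23, 0.9))  # in the window but the filer is busy
print(optimiser_rate(2, 0.3))   # in the window and the filer is quiet
```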

During optimisation an existing image or video file is read in by an Optimizer and recoded, using the DCT mathematical method, and processed with other techniques to produce a smaller output file.

This can be read by Ocarina's ECOreader, a piece of software which sits in-line between the storage and the application needing to access the file. It can be deployed on web servers, application servers, proxy appliances, or in some cases, directly on file servers.

Each Ocarina-encoded file is self-contained and holds all the data and metadata required by any ECOreader to access it and send it on to the requesting application in real time.

Other Ocarina compression techniques

Thirumale says Ocarina's Optimizer looks inside files that can contain various objects, such as images in Word documents, PowerPoint decks and PDFs. Once it finds these it can compress them. Also, once the Optimizer has a DCT version of the data, it can compare this with already processed images and use any correlations to improve the optimisation. Ocarina doesn't specify how it does this but it sounds like a form of deduplication.

The Optimizer also looks for sets of images sharing common information, such as a sequence of CT scans. The common data is dealt with once, single-instanced in effect, and stored only once. Ocarina's brief describes this as "an example of deduplication applied at the visual information level, rather than at the block storage level."

The idea here is that sub-file-level deduplication cannot process such files or objects because it doesn't know they exist: it's not application data-aware and only sees raw blocks.
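The single-instancing idea can be sketched in a few lines of Python. This is a generic content-addressed store working on already-decoded image tiles, which is the distinction drawn above from block-level deduplication. It is not Ocarina's method; the tile representation, the function names and the sample data are assumptions for illustration.

```python
import hashlib

def store_image_set(images, store):
    """Store each distinct tile once; return per-image lists of tile keys."""
    recipes = []
    for tiles in images:
        recipe = []
        for tile in tiles:
            key = hashlib.sha256(tile).hexdigest()
            store.setdefault(key, tile)  # keep only the first copy
            recipe.append(key)
        recipes.append(recipe)
    return recipes

def rebuild(recipe, store):
    """Recreate one image's tiles from its list of keys."""
    return [store[key] for key in recipe]

# Two hypothetical scans that share the same background tiles.
background = b"\x00" * 64
scan_a = [background, b"slice-1".ljust(64, b"\x01"), background]
scan_b = [background, b"slice-2".ljust(64, b"\x02"), background]

store = {}
recipes = store_image_set([scan_a, scan_b], store)
print(len(store), "unique tiles kept out of", len(scan_a) + len(scan_b))
```

Each scan can be rebuilt exactly with `rebuild(recipes[0], store)`, while the shared background is held only once.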

The Optimizer can deal with sets of scaled images by storing data only from the largest one and using it to recreate the smaller ones on the fly through the ECOreader. Where small thumbnail images are stored inefficiently as separate files (Ocarina's example is 2KB of data stored in an 8KB block), the thumbnails can be grouped together so that they fill up the minimum block size in a file server.
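The thumbnail example is simple arithmetic, sketched below for four 2KB thumbnails on a filer with an 8KB minimum block. The count of four, the function names and the decision to ignore any index or metadata overhead for the grouped file are assumptions made for illustration.

```python
import math

BLOCK = 8 * 1024  # minimum block size from Ocarina's example, in bytes

def blocks_if_separate(sizes, block=BLOCK):
    """Blocks used when every file occupies its own whole blocks."""
    return sum(math.ceil(size / block) for size in sizes)

def blocks_if_packed(sizes, block=BLOCK):
    """Blocks used when the files are grouped end to end (no overhead)."""
    return math.ceil(sum(sizes) / block)

thumbnails = [2 * 1024] * 4  # four hypothetical 2KB thumbnails

separate = blocks_if_separate(thumbnails) * BLOCK
packed = blocks_if_packed(thumbnails) * BLOCK
print(separate // 1024, "KB separate vs", packed // 1024, "KB packed")
```

Stored separately, the four thumbnails occupy 32KB of disk for 8KB of data; grouped, they fit in a single 8KB block.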


Thirumale says 61 percent of Ocarina deployments have been associated with an increase in disk purchases by the customer. This is because customers are using Ocarina-compressed files on disk to do work that was not possible in real time before. They'll have active data for use in creative work on videos or images, and older data that had been consigned to tape. Restoring the older data from tape for real-time editing work is not practical.

Talking of a movie studio customer, Thirumale said: "We were able to give them real-time access to this archive data." Having it stored on disk in an Ocarina-encoded format means they can use it in real time and thus the creatives are more productive.

He says: "It's about tape replacement. Tape's rightful place is in a deep archive," and talks of Ocarina enabling "cheap and deep" disk storage and of it being no threat to disk storage sales.

According to him, Ocarina's products are resold by BlueArc, and have been certified by Hitachi Data Systems, HP, and Isilon. Ocarina is working with DataDirect and pursuing certification with EMC, where the focus is on Celerra, and Ocarina is being integrated using EMC's file mover API. Thirumale said: "We could work very well with Atmos."

Cloud storage provider Nirvanix is also working with Ocarina. However there is no certification with IBM or NetApp.

There have been two Ocarina funding rounds, the last for $20m in February this year, with a total of $31m having been raised. Thirumale is especially pleased about the second round, which took place in the middle of the recession. The company must have been able to demonstrate good potential, but no customer numbers or revenue numbers have been made public.

Thirumale says: "There are several billion files under Ocarina optimisation (and) we're demonstrating great customer traction (with) customers in production and making repeat purchases. ... We're an add-on to your storage. We're not a rip-and-replace company. The applications won't recognise that we're there." ®
