Ocarina makes waves with lossless image compression

Heeeeey Ocarina! Aaaha!

Interview Ocarina, the deduplication startup, is making waves with its partnerships with storage vendors, due to its unique lossless image compression technology. Yet the Ocarina founders were not wedded to image deduplication when they started up the company. How did it come about?

The way Murli Thirumale, Ocarina's CEO, tells it, the three founders had three ideas for a startup which they tested with potential customers and with consultants in a proof of concept exercise. They asked which of the three would have long-lasting and true value in the eyes of customers. The one dealing with ever-growing data storage received overwhelming customer support.

Much of this growth was due to rich media: images and videos. These file types (JPEGs, TIFFs, MPEGs and so forth) were previously thought to be uncompressible if there was to be no loss of image or video resolution. Ocarina's chief technology officer and co-founder, Goutham Rao, came up with ways of doing this, of compressing the uncompressible. Thirumale says: "He didn't know you're not supposed to be able to do that."

Image and video files list picture elements and their characteristics. An Ocarina technology brief states: "Visual information is typically complex and includes large numbers of values to represent pixels, chrominance, luminance and other information for both recreating an image for the human eye to see, and storing information about the image for computer programs to be able to manipulate."

You can compress such files by getting rid of pixels, but this means losing image quality. What Rao invented was a way of recoding image and video files to store the same information in fewer bytes, with no lost pixels and with "bit-for-bit losslessness". This is Ocarina's secret technology and the company is not revealing much about it.

DCT

The technology brief says Ocarina's technology: "extracts the full rich image data from an existing image file in to a Discrete Cosine Matrix (DCT space), correlates related image information like chrominance and luminance boundaries around like areas in an image, and then applies Ocarina’s patented image optimization compressor to the grouped areas. The ECO process is able to compress already-compressed JPEGs up to 40 percent, and sets of scaled images - common on web sites - up to 80 percent. Results on medical and life sciences image formats range from 40 percent to 70 percent, and results on grouped studies are better than on individual images."

Intriguingly, the DCT concept is often used in signal and image processing for lossy compression. Intriguingly again, an academic work on DCT has been written by a Dr K. R. Rao and P. Yip, entitled "Discrete Cosine Transform: Algorithms, Advantages, Applications" (Academic Press, Boston, 1990). Dr Rao is credited with being the co-inventor of the Discrete Cosine Transform but he is not related to Ocarina's Goutham Rao.

It looks as if Goutham Rao has found a way to take a method of encoding data that is normally used in lossy compression algorithms and put it to the opposite use: lossless compression.
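Why a transform associated with lossy codecs can serve a lossless scheme is easier to see in a small sketch. The Python below is not Ocarina's undisclosed method; it only illustrates that a DCT on its own is invertible, and that the loss in a conventional lossy codec comes from quantizing the coefficients afterwards. The function names, sample values and quantization step are all made up for illustration.

```python
import math

def dct(samples):
    """Unnormalized DCT-II of a list of numbers."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(samples))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (2.0 / n) * (
            coeffs[0] / 2.0
            + sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n))
        )
        for i in range(n)
    ]

def max_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

pixels = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative sample values only
coeffs = dct(pixels)

# Transform alone: the samples come back, up to floating-point rounding.
exact = idct(coeffs)

# Quantizing the coefficients (hypothetical step of 16) is what discards information.
step = 16
quantized = [round(c / step) * step for c in coeffs]
lossy = idct(quantized)

print("round trip error:", max_error(pixels, exact))
print("after quantization:", max_error(pixels, lossy))
```

The first figure printed is negligible and the second is not, which is the point: working in DCT space does not by itself force any loss.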

Producing an Ocarina-encoded image or video file is much, much harder than displaying it. Production is carried out by using an Ocarina hardware appliance, an Optimizer, which comes in two models, a 2400 and a 3400. The 2400 has two quad-core Xeon 5400 processors, 16GB of RAM and two 500GB SATA disk drives.

The 3400 has the same Xeon processors but paired with 32GB of RAM and four 15,000rpm SAS drives. There is heavyweight processing going on here, with Ocarina quoting a maximum throughput of 2TB per 24-hour day. It can be down to 1TB a day if pure JPEGs are involved. Thirumale said: "We are CPU-bound in our optimization," and "We are in the process of benchmarking both the Nehalems and also processors with more cores."
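To put the quoted daily figures in more familiar terms, here is a quick back-of-the-envelope conversion in Python. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a steady rate over the whole day, neither of which Ocarina specifies; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def mb_per_second(terabytes_per_day):
    """Average rate in MB/s, assuming decimal units and a constant rate."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6

print(round(mb_per_second(2), 1))  # quoted maximum
print(round(mb_per_second(1), 1))  # pure JPEG workload
```

That works out to roughly 23MB/s at the quoted maximum and about 12MB/s for pure JPEGs, modest rates that fit Thirumale's remark about the optimisation being CPU-bound.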

He says that Ocarina does things to ensure it does not overwhelm the storage filer. For example, optimisation can be scheduled for off peak times, and it can be throttled back if the filer is really busy.

During optimisation an existing image or video file is read in by an Optimizer, recoded using the DCT mathematical method, and processed with other techniques to produce a smaller output file.

This can be read by Ocarina's ECOreader, a piece of software which sits in-line between the storage and the application needing to access the file. It can be deployed on web servers, application servers, proxy appliances, or in some cases, directly on file servers.

Each Ocarina-encoded file is self-contained and holds all the data and metadata required by any ECOreader to access it and send it on to the requesting application in real time.

Other Ocarina compression techniques

Thirumale says Ocarina's Optimizer looks inside files that can contain various objects, such as images in Word documents, PowerPoint decks and PDFs. Once it finds these it can compress them. Also, once the Optimizer has a DCT version of the data, it can compare this with already processed images and use any correlations to improve the optimisation. Ocarina doesn't specify how it does this but it sounds like a form of deduplication.

The Optimizer also looks for sets of images sharing common information, such as a sequence of CT scans. The common data is dealt with once, single-instanced in effect, and stored only once. Ocarina's brief describes this as "an example of deduplication applied at the visual information level, rather than at the block storage level."

The idea here is that sub-file-level deduplication cannot process such files or objects because it doesn't know they exist. It is not application data-aware and only sees raw blocks.
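The difference between seeing raw blocks and seeing objects can be sketched in a few lines of Python. This is only an exact-match, object-level single-instancing toy, not the visual-level correlation Ocarina describes and does not disclose. The `ObjectStore` class, the file names and the byte strings standing in for embedded images are all hypothetical, and real extraction of objects from Word, PowerPoint or PDF files is left out.

```python
import hashlib

class ObjectStore:
    """Hypothetical store that keeps each distinct embedded object once."""

    def __init__(self):
        self.objects = {}    # SHA-256 digest -> object bytes, stored once
        self.documents = {}  # document name -> ordered list of digests

    def add_document(self, name, embedded_objects):
        digests = []
        for obj in embedded_objects:
            digest = hashlib.sha256(obj).hexdigest()
            self.objects.setdefault(digest, obj)  # no-op if already stored
            digests.append(digest)
        self.documents[name] = digests

    def read_document(self, name):
        """Rebuild the document's object list from the shared store."""
        return [self.objects[d] for d in self.documents[name]]

    def stored_bytes(self):
        return sum(len(obj) for obj in self.objects.values())

# Stand-ins for images already extracted from their container files.
logo = b"logo-pixels" * 100
chart = b"chart-pixels" * 50

store = ObjectStore()
store.add_document("report.doc", [logo, chart])
store.add_document("slides.ppt", [logo])

print(len(store.objects), "objects held for 3 references")
```

Because the store works on extracted objects rather than fixed disk blocks, the shared logo is recognised even though it sits at different offsets inside two different container files.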

The Optimizer can deal with sets of scaled images by storing data from only the largest one and using it to recreate the smaller ones on the fly through the ECOreader. Small thumbnail images may be stored inefficiently as separate files; Ocarina's example is 2KB of data stored in an 8KB block. Such thumbnails can be grouped together so that they fill up the file server's minimum block size and use storage more efficiently.
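The arithmetic behind the thumbnail example is simple enough to check in Python. The sketch assumes four 2KB thumbnails and an 8KB minimum block, extending Ocarina's single-file example; the function names are illustrative and any container overhead is ignored.

```python
import math

def allocated_bytes(file_sizes, block_size):
    """Space consumed when every file starts on its own block."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)

def allocated_bytes_grouped(file_sizes, block_size):
    """Space consumed when the files are packed back to back in one container."""
    return math.ceil(sum(file_sizes) / block_size) * block_size

KB = 1024
thumbnails = [2 * KB] * 4  # four 2KB thumbnails (assumed count)

print(allocated_bytes(thumbnails, 8 * KB) // KB, "KB stored separately")
print(allocated_bytes_grouped(thumbnails, 8 * KB) // KB, "KB when grouped")
```

Stored separately, the four thumbnails occupy 32KB of disk for 8KB of data; grouped, they fit in a single 8KB block.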

Deployments

Thirumale says 61 percent of Ocarina deployments have been associated with an increase in disk purchases by the customer. This is because customers are using Ocarina-compressed files on disk to do work that was not possible in real time before. They'll have active data for use in creative work on videos or images, and older data that had been consigned to tape. Restoring the older data from tape for real-time editing work is not practical.

Talking of a movie studio customer, Thirumale said: "We were able to give them real-time access to this archive data." Having it stored on disk in an Ocarina-encoded format means they can use it in real time and thus the creatives are more productive.

He says: "It's about tape replacement. Tape's rightful place is in a deep archive," and talks of Ocarina enabling "cheap and deep" disk storage and of it being no threat to disk storage sales.

According to him, Ocarina's products are resold by BlueArc, and have been certified by Hitachi Data Systems, HP, and Isilon. Ocarina is working with DataDirect and pursuing certification with EMC, where the focus is on Celerra and the integration uses EMC's file mover API. Thirumale said: "We could work very well with Atmos."

Cloud storage provider Nirvanix is also working with Ocarina. However, there is no certification with IBM or NetApp.

There have been two Ocarina funding rounds, the last for $20m in February this year, with a total of $31m having been raised. Thirumale is especially pleased about the second round, which took place in the middle of the recession. The company must have been able to demonstrate good potential, but no customer numbers or revenue numbers have been made public.

Thirumale says: "There are several billion files under Ocarina optimisation (and) we're demonstrating great customer traction (with) customers in production and making repeat purchases. ... We're an add-on to your storage. We're not a rip-and-replace company. The applications won't recognise that we're there." ®
