Drilling into Amazon's tape-killing Glacier cloud archive
It could devastate the industry
Blocks and Files Amazon Glacier is a series of cloud vaults holding customer archive data that are not based on tape libraries. Instead it appears to use object storage, and it is set to become the largest object storage implementation in history in a very short time.
Amazon Web Services team member James Hamilton blogged about the new product, writing: "Amazon Glacier now offers cloud storage where each object is stored redundantly in multiple, independent data centers at $0.01/GB/month."
And it seems to be a growing concern – Amazon is looking for storage staffers.
"[If] you are an engineer or engineering leader with an interest in massive scale distributed storage systems, we have big plans for Glacier and are hiring. Send your resume to email@example.com."
Conclusion: Glacier is object storage implemented as a massive scale, distributed storage system. It provides multiple copies of objects, regularly checks the integrity of stored data and self-heals if errors are found. The annual average data item durability is 99.999999999 per cent – eleven nines.
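For the record, here is what eleven nines works out to in expected annual losses – a quick back-of-the-envelope sketch in Python, with purely illustrative object counts:

    # What "eleven nines" of annual durability means in expectation.
    durability = 0.99999999999                 # 99.999999999 per cent
    annual_loss_probability = 1 - durability   # ~1e-11 per object per year

    for n_objects in (1_000_000, 1_000_000_000):
        expected_losses = n_objects * annual_loss_probability
        print(f"{n_objects:>13,} objects -> {expected_losses:.5f} expected losses/year")

    # 1,000,000 objects     -> 0.00001 expected losses/year
    # 1,000,000,000 objects -> 0.01000 expected losses/year

In other words, you would need to store around a billion objects before expecting to lose even one per century.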
We can expect Microsoft Azure to play catch-up quite sharpish with its own on-line archive facility. MS has been chasing Amazon on the cloud front for some time.
When you add in the astonishingly low pricing, then tape storage for archive has just received a kick in the guts. Companies that might be envisaging buying tape systems for storing cold Big Data may well think of Amazon instead. Cloud service providers may effectively resell Amazon Glacier or set up their own object storage infrastructure for archive. Amplidata, Caringo, CleverSafe and Scality will be pleased to advise them.
Tape has already virtually ceded the backup market to disk, with deduplicating disk-to-disk backup systems and virtual tape libraries taking the backup tape citadel by storm, and tape retreating to the archive market with its better-than-disk-drive-array storage costs for vast volumes of data.
But if Amazon can do it cheaper still, and online rather than on tape, then it's feasible to suggest that Amazon has just delivered a kiss of object storage archive poison to the tape suppliers – Oracle, IBM, HP, Quantum, SpectraLogic and others – and the effects on the industry could be devastating. ®
devastating - not
You're forgetting about the cost of bandwidth, and the amount of time it takes to upload data to such a facility. My own facility, which is within 17ms of Amazon's east coast facility, still gets only a paltry 3-5MB/second of throughput on a gigabit link for a single stream. Tape throughput is frequently measured in dozens or, at the high end, hundreds of megabytes a second (in my own experience the source media is often the bottleneck rather than the tape). Most users probably will have neither a high speed link nor a low latency connection to the remote facility.
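To put numbers on that (a rough sketch; the 10TB archive size is purely illustrative, the rates are the ones quoted above):

    # Back-of-the-envelope: time to move an archive at single-stream WAN
    # rates versus a typical tape drive. Sizes and rates are illustrative.
    def transfer_days(size_tb, mb_per_sec):
        size_mb = size_tb * 1_000_000          # decimal TB -> MB
        return size_mb / mb_per_sec / 86_400   # seconds in a day

    for rate in (3, 5, 100):   # 3-5 MB/s over the WAN vs ~100 MB/s to tape
        print(f"10 TB at {rate} MB/s: {transfer_days(10, rate):.1f} days")

    # 10 TB at 3 MB/s:   38.6 days
    # 10 TB at 5 MB/s:   23.1 days
    # 10 TB at 100 MB/s:  1.2 days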
I wrote a blog post recently, "Freakish performance on Site to Site VPN", where I was able to sustain 10MB/sec+ between a pair of SonicWall VPNs on a 95ms link with a single stream (a highly compressed file, encrypted with ssh). I've never come across anywhere remotely near that level of throughput on a VPN, even with generic WAN optimization – SonicWall must be doing something really nice (regular internet speeds outside the VPN were in the 700kB/s range). Now if I could get such performance to a cloud provider that would be nice, but unlike good cloud providers that allow you to have a hybrid of physical and virtual resources, Amazon doesn't play that game.
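The arithmetic shows why that result is freakish – single-stream TCP throughput is capped at roughly window size divided by round-trip time, so working backwards from those numbers (the window sizes below are inferred, not measured):

    # Single-stream TCP throughput ~= window / RTT. Infer the data in
    # flight from the throughputs quoted above on a 95 ms link.
    rtt = 0.095  # seconds

    def implied_window_kb(throughput_mb_per_s):
        return throughput_mb_per_s * 1000 * rtt   # KB in flight

    print(implied_window_kb(0.7))    # ~66 KB  -> about a default 64 KB window
    print(implied_window_kb(10.0))   # ~950 KB -> needs window scaling, or the
                                     #            VPN endpoints are proxying/acking locally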
Add to that, tape can't be easily deleted when it is off site. That is, unless this Amazon service is significantly different from S3, it is trivially easy to wipe out all of your backups with a couple of commands. Storing tapes totally offline adds significantly more security and protection against that.
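To illustrate: with S3-style storage a recursive delete is effectively a single call, whereas Glacier archives have to be deleted one archive ID at a time, and you only learn the IDs from your own records or a slow inventory job. A sketch using today's boto3 Python SDK – the bucket, vault and archive names are made up:

    import boto3

    # S3: wiping a whole bucket of backups is effectively a one-liner.
    boto3.resource("s3").Bucket("my-backups").objects.all().delete()

    # Glacier: each archive must be deleted individually, by archiveId.
    boto3.client("glacier").delete_archive(
        vaultName="cold-archive",
        archiveId="EXAMPLE-ARCHIVE-ID",
    )

Either way, a compromised set of credentials can still issue those calls, which a tape sitting offline in a vault can't suffer.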
There was one facility I almost hosted my gear at a while ago that had a significant Amazon presence, and there was the option to have a direct gigabit link into their network from mine. In that case it would have been sub-millisecond access, and I can imagine it would make a lot more sense then.
For small data sets it can work, and there are already tons of providers out there offering the service; most of them seem to advertise "unlimited" storage for a low yearly rate. These sorts of folks, I think, don't really care whether their data is stored in multiple data centers – it's a backup, after all.
Takes hours to retrieve
The blurb says it takes 3 to 4 hours to retrieve an object from the "vault". That sounds more like a tape library than a disk system. Or at least, it sounds like it's spec'd so that they could implement it on tape, even if the initial deployment (to test the market) is on top of their existing disk storage system.
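The asynchronous behaviour shows up in the API itself: you don't fetch an archive directly, you initiate a retrieval job and poll until it completes. A rough sketch with today's boto3 Python SDK – the vault name and archive ID are placeholders:

    import time
    import boto3

    glacier = boto3.client("glacier")

    # Ask Glacier to stage the archive for download; this is the step that
    # takes hours, whatever is (or isn't) spinning behind the scenes.
    job = glacier.initiate_job(
        vaultName="cold-archive",
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": "EXAMPLE-ARCHIVE-ID"},
    )

    # Poll until the job finishes, then pull the bytes.
    while not glacier.describe_job(vaultName="cold-archive",
                                   jobId=job["jobId"])["Completed"]:
        time.sleep(900)  # check every 15 minutes

    output = glacier.get_job_output(vaultName="cold-archive", jobId=job["jobId"])
    data = output["body"].read()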
You can buy 60TB of tape storage for about $18K (US). That's $0.30 per gigabyte as a one-time purchase. If you expect your tape library to last seven years, then you would pay $0.84 per gigabyte for the Amazon solution. And you have no retrieval costs with tape storage. How is 3X the cost a killer?
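Spelling that arithmetic out (a sketch using the same assumptions – $18K for 60TB of tape, a seven-year life, and ignoring bandwidth, retrieval fees and tape-handling costs):

    # One-off tape purchase versus Glacier's monthly fee over seven years.
    tape_cost, tape_capacity_gb = 18_000, 60_000    # $18K for 60 TB
    glacier_rate, years = 0.01, 7                   # $/GB/month, lifetime

    tape_per_gb = tape_cost / tape_capacity_gb      # $0.30/GB, paid once
    glacier_per_gb = glacier_rate * 12 * years      # $0.84/GB over 7 years

    print(f"tape:    ${tape_per_gb:.2f}/GB")
    print(f"glacier: ${glacier_per_gb:.2f}/GB ({glacier_per_gb / tape_per_gb:.1f}x)")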