
That great sucking sound? It's data going into the public cloud

The great slurp is inevitable, just you watch

Nearline storage in Google’s cloud is a new front in the vigorous marketing war being fought by public cloud providers to grab your data and convert your storage CAPEX into OPEX, meaning income for them.

Some recent advances in cloud archive storage, cloud access gateways and hybrid on-premises/cloud data management are expected to accelerate the movement of data to the public cloud.

The basic idea is that hyperscale data centres in the public cloud have such immense economies of scale that they can store your data at a fraction of what it costs you to store it on-premises and manage the kit and the data on it. Of course, there is the network transfer time needed to send data to the cloud and retrieve it again, which almost inevitably renders the public cloud inappropriate for storing primary data needed by on-premises applications.

With that point in mind, there are five general kinds of data going to the public cloud:

  • On-premises backup to the cloud, with most backup providers supporting this
  • File sync and share via the cloud, with services like Box and Dropbox
  • On-premises archiving to the cloud; think CommVault and others
  • Disaster recovery in the cloud
  • Primary data for cloud apps

Twofold pattern

The pattern is twofold: first, data goes to the cloud when on-premises applications don’t process it but may need it recovered if it is lost or unavailable locally; second, data goes to the cloud when applications running in the cloud process it. If your apps run in the cloud then the data they access should be there as well.

Sending data to the public cloud that you don’t need processed locally, but may need to recover at some stage, is justified on the cost front: it costs less to store it in the cloud than to host it locally. Also, the expenditure changes from CAPEX plus OPEX to OPEX only, as you no longer have to buy the storage arrays and/or tape libraries to host the data on-premises.

By moving this kind of data to the cloud you can de-commission storage arrays and tape libraries and recover the data centre space they occupied. The kit no longer has to be powered and cooled so there is an on-premises OPEX saving there.

Public cloud storage can be amazingly cheap, with eye-catching prices. Amazon offers S3 object, EBS block and EFS file storage alongside Glacier, its famed archival storage, which cost as little as a radical $0.01/GB/month at launch in August 2012. That low price has since been dropped to $0.007/GB/month, reflecting competition from Google.
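To put those per-gigabyte rates into perspective, here is a rough back-of-the-envelope sketch, not an official AWS calculator: it assumes a hypothetical 100TB archive, decimal gigabytes, and ignores retrieval, request and early-deletion fees.

    # Back-of-the-envelope Glacier storage cost at the headline rates
    # quoted above. Illustrative only: assumes decimal GB and a
    # hypothetical 100TB archive; ignores retrieval and request fees.
    capacity_gb = 100 * 1000  # 100TB, decimal

    for label, rate in [("launch price ($0.01/GB/month)", 0.01),
                        ("dropped price ($0.007/GB/month)", 0.007)]:
        monthly = capacity_gb * rate
        print(f"{label}: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")

On those assumptions the price drop takes the notional bill from $1,000 to $700 a month.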

Data retrieval can take several hours, of course, but that’s because the data is stored on offline tape.

Google this year announced its Glacier competitor, called Nearline, also priced as low as $0.01/GB/month but with near-enough immediate retrieval (a three-second average response time) when you need the data.

As its eye-catching starter offer, Google said users could have 100PB – yes, that is petabytes – of free storage *. The data retrieval cost is $0.01/GB, the same as Amazon’s Glacier retrieval cost once a monthly free retrieval allowance is used up. Google compares Nearline to S3, rather than Glacier, on its TCO website, justifying this on the grounds that S3 also offers instant data retrieval whereas Glacier doesn’t.

In fact, Glacier was cheaper than Google Nearline even before the recent Amazon price drop, assuming a user stores half a petabyte of data and later performs retrieval and data-addition operations. Nearline wins on speed, though, and we may well see on-premises disk backup and archive data streams starting to be diverted to Nearline because it combines low cost with better access speed.
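As a rough sketch of that half-petabyte scenario, using only the headline rates quoted in this article and an assumed 10TB of retrievals per month (real bills depend on Glacier’s tiered retrieval pricing, request fees and free allowances):

    # Rough Glacier vs Nearline monthly bill for the half-petabyte case.
    # Rates are the headline figures quoted in this article; the 10TB/month
    # retrieval volume is an assumption for illustration only.
    stored_gb = 500 * 1000     # 0.5PB, decimal
    retrieved_gb = 10 * 1000   # assumed 10TB retrieved per month

    def monthly_cost(storage_rate, retrieval_rate):
        return stored_gb * storage_rate + retrieved_gb * retrieval_rate

    glacier = monthly_cost(0.007, 0.01)   # Glacier after the price drop
    nearline = monthly_cost(0.01, 0.01)   # Google Nearline
    print(f"Glacier:  ${glacier:,.0f}/month")
    print(f"Nearline: ${nearline:,.0f}/month")

On those assumptions Glacier comes out around $3,600 a month against Nearline’s $5,100, which is why the speed-versus-price trade-off matters.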
