Cloud mega-uploads aren't easy
Google, Microsoft can't explain how to get big data into the cloud, despite rivals' import services
Google and Microsoft don't offer formal data ingestion services to help users get lots of data into the cloud, and neither seems set to do so anytime soon. Quite how would-be users take advantage of the hundreds of terabytes both offer in the cloud is therefore a bit of a mystery.
With a data ingestion service, a cloud provider lets customers send in hard disks for rapid upload into the cloud. Amazon Web Services' import/export service was among the first such services and offers the chance to ingest up to 16TB of data, provided the storage device is no more than 14 inches high by 19 inches wide by 36 inches deep (8U in a standard 19-inch rack) and weighs less than 50 pounds.
Rackspace offers a similar service, dubbed Cloud Files Bulk Import. Optus, the Australian arm of telecoms giant Singtel, will happily offer a similar service. Australian cloud Ninefold does likewise, branding it "Sneakernet".
Some other cloud providers offer such a service, even if it is not productised or advertised. The Register spoke to one cloudy migrant who (after requesting anonymity) told us they borrowed a desktop network attached storage (NAS) device from their new cloud provider, bought another, uploaded data to the devices and then despatched a staffer on a flight to the cloud facility. The NASes were carry-on luggage and the travelling staffer cradled them on their lap during the flight.
It was worth going to those lengths because, as AWS points out in the spiel for its import/export service, doing so “is often much faster than transferring that data via the Internet.”
To understand why, consider the fact that headline speeds advertised on broadband connections aren't always achieved in the real world. Optus, for example, told us that while its fastest broadband connection hums along at 3-5 Gbps, the standard service level agreement “guarantees a speed of 300 Mbps, above which we would conduct fibre checks to ensure additional capacity can be reserved for the customer.” At that speed each terabyte would take about eight hours to upload, and that's with an optimistic assumption of 10% overhead and general network messiness.
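The eight-hour figure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal terabytes (10^12 bytes) and reads "10% overhead" as a tenth of the link's capacity being lost, neither of which Optus spelled out. The function name `upload_hours` is ours.

```python
TB = 10**12  # assumption: decimal terabyte
PB = 10**15  # assumption: decimal petabyte


def upload_hours(data_bytes: float, link_bps: float, overhead: float = 0.10) -> float:
    """Hours needed to send data_bytes over a link rated at link_bps.

    `overhead` is the fraction of link capacity assumed lost to protocol
    overhead and general network messiness (an assumption, not a measurement).
    """
    effective_bps = link_bps * (1.0 - overhead)
    seconds = data_bytes * 8 / effective_bps  # 8 bits per byte
    return seconds / 3600


SLA_BPS = 300e6  # the 300 Mbps guaranteed speed quoted above

one_tb_hours = upload_hours(1 * TB, SLA_BPS)
sixteen_tb_days = upload_hours(16 * TB, SLA_BPS) / 24
five_pb_years = upload_hours(5 * PB, SLA_BPS) / 24 / 365

print(f"1 TB:  {one_tb_hours:.1f} hours")
print(f"16 TB: {sixteen_tb_days:.1f} days")
print(f"5 PB:  {five_pb_years:.1f} years")
```

Under these assumptions one terabyte takes a little over eight hours, 16TB takes about five and a half days, and a 5 PB dataset would need well over four years of uninterrupted uploading.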
It's hard to imagine how that kind of speed will be of any use for cloud services which offer petabyte-scale cloud storage, such as Azure's (or whatever it is called this week) pricing tier for amounts of data “Greater than 5 PB.” Google's BigQuery also promises to support “analysis of datasets up to hundreds of terabytes.”
Both Google and Microsoft, however, offered no details when El Reg prodded them for an explanation of just how customers can get that much data into their clouds. That's despite Microsoft telling your correspondent, in a past professional life, that it was “evaluating” such a service back in 2010.
If you think this all sounds a bit theoretical, the lack of ingestion services from the Chocolate Factory is already leading to some bizarre work-arounds.
Craig Deveson, a serial cloud entrepreneur who currently serves as CEO and Co-Founder of WordPress backup plugin vendor cloudsafe365, says Google's lack of data ingestion services became “a genuine issue” when he worked on a Gmail migration for a large Australian software company. During that project he found the best way to get a substantial quantity of old email data into Google's cloud was first to send disks to Singapore for upload into Amazon's S3 cloud storage service. Once in Amazon's cloud “we had to run a program to ingest it into Google's back end.”
Similar tricks are needed to pump lots of data into software-as-a-service providers' clouds.
Salesforce.com, for example, advised us that bulk uploads are made possible by a Bulk API which happily puts SOAP and REST to work to suck up batches of 10,000 records at a time. “Even while data is still being sent to the server, the Force.com platform submits the batches for processing,” the company said.
Pressed on whether disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”
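For a sense of what that means in practice, here is a minimal sketch of slicing a database extract into 10,000-record batches and rendering each as CSV. It does not call Salesforce's Bulk API at all: the helper names `batches` and `to_csv` are hypothetical, the sample records are made up, and the step that would actually submit each batch is deliberately left out.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

BATCH_SIZE = 10_000  # the batch size quoted by Salesforce.com above


def batches(rows: Iterable[Sequence], size: int = BATCH_SIZE) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows without loading everything at once."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_csv(header: Sequence[str], rows: Iterable[Sequence]) -> str:
    """Render one batch as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical extract: 25,000 made-up records streamed from a generator.
records = ([i, f"name{i}"] for i in range(25_000))
payloads = [to_csv(["id", "name"], batch) for batch in batches(records)]
print(len(payloads), "batches ready to submit")
```

Even this toy extract needs three separate submissions. Each payload still has to cross the same network link as any other upload, so batching organises the work without making the pipe any wider.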
Whether anyone can afford to wait for that .csv, or other larger files, to arrive is another matter. ®