Cloud mega-uploads aren't easy

Google, Microsoft can't explain how to get big data into the cloud, despite rivals' import services

Google and Microsoft don't offer formal data ingestion services to help users get lots of data into the cloud, and neither seems set to do so anytime soon. Quite how would-be users take advantage of the hundreds of terabytes both offer in the cloud is therefore a bit of a mystery.

Data ingestion services see cloud providers offer customers the chance to send them hard disks for rapid upload into the cloud. Amazon Web Services' import/export service was among the first and offers the chance to ingest up to 16TB of data per device, provided the device is no more than 14 inches high by 19 inches wide by 36 inches deep (8U in a standard 19-inch rack) and weighs less than 50 pounds.

Rackspace offers a similar service, dubbed Cloud Files Bulk Import. Optus, the Australian arm of telecoms giant Singtel, will happily offer a similar service. Australian cloud Ninefold does likewise, branding it "Sneakernet".

Some other cloud providers offer such a service, even if it is not productised or advertised. The Register spoke to one cloudy migrant who (after requesting anonymity) told us they borrowed a desktop network attached storage (NAS) device from their new cloud provider, bought another, uploaded data to the devices and then despatched a staffer on a flight to the cloud facility. The NASes were carry-on luggage and the travelling staffer cradled them on their lap during the flight.

It was worth going to those lengths because, as AWS points out in the spiel for its import/export service, doing so “is often much faster than transferring that data via the Internet.”
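A quick back-of-the-envelope calculation shows why. Treat a shipped disk as a very high-latency network link and its effective throughput is simply capacity divided by transit time. The sketch below assumes a 16TB device and a 24-hour door-to-door courier run; both figures are illustrative assumptions rather than vendor numbers:

```python
# Back-of-the-envelope: effective throughput of a shipped disk.
# The 16TB capacity and 24-hour transit time are illustrative assumptions.
capacity_tb = 16
transit_hours = 24

bits = capacity_tb * 8e12              # 16TB expressed in bits
seconds = transit_hours * 3600
effective_mbps = bits / seconds / 1e6  # sustained megabits per second

print(f"Shipping {capacity_tb}TB in {transit_hours} hours ~= {effective_mbps:,.0f} Mbps sustained")
```

That works out to roughly 1,500 Mbps sustained, which is comfortably faster than many real-world links can manage.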

To understand why, consider that the headline speeds advertised on broadband connections aren't always achieved in the real world. Optus, for example, told us that while its fastest broadband connection hums along at 3-5 Gbps, the standard service level agreement “guarantees a speed of 300 Mbps, above which we would conduct fibre checks to ensure additional capacity can be reserved for the customer.” At that speed each terabyte would take about eight hours to upload, and even that assumes an optimistic 10 per cent lost to protocol overhead and general network messiness.
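The eight-hour figure is simple arithmetic, sketched below using the same assumptions as above (a 300 Mbps link with 10 per cent lost to overhead):

```python
# Upload time for 1TB over a 300 Mbps link, with 10 per cent shaved off for
# protocol overhead and general network messiness (the assumption used above).
data_tb = 1
link_mbps = 300
overhead = 0.10

effective_mbps = link_mbps * (1 - overhead)   # ~270 Mbps of usable throughput
seconds = (data_tb * 8e6) / effective_mbps    # 1TB = 8,000,000 megabits
print(f"{data_tb}TB at {link_mbps} Mbps: {seconds / 3600:.1f} hours")
```

That prints a little over eight hours per terabyte.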

It's hard to imagine how that kind of speed will be of any use for cloud services which offer petabyte-scale cloud storage, such as Azure's (or whatever it is called this week) pricing tier for amounts of data “Greater than 5 PB.” Google's BigQuery also promises to support “analysis of datasets up to hundreds of terabytes.”
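Scale the same sums up to that 5 PB tier and the problem becomes obvious. The sketch below assumes the same 270 Mbps of effective throughput and AWS-style 16TB import devices; both figures are illustrative rather than anything either vendor quotes:

```python
import math

# The same arithmetic scaled to a "Greater than 5 PB" dataset.
# Assumes 270 Mbps of effective throughput and 16TB import devices.
data_pb = 5
effective_mbps = 270
device_tb = 16

megabits = data_pb * 1000 * 8e6                          # 5 PB in megabits
upload_years = megabits / effective_mbps / 86400 / 365
devices = math.ceil(data_pb * 1000 / device_tb)

print(f"Uploading {data_pb} PB over the wire: ~{upload_years:.1f} years")
print(f"Shipping it instead: ~{devices} x {device_tb}TB devices")
```

Under those assumptions the network option takes nearly five years; the courier option needs a little over 300 devices.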

Both Google and Microsoft, however, offered no details when El Reg prodded them for an explanation of just how customers can get that much data into their clouds. That's despite Microsoft telling your correspondent, in a past professional life, that it was “evaluating” such a service back in 2010.

If you think this all sounds a bit theoretical, the lack of ingestion services from the Chocolate Factory is already leading to some bizarre work-arounds.

Craig Deveson, a serial cloud entrepreneur who currently serves as CEO and Co-Founder of WordPress backup plugin vendor cloudsafe365, says Google's lack of data ingestion services became “a genuine issue” when he worked on a Gmail migration for a large Australian software company. During that project he found the best way to get a substantial quantity of old email data into Google's cloud was first to send disks to Singapore for upload into Amazon's S3 cloud storage service. Once in Amazon's cloud, “we had to run a program to ingest it into Google's back end.”

Similar tricks are needed to pump lots of data into software-as-a-service providers' clouds.

Salesforce.com, for example, advised us that bulk uploads are made possible by a Bulk API which happily puts SOAP and REST to work to suck up batches of 10,000 records at a time. “Even while data is still being sent to the server, the Force.com platform submits the batches for processing,” the company said.
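By way of illustration, the batching half of that job looks something like the sketch below: slice a .csv export into chunks of 10,000 records and hand each chunk to the API. The file name and the submit_batch stub are hypothetical placeholders; the 10,000-record batch limit is the one Salesforce cites:

```python
import csv
import itertools

BATCH_SIZE = 10_000  # per-batch record limit quoted for the Bulk API

def batches(path, size=BATCH_SIZE):
    """Yield successive lists of up to `size` records from a .csv export."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(itertools.islice(reader, size))
            if not chunk:
                break
            yield chunk

def submit_batch(records):
    # Hypothetical stub: a real client would POST each batch to a Bulk API job
    # (over SOAP or REST) and let the platform process batches as they arrive.
    print(f"submitting {len(records)} records")

# "contacts_export.csv" is a made-up file name standing in for the extract
# pulled from an existing database.
for batch in batches("contacts_export.csv"):
    submit_batch(batch)
```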

Pressed on whether disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”

Whether anyone can afford to wait for that .csv, or other larger files, to arrive is another matter. ®
