Cloud mega-uploads aren't easy

Google and Microsoft can't explain how to get big data into the cloud, despite rivals' import services


Google and Microsoft don't offer formal data ingestion services to help users get lots of data into the cloud, and neither seems set to do so anytime soon. Quite how would-be users take advantage of the hundreds of terabytes both offer in the cloud is therefore a bit of a mystery.

Data ingestion services see cloud providers offer customers the chance to send them hard disks for rapid upload into the cloud. Amazon Web Services' import/export service was among the first such offerings and ingests up to 16TB of data per device, provided the device is no more than 14 inches high by 19 inches wide by 36 inches deep (8U in a standard 19-inch rack) and weighs less than 50 pounds.

Rackspace offers a similar service, dubbed Cloud Files Bulk Import. Optus, the Australian arm of telecoms giant Singtel, will happily offer a similar service. Australian cloud Ninefold does likewise, branding it "Sneakernet".

Some other cloud providers offer such a service, even if it is not productised or advertised. The Register spoke to one cloudy migrant who (after requesting anonymity) told us they borrowed a desktop network attached storage (NAS) device from their new cloud provider, bought another, uploaded data to the devices and then despatched a staffer on a flight to the cloud facility. The NASes were carry-on luggage and the travelling staffer cradled them on their lap during the flight.

It was worth going to those lengths because, as AWS points out in the spiel for its import/export service, doing so “is often much faster than transferring that data via the Internet.”

To understand why, consider the fact that headline speeds advertised on broadband connections aren't always achieved in the real world. Optus, for example, told us that while its fastest broadband connection hums along at 3-5 Gbps, the standard service level agreement “guarantees a speed of 300 Mbps, above which we would conduct fibre checks to ensure additional capacity can be reserved for the customer.” At that speed each terabyte would take about eight hours to upload, and that assumes an optimistic ten per cent lost to overhead and general network messiness.
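The back-of-the-envelope sum above is easy to reproduce. The sketch below (illustrative, not from Optus) converts a link speed and data volume into an upload time, docking a configurable slice of bandwidth for protocol overhead:

```python
def upload_hours(data_tb, link_mbps, overhead=0.10):
    """Estimate hours to upload `data_tb` terabytes over a link of
    `link_mbps` megabits per second, with `overhead` (headers,
    retransmits, general network messiness) lost from the link."""
    bits = data_tb * 8 * 10**12                 # decimal TB -> bits
    usable_bps = link_mbps * 10**6 * (1 - overhead)
    return bits / usable_bps / 3600

# At the 300 Mbps Optus guarantees, one terabyte takes roughly eight hours:
print(round(upload_hours(1, 300), 1))
```

Scale that to the petabyte tiers mentioned below and the appeal of putting disks on a plane becomes obvious.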

It's hard to imagine how that kind of speed will be of any use for cloud services which offer petabyte-scale cloud storage, such as Azure's (or whatever it is called this week) pricing tier for amounts of data “Greater than 5 PB.” Google's BigQuery also promises to support “analysis of datasets up to hundreds of terabytes.”

Both Google and Microsoft, however, offered no details when El Reg prodded them for an explanation of just how customers can get that much data into their clouds. That's despite Microsoft telling your correspondent, in a past professional life, that it was “evaluating” such a service back in 2010.

If you think this all sounds a bit theoretical, the lack of ingestion services from the Chocolate Factory is already leading to some bizarre work-arounds.

Craig Deveson, a serial cloud entrepreneur who currently serves as CEO and co-founder of WordPress backup plugin vendor cloudsafe365, says Google's lack of data ingestion services became “a genuine issue” when he worked on a Gmail migration for a large Australian software company. During that project he found the best way to get a substantial quantity of old email data into Google's cloud was first to send disks to Singapore for upload into Amazon's S3 cloud storage service. Once in Amazon's cloud, “we had to run a program to ingest it into Google's back end.”

Similar tricks are needed to pump lots of data into software-as-a-service providers' clouds.

Salesforce.com, for example, advised us that bulk uploads are made possible by a Bulk API which happily puts SOAP and REST to work to suck up batches of 10,000 records at a time. “Even while data is still being sent to the server, the Force.com platform submits the batches for processing,” the company said.
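The batch-of-10,000 approach Salesforce describes is straightforward to picture. The sketch below is purely illustrative (the function and record shapes are ours, not Salesforce's API), showing how a bulk uploader might carve a record set into API-sized batches that can be submitted while later batches are still being prepared:

```python
def batch_records(records, batch_size=10_000):
    """Yield successive batches of at most `batch_size` records.

    The 10,000-record limit mirrors the batch size Salesforce quotes
    for its Bulk API; the function itself is a hypothetical sketch,
    not the real API client."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

rows = [{"id": i} for i in range(25_000)]
batches = list(batch_records(rows))
print(len(batches))   # 25,000 records split into batches of 10k, 10k and 5k
```

Each yielded batch could then be serialised (to CSV or XML, say) and posted to the bulk endpoint without waiting for the rest of the extract to finish.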

Pressed on whether disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”

Whether anyone can afford to wait for that .csv, or other larger files, to arrive is another matter. ®
