Speed is the essence of WAR

Data location, location, location

WAR on the cloud, Part 4

In part 3 I tested some different semi-cloudy solutions for mirrors of my site, and I am in the process of replacing one dedicated WebVisions Linux machine with two virtual private systems (VPSes) in separate AsiaPac countries, for less money in total. Ker-ching!

Cloud files: public or private?

But this still isn't really getting the full cloud religion, and in particular I haven't dealt with storage and the cloud elegantly. I have simply been letting my existing code manage local mirror caching, with no special cloud storage support at all.

At the moment, when running on a bare system (or VPS) as a mirror, a request for one of the multimedia exhibits in the catalogue is handled as follows: the mirror feeds as much of the exhibit (if any) as it has in its local cache, then streams the rest over from the master server while saving it to the local cache.
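As a rough sketch of that read-through behaviour (this is not the actual mirror code; `serve_exhibit`, `open_master` and the chunk size are hypothetical names and values chosen for illustration), the logic might look something like this:

```python
import os
from typing import BinaryIO, Callable, Iterator

CHUNK = 64 * 1024  # assumed chunk size, not taken from the article


def serve_exhibit(cache_path: str,
                  open_master: Callable[[int], BinaryIO],
                  chunk_size: int = CHUNK) -> Iterator[bytes]:
    """Yield an exhibit's bytes: the locally cached prefix first,
    then the remainder fetched from the master and cached on the way.

    open_master(offset) is a hypothetical callable that returns a
    readable binary stream positioned at `offset` in the master copy.
    """
    offset = 0

    # 1. Feed whatever is already in the local cache (possibly nothing).
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cached:
            while True:
                chunk = cached.read(chunk_size)
                if not chunk:
                    break
                offset += len(chunk)
                yield chunk

    # 2. Stream the rest from the master, appending it to the cache.
    with open_master(offset) as upstream, open(cache_path, "ab") as cache:
        while True:
            chunk = upstream.read(chunk_size)
            if not chunk:
                break
            cache.write(chunk)
            yield chunk
```

This simplified version always contacts the master, even when the cached copy is already complete, and it makes no attempt to cope with two requests filling the same cache file at once; a real mirror would need to handle both.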

Thus for popular exhibits, after the first load from a given mirror, every other user of that mirror gets it streamed from the mirror's local cache for speed, and the first download is only a little slower than going to the master direct. This works, but mirrors don't all have the same outgoing bandwidth available, and bandwidth is most important for larger downloads.

(On Amazon's Elastic Beanstalk, if for some reason it decides that my mirror is taking too much CPU time, my mirror gets restarted and its cache discarded, which is wasteful. I partly mitigated that by not having such lightweight cloud mirrors do any significant pre-caching other than of the most popular content. If I could persist the cache through a restart I could reconsider this fix.)

Amazon and Rackspace both support (private) cloud-based storage and a public content delivery network (CDN), either of which might perform better than my existing solution, especially in conjunction with my lower-bandwidth mirrors.

The private persistent cloud storage could sit "behind" my mirrors as a shared, concurrency-safe cache, with the mirror front-ends protecting against excessive use of bandwidth; and/or the public cloud/CDN could serve appropriate files directly to end users.

Ambush bills again

With neither Rackspace nor Amazon is it possible to restrict download bandwidth and bills. Even if not feeling paranoid about DoS/DDoS attacks, pilfered bandwidth through hotlinking can be a significant nuisance. I checked with Rackspace and it's not currently possible to restrict downloads (say) to requests with a Referer header that matches a supplied regex, which would stop most casual misuse.
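To illustrate the kind of first-line check meant here, a minimal sketch of a Referer filter follows. It is the sort of test a mirror front-end could apply itself, not a feature of either provider; the `example.org` pattern, the function name and the policy of allowing requests with no Referer are all assumptions made for the example.

```python
import re
from typing import Optional, Pattern

# Hypothetical pattern: allow pages on example.org and its subdomains.
ALLOWED_REFERER = re.compile(
    r"^https?://([a-z0-9-]+\.)*example\.org(:\d+)?(/|$)",
    re.IGNORECASE,
)


def allow_download(referer: Optional[str],
                   pattern: Pattern[str] = ALLOWED_REFERER,
                   allow_missing: bool = True) -> bool:
    """Return True if a request with this Referer header may be served.

    Many legitimate clients send no Referer at all, so by default a
    missing header is allowed; only a present, non-matching header is
    treated as likely hotlinking.
    """
    if not referer:
        return allow_missing
    return pattern.match(referer) is not None
```

A Referer header is trivially forged, so this only stops casual misuse such as hotlinked images on other sites; it is no defence against a deliberate attack.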

The lack of such first-line defences, and of any cost cap, implies that regular active monitoring is needed. It may also mean making material available via the CDN only selectively: items less likely to be hotlinked (such as static site furniture), plus a sampling of requests for popular items that are not currently setting off hotlink alarms.

On Amazon's AWS the public CDN is partitioned by geographical area (with extra complication and cost to distribute content to each region) whereas the Rackspace CDN (Akamai underneath) seems to be global. The enticing prospect therefore is that one Rackspace CDN could do much of the heavy lifting on behalf of all of the mirrors leaving them just to construct and serve the relatively small Web pages that the end user sees.

Rackspace's CDN is built with OpenStack, which reduces the chance of having to throw away or redo any CDN integration work if I switch to a different CDN provider or use more than one.

CDN vs ADSL

I tested a simple face-off between Rackspace's public CDN and my home/office Apache Web site in London serving a reasonable-size binary file (a few MB), both for latency and bandwidth.

I tried pulling down the file to the following locations (within co-lo facilities, not retail broadband connections): UK (London), US (Atlanta), SG, AU (Sydney), IN (Mumbai). (Connectivity from the IN machine was sufficiently erratic and poor at the times I was testing that I have excluded it from the results.)

In general, when downloading the file from my office Apache, I could max out my outbound link at a little over 128KB/s, and the round-trip time (ie, latency) varied from 24ms to the UK machine up to 360ms for AU.

As I can construct and serve a page from one of my mirrors in typically 50ms or (much) less, these latencies to serve up page furniture are significant and annoying to end users, especially when exceeding 100ms, ie, for all but the UK.

When downloading a large file, latency is less significant than bandwidth.
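A back-of-envelope model makes the point. The sketch below assumes the simplest possible approximation, total time = round trips × RTT + size ÷ bandwidth, and ignores TCP slow start, connection set-up and so on. The 128KB/s and 360ms figures are the office-to-AU numbers above; the 10KB and 3MB file sizes are illustrative assumptions.

```python
def transfer_time(size_bytes: float,
                  bandwidth_bytes_per_s: float,
                  rtt_s: float,
                  round_trips: int = 1) -> float:
    """Approximate seconds to fetch a file: latency cost plus streaming cost."""
    return round_trips * rtt_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes: float,
                  bandwidth_bytes_per_s: float,
                  rtt_s: float,
                  round_trips: int = 1) -> float:
    """Fraction of the total transfer time that is due to latency alone."""
    total = transfer_time(size_bytes, bandwidth_bytes_per_s, rtt_s, round_trips)
    return (round_trips * rtt_s) / total


LINK = 128 * 1024   # office uplink, bytes per second
RTT_AU = 0.360      # round-trip time to the AU machine, seconds

small = latency_share(10 * 1024, LINK, RTT_AU)        # 10KB of page furniture
large = latency_share(3 * 1024 * 1024, LINK, RTT_AU)  # a 3MB exhibit
print(f"latency share: small file {small:.0%}, large file {large:.1%}")
```

Under these assumptions latency accounts for roughly four-fifths of the time to fetch the small file, but under 2 per cent for the large one.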

With the Rackspace CDN, with content nominally uploaded to the "UK" cloud, latency to download to UK and US machines was under 1ms and the worst was SG at under 70ms, which would have beaten all but the UK-to-UK serving from my own office Apache.

The bandwidth available from the Rackspace CDN was good everywhere too, maxing out my SG connection (at about 240kBytes/s) and getting as high as 22MBytes/s downloading to a UK host.

[Table: Latency and bandwidth]

Note that the SG and AU servers were probably maxing out their in-bound links during the CDN tests (2Mbps and 10Mbps burstable) rather than indicating the maximum CDN bandwidth available.
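A quick unit conversion supports this for SG. The sketch below assumes decimal units (1Mbps = 1,000,000 bits/s, 1kByte = 1,000 bytes) and assumes that the 2Mbps link is the SG one, which the text implies but does not state outright.

```python
def mbps_to_kbytes_per_s(mbps: float) -> float:
    """Convert a link speed in megabits/s to kilobytes/s (decimal units)."""
    return mbps * 1_000_000 / 8 / 1_000


def link_utilisation(observed_kbytes_per_s: float, link_mbps: float) -> float:
    """Fraction of the nominal link capacity that the observed rate represents."""
    return observed_kbytes_per_s / mbps_to_kbytes_per_s(link_mbps)


sg_ceiling = mbps_to_kbytes_per_s(2)   # nominal ceiling of a 2Mbps link
sg_used = link_utilisation(240, 2)     # the observed ~240kBytes/s
print(f"2Mbps is {sg_ceiling:.0f}kBytes/s; 240kBytes/s is {sg_used:.0%} of it")
```

So about 240kBytes/s is within a few per cent of what a 2Mbps link can carry before protocol overhead, which is consistent with the in-bound link, not the CDN, being the bottleneck.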

Conclusion: Fit or fad?

The upshot of this simple experiment is that it is worth considering a commodity CDN, such as that of Rackspace, to improve the user experience for both small and large files, where perceived performance is dominated by latency and bandwidth respectively.

Indeed, even with my own faster mirrors I'd have difficulty matching the better CDN numbers, so if hotlinking and so on were not a worry and I wanted to maximise performance, then I should probably only serve the dynamic page content from my own mirrors, with all other material served by the CDN.

So come on Rackspace, Amazon, et al, gimme a way to control bandwidth and risks and you'll have one more customer and widen your appeal to other SMEs too... ®
