Feeds

Speed is the essence of WAR

Data location, location, location

Maximizing your infrastructure through virtualization

WAR on the cloud, Part 4 In part 3 I tested some different semi-cloudy solutions for mirrors of my site and I am in the process of replacing one dedicated WebVisions Linux machine with two virtual private system (VPS)es in separate AsiaPac countries, for less money in total. Ker-ching!

Cloud files: public or private?

But this still isn't really getting the full cloud religion, and in particular I haven't dealt with storage and the cloud elegantly. I have simply been letting my existing code manage local mirror caching, with no special cloud storage support at all.

At the moment, when running on a bare system (or VPS) as a mirror, when a request is made for one of the multimedia exhibits in the catalogue, the mirror feeds as much of it (if any) as it has locally from cache. Then the mirror streams the rest over from the master server while saving to local cache.

Thus for popular exhibits, after the first load from a given mirror, every other user of that mirror gets it streamed from the mirror's local cache for speed, and the first download is only a little slower than going to the master direct. This works, but mirrors don't all have the same outgoing bandwidth available, and bandwidth is most important for larger downloads.

(On Amazon's Elastic Beanstalk, if for some reason it decides that my mirror is taking too much CPU time, my mirror gets restarted and its cache discarded, which is wasteful. In part I mitigated that by not having such lightweight cloud mirrors do any significant pre-caching other than of the most popular content. If I could persist the cache through a restart I could reconsider this fix.)

Amazon and Rackspace both support (private) cloud-based storage and a public content delivery network (CDN), either of which might perform better than my existing solution, especially in conjunction with my lower-bandwidth mirrors.

The private persistent cloud storage could be used "behind" my mirrors with the mirrors as a shared concurrency-safe cache with the mirror front-ends protecting against excessive use of bandwidth, and/or the public cloud/CDN could serve appropriate files directly to end users.

Ambush bills again

With neither Rackspace nor Amazon is it possible to restrict download bandwidth and bills. Even if not feeling paranoid about DoS/DDoS attacks, pilfered bandwidth through hotlinking can be a significant nuisance. I checked with Rackspace and it's not currently possible to restrict downloads (say) to requests with a Referer header that matches a supplied regex, which would stop most casual misuse.

The lack of such first-line defences and any cost cap would imply that regular active monitoring is needed, and possibly only selectively making available via the CDN material less likely to be hotlinked (such as site static furniture) and a sampling of requests for popular items not currently setting off hotlink alarms.

On Amazon's AWS the public CDN is partitioned by geographical area (with extra complication and cost to distribute content to each region) whereas the Rackspace CDN (Akamai underneath) seems to be global. The enticing prospect therefore is that one Rackspace CDN could do much of the heavy lifting on behalf of all of the mirrors leaving them just to construct and serve the relatively small Web pages that the end user sees.

Rackspace's CDN is built with OpenStack which reduces the chance of having to throw away or redo any CDN integration work if I switch to a different CDN provider or use more than one.

CDN vs ADSL

I tested a simple face-off between Rackspace's public CDN and my home/office Apache Web site in London serving a reasonable-size binary file (a few MB), both for latency and bandwidth.

I tried pulling down the file to the following locations (within co-lo facilities, not retail broadband connections): UK (London), US (Atlanta), SG, AU (Sydney), IN (Mumbai). (Connectivity from the IN machine was sufficiently erratic and poor at the times I was testing that I have excluded it from the results.)

In general, when downloading the file from my office Apache, I could max out my link outbound at at little over 128KB/s, and the round-trip time (ie, latency) varied from 24ms to the UK machine up to 360ms for AU.

As I can construct and serve a page from one of my mirrors in typically 50ms or (much) less, these latencies to serve up page furniture are significant and annoying to end users, especially when exceeding 100ms, ie: for all but the UK.

When downloading a large file, latency is less significant than bandwidth.

With the Rackspace CDN, with content nominally uploaded to the "UK" cloud, latency to download to UK and US machines was under 1ms and the worst was SG at under 70ms which would have beaten all but the UK-to-UK serving from my own office Apache.

The bandwidth available from the Rackspace CDN was good everywhere too, maxing out my SG connection (at about 240kBytes/s) and getting as high as 22MBytes/s downloading to a UK host.

damoncloudpiecetable2

Latency and bandwidth

Note that the SG and AU servers were probably maxing out their in-bound links during the CDN tests (2Mbps and 10Mbps burstable) rather than indicating the maximum CDN bandwidth available.

Conclusion: Fit or fad?

The upshot of this simple experiment is that it is worthwhile in both latency and bandwidth terms to improve the user experience for small and large files (with perceived performance dominated by latency and bandwidth respectively) to consider taking advantage of a commodity CDN, such as that of Rackspace.

Indeed, even with my own faster mirrors I'd have difficulty matching the better CDN numbers, so if hotlinking and so on were not a worry and I wanted to maximise performance, then I should probably only serve the dynamic page content from my own mirrors, with all other material served by the CDN.

So come on Rackspace, Amazon, et al, gimme a way to control bandwidth and risks and you'll have one more customer and widen your appeal to other SMEs too... ®

The Power of One eBook: Top reasons to choose HP BladeSystem

More from The Register

next story
Sysadmin Day 2014: Quick, there's still time to get the beers in
He walked over the broken glass, killed the thugs... and er... reconnected the cables*
Auntie remains MYSTIFIED by that weekend BBC iPlayer and website outage
Still doing 'forensics' on the caching layer – Beeb digi wonk
SHOCK and AWS: The fall of Amazon's deflationary cloud
Just as Jeff Bezos did to books and CDs, Amazon's rivals are now doing to it
BlackBerry: Toss the server, mate... BES is in the CLOUD now
BlackBerry Enterprise Services takes aim at SMEs - but there's a catch
The triumph of VVOL: Everyone's jumping into bed with VMware
'Bandwagon'? Yes, we're on it and so what, say big dogs
Carbon tax repeal won't see data centre operators cut prices
Rackspace says electricity isn't a major cost, Equinix promises 'no levy'
Disaster Recovery upstart joins DR 'as a service' gang
Quorum joins the aaS crowd with DRaaS offering
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Consolidation: The Foundation for IT Business Transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.
Application security programs and practises
Follow a few strategies and your organization can gain the full benefits of open source and the cloud without compromising the security of your applications.
How modern custom applications can spur business growth
Learn how to create, deploy and manage custom applications without consuming or expanding the need for scarce, expensive IT resources.
Securing Web Applications Made Simple and Scalable
Learn how automated security testing can provide a simple and scalable way to protect your web applications.