Original URL: http://www.theregister.co.uk/2011/05/27/cloud_migration_part_3/

Rackspace cloud prepared for WAR, but Google AE chokes

Cloud providers leave Damon frowning over mirrors

By Damon Hart-Davis

Posted in Cloud, 27th May 2011 11:00 GMT

WAR on the Cloud, Part 3

I'm moving mirrors of my busy-ish website from my hand-crafted dedicated colo solution into the cloud to try to get geographically closer to my global user base, reduce latency and improve perceived performance, save money, and hopefully make administration easier.

In part 2 of this series, I managed to get a minimal fairly "dumb" WAR file running in the Amazon AWS cloud and established that it wouldn't be hostile to some features which I rely upon to improve user experience, such as seeing the client's IP address.

I also found that it is possible to rack up minor charges even in the "free" tier when not doing anything exceptional. My bill over a few weeks with AWS has now surged ahead to a massive 10 cents (though AWS has "forgiven" the 3c bill from March), and there is no way to explicitly cap a monthly bill.

Over the past fortnight or so I've moved one leg of my real website into AWS and given the code and configuration a few days' tweaking to make it play nicely. It also seems that the only really uncontrollable part of the bill would be inbound bandwidth; an infinite loop, for example, is only a problem if you let AWS bring up new instances indefinitely in response to CPU overload/utilisation.

This time I've also been able to investigate and try alternatives to AWS: less cloudy and buzzword-compliant but still more efficient in use of physical resources than my current dedicated boxes. Since my aim is to get mirrors geographically close to my (scattered) users to improve performance, and since my site consumes about 1TB per month, running something cheap from my bedroom (or any other single server) doesn't cut it!

The candidate alternatives for this round are Google App Engine, Rackspace (UK) and Webvisions.

Performance and bandwidth

My current servers are old or severely resource-constrained, and the solutions I tested seemed fast enough and indeed faster than some of my existing kit. The minimum requirement for my mirrors is roughly 512MB Java heap (though I can run with half that), 500MHz x86 CPU, and 1GB to 2GB of local working storage ("javax.servlet.context.tempdir") for cache.
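As a rough illustration, a mirror could check these minimums for itself at start-up. The sketch below is hypothetical: the class and method names are invented, the thresholds are simply the figures above, and the JVM's own temp directory stands in for the servlet container's `javax.servlet.context.tempdir` so that it runs without a container.

```java
import java.io.File;

/**
 * Hypothetical start-up self-check against the mirror's stated minimums:
 * roughly 512MB of heap (256MB at a pinch) and 1GB+ of cache space.
 */
final class MirrorRequirements {
    static final long MIB = 1024L * 1024L;
    static final long PREFERRED_HEAP = 512 * MIB;   // "roughly 512MB Java heap"
    static final long MINIMUM_HEAP = 256 * MIB;     // "I can run with half that"
    static final long MIN_CACHE_SPACE = 1024 * MIB; // low end of "1GB to 2GB"

    enum Verdict { OK, DEGRADED, INSUFFICIENT }

    /** Classifies a maximum heap size, e.g. from Runtime.getRuntime().maxMemory(). */
    static Verdict heap(long maxHeapBytes) {
        if (maxHeapBytes >= PREFERRED_HEAP) return Verdict.OK;
        if (maxHeapBytes >= MINIMUM_HEAP) return Verdict.DEGRADED;
        return Verdict.INSUFFICIENT;
    }

    /** True if the cache directory exists and has room for the working set. */
    static boolean cacheSpaceOk(File cacheDir) {
        return cacheDir.isDirectory() && cacheDir.getUsableSpace() >= MIN_CACHE_SPACE;
    }

    public static void main(String[] args) {
        // Inside a servlet the cache directory would be the File stored under the
        // "javax.servlet.context.tempdir" context attribute; the JVM's temp
        // directory stands in for it here.
        File cacheDir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("heap: " + heap(Runtime.getRuntime().maxMemory()));
        System.out.println("cache space ok: " + cacheSpaceOk(cacheDir));
    }
}
```

One caveat: `Runtime.maxMemory()` can report somewhat less than the `-Xmx` setting on some JVMs and garbage collectors, so a real check might allow a little slack below 512MB.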

Pointing users at a mirror geographically close to them should reduce latency (ie, response time), and this may often matter more than CPU oomph. I'd like lots of small cheap mirrors outside my main US and UK visitor areas.
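Pointing each user at the closest mirror that is currently up can be sketched as a great-circle (haversine) distance comparison. This is an assumption about one way to do it, not how the site actually routes users; the mirror names and coordinates are hypothetical, and in practice the user's location would have to come from something like IP geolocation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical nearest-mirror chooser: great-circle distance, skipping offline mirrors. */
final class MirrorPicker {
    static final double EARTH_RADIUS_KM = 6371.0;

    static final class Mirror {
        final String name;
        final double lat, lon;   // degrees
        final boolean online;

        Mirror(String name, double lat, double lon, boolean online) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.online = online;
        }
    }

    /** Haversine great-circle distance in kilometres between two lat/lon points. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp guards against rounding pushing the argument just above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Closest online mirror to the user, or empty if every mirror is offline. */
    static Optional<Mirror> nearest(List<Mirror> mirrors, double userLat, double userLon) {
        return mirrors.stream()
                .filter(m -> m.online)
                .min(Comparator.comparingDouble(
                        (Mirror m) -> distanceKm(userLat, userLon, m.lat, m.lon)));
    }
}
```

Skipping offline mirrors is what lets the others "take up the slack" when one mirror pulls itself out of service.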

Another issue is bandwidth. Typical use of the site involves some low-bandwidth browsing through a catalogue, followed by the occasional download of a multimedia file of several MB. So the average bandwidth requirements are modest, but I'd like to be able to "burst" to many Mbps so that users with fast connections can download in a few seconds. With one of my current hosts I have a capped "95th percentile" bandwidth agreement, which works well for this and reflects real wholesale costs. All the solutions that I investigated this time around, however, were either pay-by-the-byte without any cost cap, or fixed maximum bandwidth at a fixed cost: the first leaves me carrying the risk of a runaway bill, and the second hurts user experience.
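For readers unfamiliar with it, "95th percentile" billing under one common convention works like this: the provider samples the link rate at fixed intervals (often every 5 minutes), discards the busiest 5 per cent of samples, and bills at the highest remaining one. A minimal sketch, assuming that convention (providers differ in the details):

```java
import java.util.Arrays;

/**
 * Sketch of one common 95th-percentile ("burstable") billing convention:
 * sort the interval samples (usually 5-minute averages), discard the top 5%,
 * and bill at the highest remaining sample. Providers' exact rules vary.
 */
final class Percentile95 {
    static double billableMbps(double[] samplesMbps) {
        if (samplesMbps.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samplesMbps.clone();
        Arrays.sort(sorted);
        // Keep ceil(95% of n) samples (integer arithmetic avoids rounding
        // surprises); the billable rate is the largest sample kept.
        int kept = (sorted.length * 95 + 99) / 100;
        return sorted[kept - 1];
    }
}
```

Over a 30-day month of 5-minute samples (8,640 of them), the discarded top 5 per cent is 432 samples, or about 36 hours of bursting that does not raise the bill, which is why short download bursts suit this model so well.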

I want an individual cloud mirror to go offline if it is well over its bandwidth (or CPU) budget, since in most cases other mirrors (especially those on cost-capped hosting arrangements) can take up the slack, at least until they're full. I already limit peak and monthly-average data outflow from each mirror, and attempt to reduce hotlinking, etc, but an error on my part, a DoS problem caused by someone else's DNS error, broken links, or even a search-engine bot gone wild (I've seen all of these happen) remain a threat, and that is before counting anything malicious. Google App Engine seems to be capable of automatically pulling the plug in this sort of circumstance, and providers with capped-cost plans are shouldering the risk themselves, so why not some support from the others too?

(I'm not expecting any cloud provider to indefinitely protect a site that's deliberately brought a storm of contumely, ie the brown stuff, on itself, such as by taunting 1337 h4x0rz. But dropping any DNS entry, server interface, and possibly even blackholing routing to a drowning site automatically could save a lot of tears and $$$s all round.)
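The self-imposed limits described above (a peak rate plus a monthly outflow budget, with the mirror going offline once the budget is spent) could be sketched as a token bucket combined with a monthly byte counter. The class below is a hypothetical illustration, not the site's actual code; the caller is assumed to send data in chunks no larger than the bucket size and to serve an error or redirect elsewhere when `tryAcquire` returns false.

```java
/**
 * Hypothetical per-mirror outflow guard. A token bucket limits the peak
 * rate, and a monthly byte budget takes the mirror offline once spent, so
 * that other mirrors can take up the slack. Times are passed in (e.g. from
 * System.nanoTime()) to keep the logic testable.
 */
final class OutflowGuard {
    private final long monthlyBudgetBytes;
    private final double peakBytesPerSec;
    private final double burstBytes;   // bucket capacity; chunks must not exceed this
    private double tokens;
    private long lastRefillNanos;
    private long sentThisMonth;

    OutflowGuard(long monthlyBudgetBytes, double peakBytesPerSec, double burstBytes, long nowNanos) {
        this.monthlyBudgetBytes = monthlyBudgetBytes;
        this.peakBytesPerSec = peakBytesPerSec;
        this.burstBytes = burstBytes;
        this.tokens = burstBytes;      // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** True once this month's budget is used up: stop serving, let other mirrors cope. */
    synchronized boolean offline() {
        return sentThisMonth >= monthlyBudgetBytes;
    }

    /**
     * Asks permission to send a chunk. Returns false if the mirror is offline or
     * the peak rate would be exceeded (the caller should wait or refuse). The last
     * chunk before the budget runs out may overshoot it slightly.
     */
    synchronized boolean tryAcquire(long bytes, long nowNanos) {
        if (offline()) return false;
        double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(burstBytes, tokens + elapsedSec * peakBytesPerSec);
        lastRefillNanos = nowNanos;
        if (tokens < bytes) return false;
        tokens -= bytes;
        sentThisMonth += bytes;
        return true;
    }

    /** Resets the monthly counter, e.g. at the start of the billing month. */
    synchronized void startNewMonth() {
        sentThisMonth = 0;
    }
}
```

Keeping the decision inside the mirror means it works the same on every provider, whether or not that provider offers any kind of bill cap.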

Note that the only non-US/UK "cloudy" (and English-friendly) solution that I was able to locate was by gently twisting the arm of my existing provider, Webvisions. Other suggestions would be very welcome in comments.

Ease of set-up and management

To run a mirror I need a fairly bare-bones *nix system with JDK1.6, a newish Tomcat (4 or 6) and a minimal attack surface (no spurious services exposed), and /etc/resolv.conf set up for DNS resolution. Then I can drop my WAR file into Tomcat and off we go. Alternatively, the AWS-like solution is to tweak Java settings, upload my WAR file into their container, and again we have traction.

Google App Engine

Unfortunately Google App Engine fell at the first hurdle: it seems that GAE would choke on my standard WAR file, which fires off lots of background threads and supports long-running operations. The GAE experience seems too different from the alternatives to be worth the development and maintenance effort for now.

Rackspace

Rackspace (UK) offers cloud services, servers and "files".

For this evaluation I am running a single "virtual" Linux server in their UK cloud, taking some strain off my existing main UK server/mirror.

Virtual server hosting is less funky than AWS, but very close to how I run my other machines, so should work with my system "as is" and present few surprises.

Rackspace provides a handy calculator, which suggests that for maybe 100GB-ish of monthly outflow and 512MB of memory I'll pay about GBP40/m in Rackspace fees (12p/GB of traffic plus 2p/h for a 512MB system).

To replace my current main UK server at ~330GB/m traffic and 1GB+ memory would come to ~GBP70/m vs the ~GBP250/m that I currently pay for my colo, though it hosts some other services for me too and my bandwidth charges are effectively capped.
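The arithmetic behind these estimates is simple enough to sketch. The 12p/GB and 2p/h rates are the ones quoted above; the 4p/h rate for a 1GB server is an assumption (double the 512MB rate), not a Rackspace figure, and a month is taken as 730 hours.

```java
/**
 * Back-of-envelope monthly cost from per-GB traffic and per-hour server
 * rates. Figures are in pence.
 */
final class CostEstimate {
    static final double HOURS_PER_MONTH = 730.0;   // 24 * 365 / 12

    static double monthlyPence(double gbPerMonth, double pencePerGb, double pencePerHour) {
        return gbPerMonth * pencePerGb + HOURS_PER_MONTH * pencePerHour;
    }

    public static void main(String[] args) {
        // 330GB/m on a 1GB server; the 4p/h rate is an ASSUMPTION (double the
        // quoted 2p/h for 512MB), not a figure from Rackspace.
        System.out.printf("1GB server: GBP%.2f/m%n", monthlyPence(330, 12, 4) / 100);
        // 100GB/m on a 512MB server at the quoted 12p/GB and 2p/h.
        System.out.printf("512MB server: GBP%.2f/m%n", monthlyPence(100, 12, 2) / 100);
    }
}
```

With that assumed rate, 330GB/m on a 1GB server comes to about £69/m, in line with the ~£70/m above. The same per-GB and per-hour rates give only about £27/m for the 100GB/512MB case, so the calculator's ~£40 figure presumably covers charges not broken out here; inbound traffic, for example, is also billed per GB.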

As for my stand-off with Rackspace's sign-up and billing system, even after fixes on their side I still couldn't get in without a warm body at the other end of the line.

However, Rackspace does seem to live up to their "fanatical" claim: other than the sign-up glitch, all humans that I have spoken to from tech support to PR flak have been helpful and polite and have often called me (ie, at the company's expense). The only difficulty I had setting up their virtual machine, other than choosing from a wide selection of Linux distros, was with the software firewall (iptables) that I hadn't realised Rackspace sets up by default. Again, tech support was quick and helpful and had me sorted on live chat in minutes. I'm still bowled over by their charm offensive as you can tell: fanboi, moi?

One attractive Rackspace console feature is the ability to resize my live mirror easily without disturbing its content. I started off with a larger configuration than strictly necessary, for some elbow room, then once everything was running smoothly I downsized it in 10 minutes, halving recurring fees.


Rackspace's web console seems fairly sophisticated and robust, and on a par with AWS in many respects. One nice feature is the ability to run a terminal console in my browser, though it was somewhat clunkier than a direct SSH session.

The Rackspace mirror sucked in the most traffic even in "stealth" mode – ~25GB/m – suggesting that maybe I need to boost capacity here at home in the UK.

Rackspace charges by the GB of data in and out without any budget cap, and limits outbound bandwidth depending on server memory size.

Amazon

I've spent several days tweaking and adjusting to fit the virtualised model where CPU and bandwidth are likely to be chargeable, as described before, and I think my code has benefited from the enforced spring-clean.

The simple browse/upload of new WAR versions and performance monitoring in the AWS console are great features, and the possibility that a single such upload might be able to deploy in several locations at once is also enticing.

A downside is that the "Elastic Beanstalk" service is US-only for now, and I'm already well served in the US by a mirror further down the east coast. My AWS-hosted mirror only pulls in about 12GB/m of traffic, which suggests that there is little value in staying in EB until it is available outside the US.

And in spite of the time lavished on this outpost, I have had difficulty keeping my AWS instance stable. Every so often (now as often as every couple of days) the AWS monitoring decides that my system is unresponsive and restarts it, clearing my local cache in passing. My code runs fine for months on end on all sorts of hardware, and monitors various parameters such as CPU, memory, bandwidth and time-to-service for user requests in order to regulate itself. This seems not to be enough in AWS. For example, the CPU load obtained from the underlying OS is not virtualised to reflect any sharing of resources, so it can indicate ~10 per cent use while the AWS monitoring claims that I'm saturating the machine with 100 per cent utilisation. Very odd ...
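One plausible factor, offered here as an assumption rather than a diagnosis, is hypervisor "steal" time: on Linux guests, time when the virtual CPU was ready to run but the hypervisor gave the physical CPU to someone else is reported separately in `/proc/stat`, and does not show up as busy time in the guest's usual load figures. A minimal sketch of a self-monitor that takes steal into account (class and method names are hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: CPU busy and steal percentages from two samples of the aggregate
 * "cpu" line in Linux /proc/stat. Fields after "cpu" are cumulative ticks:
 * user nice system idle iowait irq softirq steal (guest fields, if present,
 * are already counted in user/nice and are ignored). Kernels older than
 * 2.6.11 have no steal field, which this sketch does not handle.
 */
final class CpuSteal {
    private static final int FIELDS = 8;

    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (!parts[0].equals("cpu") || parts.length < FIELDS + 1) {
            throw new IllegalArgumentException("not an aggregate cpu line with steal: " + line);
        }
        long[] ticks = new long[FIELDS];
        for (int i = 0; i < FIELDS; i++) {
            ticks[i] = Long.parseLong(parts[i + 1]);
        }
        return ticks;
    }

    /** Returns {busyPercent, stealPercent} for the interval between two samples. */
    static double[] usage(long[] before, long[] after) {
        long[] d = new long[FIELDS];
        long total = 0;
        for (int i = 0; i < FIELDS; i++) {
            d[i] = after[i] - before[i];
            total += d[i];
        }
        if (total <= 0) return new double[] {0.0, 0.0};
        long idle = d[3] + d[4];            // idle + iowait
        long steal = d[7];                  // CPU time the hypervisor gave elsewhere
        long busy = total - idle - steal;   // time this guest actually ran
        return new double[] {100.0 * busy / total, 100.0 * steal / total};
    }

    public static void main(String[] args) throws Exception {
        Path stat = Paths.get("/proc/stat");
        long[] before = parseCpuLine(Files.readAllLines(stat).get(0));
        Thread.sleep(1000);
        long[] after = parseCpuLine(Files.readAllLines(stat).get(0));
        double[] u = usage(before, after);
        System.out.printf("busy %.1f%%, steal %.1f%%%n", u[0], u[1]);
    }
}
```

A sample showing, say, 10 per cent busy and 70 per cent steal would match the kind of mismatch described above; a self-regulating mirror could treat sustained high steal as a signal to shed load.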

If I pursue this AWS installation further I may have to put in some work to use their APIs to poll system state, though that may also be a chargeable service.

I have never received an alert email from AWS, even though I had signed up to do so when significant system events happen, but their ra-ra marketing SPAM keeps rolling in.

AWS has been helpful and polite and reasonably quick in responding to me via normal support channels and in forums, but the lack of a bill cap, the instability and the US-only Elastic Beanstalk service are all bad.

Webvisions

I already host in a couple of locations in AsiaPac with Webvisions, but have in the past had to abandon hosting in Singapore and Beijing with them because the traffic was too small to justify the costs. Maybe, with some more cost-effective "cloudy" solution I can spend less money and target more locations, or better justify my current locations ...

Webvisions provides a simple monitoring service which in the first instance sends an email when a server stops responding, and follows up with a phone call if need be; I've been impressed by their general level of service.

Webvisions' calculator indicates that I'd probably need to spend something like £50/m for a small 1GB/1GHz Linux mirror (their prices are in Singapore dollars):

Virtual Machine 1: S$50 set-up

Note that the set-up charge effectively requires a longer-term commitment than the pay-as-you-go AWS and Rackspace models, where if you're not actually using resources you needn't pay anything, and where you can provision a new server at any time without delay once the account is open.

The demo/trial system set up for me was the next notch up in most dimensions (faster CPU, more RAM, more disc, more bandwidth) and was definitely nippy.

As the Webvisions configuration already includes a Tomcat and JDK, I did not install my own as I usually would, but in any case the setup was quick and essentially the same as usual.

I logged in directly via SSH to do everything I needed; no web console required.

The Webvisions install pulled in the least traffic (10GB/m), as expected, though not far behind the AWS instance's ~12GB/m, and at maybe one-third the cost of a dedicated Webvisions Linux host. The shared bandwidth isn't burstable, but for its low fixed cost I can live with that. This is definitely the most enticing option from a business point of view.

Management wishlist

What I'd really like is for one simple operation to update/upgrade several mirrors, rather than me working manually through error-prone steps as root at each location. With four or five servers my current mechanisms are OK, and I don't always update all mirrors with every new micro-version (indeed I effectively beta-test some changes on quieter mirrors before rolling them out elsewhere), but if I were to continue running all eight servers from this test, or even more, I'd really benefit from a semi-automated approach.
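A semi-automated version of that workflow might look something like the sketch below: the quietest mirrors first as a beta wave, then the rest. Everything here is hypothetical: the host names, the Tomcat install paths, and the choice of `scp` plus an `ssh mv` as the transport are assumptions, and the commands are only built, not run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical staged-rollout planner: the quietest mirrors get a new WAR
 * first as a beta test, and the rest follow once it has proved itself.
 * Host names and Tomcat paths are illustrative assumptions.
 */
final class RolloutPlan {
    static final class Mirror {
        final String host;
        final double gbPerMonth;   // recent traffic, used as a "quietness" measure

        Mirror(String host, double gbPerMonth) {
            this.host = host;
            this.gbPerMonth = gbPerMonth;
        }
    }

    /** Returns [beta wave, main wave], each ordered quietest first. */
    static List<List<Mirror>> waves(List<Mirror> mirrors, int betaCount) {
        List<Mirror> sorted = new ArrayList<>(mirrors);
        sorted.sort(Comparator.comparingDouble((Mirror m) -> m.gbPerMonth));
        int n = Math.min(Math.max(betaCount, 0), sorted.size());
        List<List<Mirror>> result = new ArrayList<>();
        result.add(new ArrayList<>(sorted.subList(0, n)));
        result.add(new ArrayList<>(sorted.subList(n, sorted.size())));
        return result;
    }

    /**
     * Commands (built here, not executed) to push a WAR to one mirror: copy it
     * under a temporary name outside webapps/, then rename it into place so
     * Tomcat's auto-deployer never sees a half-copied file (assuming both
     * paths are on the same filesystem, so the rename is atomic).
     */
    static List<List<String>> deployCommands(String localWar, Mirror m) {
        String target = "root@" + m.host;
        String staged = "/usr/local/tomcat/ROOT.war.new";     // assumed path
        String live = "/usr/local/tomcat/webapps/ROOT.war";   // assumed path
        return Arrays.asList(
                Arrays.asList("scp", localWar, target + ":" + staged),
                Arrays.asList("ssh", target, "mv", staged, live));
    }
}
```

Each command list could be passed to `java.lang.ProcessBuilder` on the admin machine, with the main wave started only once the beta mirrors have run cleanly for a while.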

Conclusions

I really could cut costs (and possibly my energy/carbon footprint) by a factor of two or more if I switched my mirrors to the services investigated above.

Webvisions is the best so far from a business perspective, and Rackspace (UK) did better than expected, though the Webvisions model is slightly less nimble than the pay-as-you-go and online provisioning of AWS and Rackspace.

In any case I'd expect to keep my mirrors spread across several providers ("cloud-of-cloud" style) as Tim Worstall suggests: to be largely immune to events like the recent AWS downtime at the cost of a little more management complexity for me.

What's next

Offloading to cloud storage, when in environments that support it, should (for example) make my systems more robust across restarts and give better performance to end users.

It may also be worth tweaking my code, especially for AWS, to work well with multiple instances behind a load balancer on a single URL. That might at least mask some of the instability issues I was seeing with AWS. ®