Amazon rejigs EC2 to run parallel HPC apps

A veritable cluster

Online retailer and IT disrupter Amazon is getting its high performance computing act together on its Elastic Compute Cloud (EC2) service by allowing customers to spin up tightly coupled virtual server nodes to run real-world, parallel supercomputing applications.

On Tuesday, Amazon Web Services launched a new service called Cluster Compute Instances, which takes a bunch of x64 servers based on Intel's Xeon processors and links them together with 10 Gigabit Ethernet interfaces and switches. As you can see from the Cluster Compute Instances sign-up page, the EC2 virtual server slices function just like any other sold by Amazon, except that the HPC variants have 10 Gigabit Ethernet links and a specific hardware profile, so propellerheads can seriously tune their applications to run well.
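
For the curious, here's a minimal sketch of what spinning up such a cluster looks like in code, using Amazon's boto3 Python SDK. The AMI ID and placement group name are hypothetical placeholders, and the cc1.4xlarge instance type and cluster placement group are our reading of the setup, not lifted verbatim from the announcement.

# A minimal sketch, not Amazon's own tooling: launching a small HPC
# cluster with the boto3 Python SDK. The AMI ID is a hypothetical
# placeholder; cc1.4xlarge is our guess at the instance type name.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster instances get launched into a placement group so the nodes
# land on the low-latency 10 Gigabit Ethernet fabric together.
ec2.create_placement_group(GroupName="hpc-demo", Strategy="cluster")

response = ec2.run_instances(
    ImageId="ami-12345678",       # hypothetical EBS-backed HPC AMI
    InstanceType="cc1.4xlarge",   # 2 x Xeon X5570, 23 GB, 10 GbE
    MinCount=8,                   # the current eight-instance cap
    MaxCount=8,
    Placement={"GroupName": "hpc-demo"},
)
for inst in response["Instances"]:
    print(inst["InstanceId"], inst["State"]["Name"])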

With other EC2 slices, you never know what specific iron you are going to get when you buy a small, medium, large, or extra large virtual slice rated at a certain number of EC2 compute units.

In the case of HPC-specific slices, Amazon is providing a slice that has a two-socket x64 server based on Intel's Xeon X5570, which has a clock speed of 2.93 GHz and 8 MB of on-chip cache memory. Those processors are in the quad-core "Nehalem-EP" family that Intel announced in March 2009, not the latest six-core "Westmere-EP" Xeon 5600s that debuted in March of this year. (Amazon could easily plug six-core Xeon 5600s into these machines, since they are socket-compatible with the Xeon 5500s.)

This server represents an aggregate of 33.5 EC2 compute units and presents 23 GB of virtual memory to the HPC application running atop it. This is four times the extra large EC2 slice in terms of compute units, according to Amazon. The chips run in 64-bit mode, which is necessary to address more than 4 GB of memory in a node.
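
As a sanity check on that four-times claim, here's the arithmetic in Python, assuming the standard extra large slice is rated at 8 EC2 compute units (our assumption, not stated in the announcement):

# Back-of-the-envelope check on the compute-unit ratio. The 8 ECU
# figure for a standard extra large slice is our assumption.
cluster_ecu = 33.5
extra_large_ecu = 8.0
print(f"Ratio: {cluster_ecu / extra_large_ecu:.1f}x")  # roughly 4.2x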

HPC shops are not generally keen on hypervisors, because they eat CPU cycles and add network and storage I/O latencies. But at a certain price, some people will try anything and make do, and thus the Cluster Compute Instances on EC2 are based on the Amazon variant of the Xen hypervisor (called Hardware Virtual Machine, or HVM) to virtualize the server's hardware. Amazon requires that the cluster nodes be loaded with an Amazon Machine Image (AMI) stored on Amazon's Elastic Block Storage (EBS) storage cloud.

At the moment, Amazon is restricting the cluster size to eight instances, for a total of 64 cores. This is not a particularly large cluster, probably something on the order of 750 gigaflops of peak theoretical number-crunching oomph before you take out the overhead of virtualization. But it is more than a lot of researchers have on their workstations and PCs, and that is the point. If you want to get more oomph, you can request it.
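
That estimate is easy to reproduce. Nehalem-EP cores retire four double-precision flops per clock, so the back-of-the-envelope math (ours, not Amazon's) looks like this:

# Reproducing the ~750 gigaflop peak estimate for an eight-instance
# cluster. Four DP flops per clock is the Nehalem-EP figure; the
# estimate is ours, not Amazon's.
cores = 8 * 8            # eight instances, eight cores each
clock_ghz = 2.93         # Xeon X5570
flops_per_clock = 4      # double precision, per Nehalem core
peak_gflops = cores * clock_ghz * flops_per_clock
print(f"Peak: {peak_gflops:.0f} gigaflops")  # 750 gigaflops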

Clearly, larger configurations will not only be available, they are necessary. In the announcement, Lawrence Berkeley National Laboratory, which had been testing HPC applications on the EC2 cloud, said that the new Cluster Compute Instances delivered 8.5 times better performance than the other EC2 instances it had been testing. While LBNL was not specific, presumably it was using slow Gigabit Ethernet and perhaps less impressive iron. (Amazon had better hope that was the case.)

Peter De Santis, general manager of the EC2 service at Amazon, said that an 880-server sub-cluster was configured to run the Linpack Fortran benchmark used to rank supercomputer power, and was able to deliver 41.82 teraflops (presumably sustained performance, not peak). If by "server" De Santis meant a physical server, then roughly half of the peak flops in the machines are going up the chimney on the EC2 slices.

That sounds pretty awful, but if you sift through the latest Top 500 rankings to find an x64 cluster using 10 Gigabit Ethernet interconnects, you'll see the fattest one is the "Coates" cluster at Purdue University, which is based on quad-core Opterons running at 2.5 GHz, with 7,944 cores in total. It is rated at a peak of 79.44 teraflops but delivers only 52.2 teraflops on the Linpack test, so 34 per cent of the flops on the unvirtualized cluster go up the chimney.
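
Running the same arithmetic over both machines makes the comparison concrete. The EC2 peak figure assumes De Santis meant 880 physical two-socket Xeon X5570 servers:

# Linpack efficiency, EC2 cluster vs Purdue's unvirtualized Coates.
# The EC2 peak assumes 880 physical two-socket Xeon X5570 servers.
ec2_peak_tf = 880 * 8 * 2.93 * 4 / 1000   # ~82.5 teraflops
ec2_linpack_tf = 41.82
coates_peak_tf = 79.44
coates_linpack_tf = 52.2

for name, peak, linpack in [("EC2", ec2_peak_tf, ec2_linpack_tf),
                            ("Coates", coates_peak_tf, coates_linpack_tf)]:
    lost = (1 - linpack / peak) * 100
    print(f"{name}: {linpack / peak:.0%} efficiency, "
          f"{lost:.0f}% up the chimney")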

InfiniBand networks deliver a much better ratio because of their higher bandwidth and lower latency, which is why HPC shops prefer them and why Amazon will eventually have to offer InfiniBand too if it wants serious HPC business. Amazon will also have to offer GPU co-processors, because codes are being adapted to use their relatively cheap teraflops.

As you can see from Amazon's EC2 price list, the Cluster Compute Instances cost $1.60 per hour for on-demand slices, which is actually quite a bit less than the $2.40 per hour Amazon is charging for generic quadruple extra large instances with fat memory. So it looks like Amazon understands that HPC shops are cheapskates compared to other kinds of IT organizations. If you want to reserve an HPC instance for a whole year, you're talking $4,290; for three years, it's $6,590. Both carry a usage charge of 56 cents per hour on top.
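
A quick break-even calculation (ours, based on the published rates) shows when reserving beats paying on demand:

# Break-even utilization for reserving a Cluster Compute instance
# versus paying on demand. Our arithmetic, using the published rates.
on_demand = 1.60          # $/hour
reserved_hourly = 0.56    # $/hour
for label, upfront, years in [("1-year", 4290, 1), ("3-year", 6590, 3)]:
    breakeven_hours = upfront / (on_demand - reserved_hourly)
    utilization = breakeven_hours / (years * 8760)
    print(f"{label}: break-even at {breakeven_hours:.0f} hours "
          f"({utilization:.0%} utilization)")
# Reserving pays off at roughly 47 per cent utilization over one year,
# or about 24 per cent over three.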

The HPC slices on EC2 are available running Linux operating systems and are for the moment restricted to the Northern Virginia region of Amazon's distributed data centers in the United States. (Right next to good old Uncle Sam.) There's no word on when the other US regions get HPC slices, or when they will be available in other geographies. Amazon had not returned calls, as El Reg went to press, for more insight into how the service will be rolled out in its Northern California, Ireland, and Singapore data centers. ®
