The configuration, which Cycle Computing refers to as "Tanuki" in case you want to run a job on the same setup, was set up on February 3 to run on top of 1,250 physical servers inside of Amazon's North American data center. The cluster was configured with a 2PB file system. Cycle Computing has set up virtualized file systems for HPC jobs on EC2 that have spanned up to 14PB so far, according to Stowe. It is not clear how far it can push capacity, but moving more data than this can take days or weeks, depending on the network connection between a company and Amazon's data center.
The Tanuki configuration includes 1,250 of Amazon's extra large instances, which each have eight virtual cores and 7GB of main memory allocated to them. The Condor collector and negotiator as well as CycleServer ran on a four-core extra large instance with 17.1GB of memory, and the primary scheduler and disk filer ran on another extra large instance with eight cores and 7GB of memory. The auxiliary schedulers for the cluster were on two large two-core instances with 7.5GB of memory each. So the whole shebang had 1,254 servers, 10,014 virtual cores, 8.6TB of aggregate memory, and 2PB of disk. The virtual servers were configured with CentOS 5.4, the freebie clone of Red Hat Enterprise Linux.
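For anyone who wants to check the tally, here is a minimal Python sketch that adds up the instances as quoted above. The `fleet` list and its labels are illustrative, not anything Cycle Computing publishes, and the conversion to terabytes assumes 1TB = 1,024GB. Taken at face value, the line items come to 1,254 servers and roughly 8.6TB of memory, but 10,016 virtual cores, two more than the 10,014 quoted; the article does not say where the difference comes from.

```python
# Instance counts and specs exactly as quoted in the article.
# Each entry: (label, instance count, virtual cores each, memory in GB each)
fleet = [
    ("compute nodes", 1250, 8, 7.0),
    ("Condor collector/negotiator + CycleServer", 1, 4, 17.1),
    ("primary scheduler + disk filer", 1, 8, 7.0),
    ("auxiliary schedulers", 2, 2, 7.5),
]

servers = sum(count for _, count, _, _ in fleet)
cores = sum(count * per_node for _, count, per_node, _ in fleet)
memory_gb = sum(count * gb for _, count, _, gb in fleet)

# Assumption: binary terabytes (1TB = 1,024GB).
memory_tb = memory_gb / 1024

print(f"servers: {servers}")
print(f"virtual cores: {cores}")
print(f"memory: {memory_gb:.1f}GB (~{memory_tb:.1f}TB)")
```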
Using the CycleCloud and CycleServer tools, it took 15 minutes to fire up the first 2,000 cores in the virtual cluster, and within 45 minutes all of the 10,000 virtual cores on those 1,250 servers were up and running. Stowe used the open source Chef configuration management tool to tweak the settings on each virtual server instance on EC2. Then Genentech loaded up its code and ran the job for eight hours at a total cost of $8,480, including EC2 compute and S3 storage capacity charges from Amazon and the fee for using the Cycle Computing tools as a service. That works out to $1,060 per hour for a 10,000-core cluster. Cycle Computing charges a 20 per cent premium over the raw EC2 capacity as its management fee.
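The hourly figure is simple division, and the same numbers give a per-core-hour price. The sketch below is illustrative only. The article does not break the bill into EC2, S3, and management fee, so the last step assumes, purely for the sake of a bound, that the entire $8,480 is raw EC2 capacity plus the 20 per cent premium. Because S3 charges are in there too, the real fee would be somewhat lower than this ceiling.

```python
# Figures quoted in the article.
total_cost = 8480.0   # dollars, for the whole run
run_hours = 8
cluster_cores = 10_000
premium = 0.20        # Cycle Computing's fee over raw EC2 capacity

hourly = total_cost / run_hours
per_core_hour = hourly / cluster_cores

# Assumption (not in the article): treat the whole bill as raw EC2 * 1.2.
# That overstates the fee, since part of the bill is S3 storage.
raw_upper = total_cost / (1 + premium)
fee_upper = total_cost - raw_upper

print(f"cluster cost per hour: ${hourly:,.0f}")
print(f"cost per core-hour: ${per_core_hour:.3f}")
print(f"management fee, at most: ${fee_upper:,.2f}")
```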
Stowe did some back-of-the-envelope math and says that to buy the physical servers, storage, and switching, plunk it all into a data center, and operate it – including power, cooling, and people costs – would require around $5m. Assuming the data center is free – which it is not – that works out to about $571 per hour for the cluster, if you write down the cost of the hardware in one year. However, that is not the point. The point is that researchers in the HPC space want to run simulations every once in a while, not own clusters and be in the IT utility business, trying to keep their clusters busy and yet available for their own work.
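Stowe's $571 figure is the $5m spread over every hour in a year. The Python sketch below reproduces it and, as an extra step the article does not take, estimates how many hours a year the owned cluster would have to be busy before it beats the $1,060 hourly EC2 rate. It assumes a 365-day year, the one-year write-down Stowe uses, and a free data center, so treat the break-even as a rough illustration rather than a costing.

```python
# Figures quoted in the article.
owned_cost = 5_000_000.0   # dollars: hardware plus one year of operation
cloud_hourly = 1060.0      # dollars per hour for the EC2 cluster

# Assumption: a 365-day year, written down in full over that year.
hours_per_year = 365 * 24

# Cost per hour if the owned cluster runs flat out all year.
owned_hourly = owned_cost / hours_per_year

# Hours of EC2 time that the same money would buy, and the share
# of the year that represents.
break_even_hours = owned_cost / cloud_hourly
break_even_utilization = break_even_hours / hours_per_year

print(f"owned cluster: ${owned_hourly:,.0f} per hour")
print(f"break-even: {break_even_hours:,.0f} hours "
      f"({break_even_utilization:.0%} of the year)")
```

On these assumptions the owned cluster only comes out ahead if it is kept busy for more than about half the year, which is the "keep the cluster busy" problem the researchers would rather not have.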
"We're really trying to democratize this so any researcher can get a grant and run their code," says Stowe. "Rather than buying two servers and waiting several months for a simulation to complete, they can spend less money and get their results in a matter of hours." ®