Massed x86 ranks 'blowing away' supercomputer monoliths
Dell pitches modular parallel processors
Dell has claimed it is simplifying supercomputing by replacing complex, monolithic, proprietary Cray-like machines with modular ones built from racks of industry-standard components.
In fact, in one way it's helping to complicate supercomputing, because writing parallelised code is so hard. But the massed ranks of x86 processors are blowing the Crays away and cloud-based HPC supply is on the horizon.
This message came across at a Dell-hosted supercomputer conference in London with academic supercomputer users from the UK and mainland Europe. Obviously the pitch was that simplified supercomputing with Dell provides better performance and better value. Presenters provided snapshots of their supercomputing experience, covering the search for planets in space as well as analysing health statistics against genome data for inherited disease tendencies.
"Most issues in science today are computational in nature," was the claim made by Josh Claman, Dell's EMEA public sector business head. Many scientific problems, if not most, need modelling and analysis carried out on a computer to check the theory. Supercomputing, or high-performance computing (HPC) is becoming a broad-based activity as a result. If it can be lowered in cost and made more available, then it will help science move forward.
The academic presenters, all involved with Dell-based HPC datacentres, agreed with that sentiment, being compute power-hungry service providers with budget problems.
There was much comparison of then and now to show how aggregate performance has rocketed in a kind of accelerated Moore's Law way. We heard of a leading 235 gigaflop supercomputer in 1998 contrasted with a 10 petaflop one being built now in Japan*. This, Claman claimed, was half the compute power of the human brain.
We are now in the fourth phase of supercomputer design, with dense compute power in many, many clustered nodes built from commodity hardware components. A typical supercomputer today in European academia is a cluster built from racks of 30 1U multi-core Intel servers connected by InfiniBand or 10GigE, and running Linux, with a file system such as Lustre, using 200TB or more of SATA disk storage. Data is striped across the drives to get the bandwidth needed by using lots of spindles at once.
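For readers who have not met striping, here is a minimal sketch of the idea in Python: a file is cut into fixed-size chunks that are dealt round-robin across the drives, so a large read can pull from every spindle at once. It is a toy illustration only. The function names, the chunk size and the in-memory "drives" are assumptions made for the example, and it does not represent how Lustre itself lays out data.

```python
def stripe(data: bytes, n_drives: int, stripe_size: int) -> list[list[bytes]]:
    """Cut data into stripe_size chunks and deal them round-robin across drives."""
    if n_drives < 1 or stripe_size < 1:
        raise ValueError("n_drives and stripe_size must be positive")
    drives: list[list[bytes]] = [[] for _ in range(n_drives)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        drives[chunk_index % n_drives].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading one chunk per drive, row by row."""
    rows = max(len(drive) for drive in drives)
    chunks = []
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


# Hypothetical example: 10 bytes, 3 drives, 2-byte stripes.
payload = bytes(range(10))
layout = stripe(payload, n_drives=3, stripe_size=2)
restored = unstripe(layout)
```

Each "row" in `unstripe` touches every drive once, which is where the aggregate bandwidth comes from: in a real array those reads would be issued to the spindles in parallel rather than in a loop.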
Claman said these enabling technologies are driving the broadening of supercomputing accessibility. Dell has recently been selling a cluster a day for an average price of £99,000 with an average performance of 1.4Tflops.
Where does it start?
A supercomputer starts when a multi-core scientific workstation is not enough. GPUs (graphics processing units) can be good for HPC because they are built to run many, many operations in parallel.
This means there are two types of supercomputer: the single box containing lots and lots of cores and/or graphics processing cores, and the clustered multi-node setup, with each node having SMP (symmetric multi-processing) processors. Some HPC applications are better suited to one architecture than the other.
Dell people see hybrid clusters developing with nodes equipped with multiple GPU cores as well as SMP cores. The programming task is characterised by the need to use many, many cores in parallel. This is getting beyond the resources of research scientists whose job is research, not writing code. An IBM supercomputer could have 1,000 cores with many applications only using a subset. The software people have to get better at writing code to use all these cores.
One user said his lab replaced a five-year-old, €1m Cray with a 4-socket Dell machine costing €60,000 and didn't tell his users. They asked him what had happened to the computer, as their jobs were running faster. He said the black art has been taken out of running these systems and the lifecycle costs of power, cooling and so forth have been radically reduced.
Paul Calleja, the director of Cambridge University's HPC lab, said he runs his supercomputer facility as a chargeable service, based on costed core hours, to its users. "All public sector managers know the dark days are coming, ones with zero growth budgets." He and his colleagues have to produce large efficiency gains and invest the savings in new resources. There will be no other sources of funds to buy new kit.
He bought a Dell HPC box in 2006 on a value for money basis. It has 2,300 cores in 600 Dell servers with an InfiniBand connection fabric. It replaced a Sun system which was ten times slower and cost three times as much to run. His Dell set-up cost £2m, weighs 20 tonnes, needs 1,000 amps of power and delivers 20Tflops. At one time it was the fastest academic machine in the UK.
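As a rough sanity check on the value-for-money claim, the figures quoted in this article can be turned into a price per teraflop. This is a back-of-envelope sketch: the helper name is made up for the example, and it assumes the Tflops figures are measured the same way in both cases, which the article does not say.

```python
def pounds_per_tflop(price_gbp: float, tflops: float) -> float:
    """Capital cost divided by delivered performance."""
    if tflops <= 0:
        raise ValueError("tflops must be positive")
    return price_gbp / tflops


# Figures quoted in the article.
average_cluster = pounds_per_tflop(99_000, 1.4)    # Dell's recent average sale
cambridge_2006 = pounds_per_tflop(2_000_000, 20)   # Calleja's 2006 purchase

# 20 Tflops spread over 2,300 cores, expressed in Gflops per core.
gflops_per_core = 20 * 1_000 / 2_300

print(f"Average Dell cluster: £{average_cluster:,.0f} per Tflop")
print(f"Cambridge 2006 system: £{cambridge_2006:,.0f} per Tflop")
print(f"Cambridge per-core performance: {gflops_per_core:.1f} Gflops")
```

The two systems were bought some years apart, so the comparison mostly shows how far prices have fallen rather than anything about either deal.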
Racks are laid out in a hot aisle/cold aisle arrangement.
It's all money in the end
A lab scientist costs £100k/year. You can double that for an experiment. Calleja has 300 users, and at that rate they cost £60m/year. The move from Sun to Dell and a tenfold performance increase must have improved the output of his users. "It's all money in the end, taxpayers' money."
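The arithmetic behind the £60m figure is worth spelling out, because it shows how small the machine's price tag is next to the people using it. The sketch below uses only numbers quoted in the article; comparing the one-off £2m capital cost with a single year of user cost is our own framing, not Calleja's.

```python
SCIENTIST_COST_GBP = 100_000     # per scientist per year
EXPERIMENT_MULTIPLIER = 2        # "you can double that for an experiment"
USERS = 300
CLUSTER_CAPITAL_GBP = 2_000_000  # one-off cost of the 2006 Dell system

annual_user_cost = USERS * SCIENTIST_COST_GBP * EXPERIMENT_MULTIPLIER
cluster_share = CLUSTER_CAPITAL_GBP / annual_user_cost

print(f"Annual cost of users: £{annual_user_cost:,}")
print(f"Cluster capital as a share of one year's user cost: {cluster_share:.1%}")
```

On these numbers, even a small improvement in researcher output would be worth more than the hardware.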
Calleja upgrades his hardware every two years on a rolling procurement and keeps hardware for four years. He delivers core hours to his users and has to continually demonstrate to them that paying for his core hours is cheaper than buying their own compute facilities. He said: "We're the only fully cost-centred HPC centre in the country not relying on subsidy. We have 80 percent paying users and we're breaking even."
Why Dell? It's cheaper and extremely reliable compared to competing suppliers. He's experienced a 1 percent electronic component failure rate in two years.
He's limited by power and space constraints. Calleja is upgrading now and is deploying 50 percent more compute power for 15 percent more electricity and 10 percent more floor space, with the new kit costing 20 percent of the original capital outlay. That means he lowers his cost per core and offers his users better value core hours.
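Those percentages can be turned into efficiency ratios. The sketch below assumes every percentage is measured against the original installation, and that the "more" figures describe the totals after the upgrade; the article does not spell this out, and the function and key names are invented for the example.

```python
def upgrade_ratios(compute_gain: float, power_gain: float,
                   space_gain: float, capital_fraction: float) -> dict[str, float]:
    """Compare the upgraded installation with the original one.

    All arguments are fractions of the original installation,
    e.g. 0.50 means "50 percent more" and must be greater than zero
    for compute_gain.
    """
    if compute_gain <= 0:
        raise ValueError("compute_gain must be positive")
    return {
        # Total compute per unit of electricity, relative to the original (1.0).
        "compute_per_watt": (1 + compute_gain) / (1 + power_gain),
        # Total compute per unit of floor space, relative to the original (1.0).
        "compute_per_floor_area": (1 + compute_gain) / (1 + space_gain),
        # Capital cost of each unit of added compute, relative to the original (1.0).
        "capital_per_added_compute": capital_fraction / compute_gain,
    }


ratios = upgrade_ratios(compute_gain=0.50, power_gain=0.15,
                        space_gain=0.10, capital_fraction=0.20)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

Under those assumptions the upgraded site gets roughly 30 percent more compute per watt, and each unit of new compute costs about 40 percent of what it did the first time round.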
He said there are three research pillars: experiment; theory; and simulation. Simulation, using a supercomputer, enables you to go places you can't get to by experimentation. The need for simulation is horizontal across science.
Research applications now use shared and open source code that can be parameterised to provide the specific code set needed by researchers, whose time is not best spent writing code. That has become too specialised a job.
Datasets are kept in the data centre, inside the firewall, and users come to the HPC mountain instead of the HPC mountain coming to users, which would mean massive data set transfers across network links between users and the HPC lab.
Calleja has two steps on his processor roadmap. He's looking forward first of all to Nehalem blade servers, 4-core Xeon 5500s, with possibilities for 6- and 8-core ones. The second step is to Sandy Bridge, Intel's next architecture after Nehalem which, he says, will run 8 operations per clock cycle instead of Nehalem's 4.
The blade servers will provide many more cores per rack, driving up the heat output, and he's anticipating moving to back-of-rack water cooling.
He's thinking of setting up a solid state drive capacity pool for HPC applications that need the IOPS rates that SSDs can deliver, but SSD pricing has to come down to make this worthwhile. Lustre metadata might be stored in an SSD pool.
Supercomputing weather forecast: it's going to become cloudy
Calleja is also thinking of cloud computing. He makes a clear distinction between the cloud, with computing delivered as a service, and grid computing, with applications split across computing grids, across geo-clusters for example. He's not keen on the grid approach because of the need for massive data set transfers, amongst other things.
There is an 8,000-core Dell HPC system in Holland which is idle at night and he could, in theory, rent some of that capacity and supply it to his users. They already use what is, in effect, a cloud HPC service from his data centre in Cambridge, with datasets stored in the cloud. Altering or adding another source of HPC cores, to be accessed over links from outside the firewall, would essentially make no difference to them.
The only change they would notice would be that their research budgets go further, since the core hours they buy would be cheaper. This assumes that needed data sets could be sent to the Dutch HPC centre somehow.
Calleja is also thinking of offering HPC services to users outside Cambridge University, both to other academic institutions and to small and medium businesses needing an HPC resource for financial modelling, risk modelling, automotive and pharmaceutical applications. He is looking at putting commercial multi-gigabit fibre feeds in place, outside the academic networks, to support this.
If he can sell core hours to more clients, then his running costs per core hour go down, and his core-hour prices go down too. A couple of other universities are already looking into the idea of using Cambridge HPC resources in this way. Calleja also gets three to four enquiries a month from SMEs about his data centre's HPC facilities.
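The reasoning is that a data centre's costs are largely fixed, so the break-even price of a core hour falls as more of the available hours are sold. The sketch below is a deliberately simple model of that. The £1m annual running cost and the utilisation levels are hypothetical numbers chosen for illustration; only the 2,300-core figure comes from the article, and nothing here reflects Cambridge's actual charging scheme.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_price(annual_cost_gbp: float, cores: int, utilisation: float) -> float:
    """Price per core hour at which sold hours exactly cover a fixed annual cost."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    if cores < 1:
        raise ValueError("cores must be positive")
    sold_core_hours = cores * HOURS_PER_YEAR * utilisation
    return annual_cost_gbp / sold_core_hours


ANNUAL_COST_GBP = 1_000_000  # hypothetical, not a figure from the article
CORES = 2_300                # size of the 2006 Cambridge system

for utilisation in (0.5, 0.8, 0.95):  # hypothetical utilisation levels
    price = break_even_price(ANNUAL_COST_GBP, CORES, utilisation)
    print(f"{utilisation:.0%} of core hours sold: £{price:.3f} per core hour")
```

In this model, every extra paying client pushes utilisation up and the break-even price down, which is the effect Calleja is counting on.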
He is not alone here. The academic JANET network is looking into a shared service model of operation.
If Calleja had profits from supplying cloud HPC services then he could afford more kit. He reckons that there is a sweet spot between university HPC data centres and larger regional HPC sites and his Cambridge data centre could grow to fill it.
The logic here is to build ever larger supercomputers with more and more powerful cores, perhaps backed up with GPUs. These would be operated at a high utilisation rate by delivering highly-efficient parallelised code resources to users, billed by the core hours they use. By keeping data sets inside the HPC lab it is, ironically, becoming another example of a re-invented mainframe approach: an HPC glass house. ®
* A gigaflop is one thousand million floating point operations a second. A petaflop is one million billion such operations a second.