Brit uni builds its own supercomputer from secondhand parts

She may not look like much, but she's got it where it counts, kid

COSMA6 can be used for galactic simulations

Durham University has built itself a secondhand supercomputer from recycled parts and beefed up its contribution to DiRAC (distributed research utilising advanced computing), the integrated facility for theoretical modelling and HPC-based research in particle physics, astronomy and cosmology.

The Institute for Computational Cosmology (ICC) at Durham, in northeast England, runs a COSMA5 system as its DiRAC contribution.

There are five DiRAC installations in the UK, which is a world leader in these HPC fields:

  • Cambridge HPCS Service: Data Analytic Cluster – 9,600 cores (200TFLOPS), 0.75PB (raw) parallel file store, high-performance InfiniBand IO and interconnect (node-to-storage 7GB/s), a single 600-port non-blocking switch and 4GB RAM per core
  • Cambridge COSMOS SHARED MEMORY Service – 1,856 cores (42TFLOPS), 14.8TB globally shared memory (8GB RAM per core), 146TB high-performance scratch storage, 31 Intel Xeon Phi co-processors
  • Leicester IT Services: Complexity Cluster – 4,352 cores (95TFLOPS), 0.8PB parallel file store, high-performance IO and interconnect, non-blocking switch architecture, 8GB RAM per core
  • Durham ICC Service: Data Centric Cluster – 6,500 cores, 2PB parallel file store, high-performance IO and interconnect, 2:1 blocking switch architecture, 8GB RAM per core
  • Edinburgh 6,144-node BlueGene/Q – 98,304 cores, 5D Torus Interconnect, high-performance IO and interconnect

The Durham cluster listed above is a COSMA5 system, which features 420 IBM iDataPlex dx360 M4 servers with 6,720 2.6GHz Intel Sandy Bridge Xeon E5-2670 CPU cores. There is 53.76TB of DDR3 RAM and Mellanox FDR10 InfiniBand in a 2:1 blocking configuration.
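The COSMA5 figures above can be cross-checked with a little arithmetic. The sketch below assumes two E5-2670 CPUs per dx360 M4 node (the socket count is an assumption here, not something stated above) and decimal units, so 1TB is 1,000GB. The function and variable names are illustrative only.

```python
# Cross-check of the COSMA5 figures quoted above.
# Assumption, not stated in the article: each dx360 M4 node holds
# two E5-2670 CPUs. The E5-2670 itself is an eight-core part.
NODES = 420
SOCKETS_PER_NODE = 2   # assumed
CORES_PER_SOCKET = 8
TOTAL_RAM_TB = 53.76   # from the article, decimal terabytes assumed


def total_cores(nodes, sockets_per_node, cores_per_socket):
    """Total core count across the cluster."""
    return nodes * sockets_per_node * cores_per_socket


def ram_per_node_gb(total_ram_tb, nodes):
    """Average RAM per node in GB, taking 1TB as 1,000GB."""
    return total_ram_tb * 1000 / nodes


cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
per_node = ram_per_node_gb(TOTAL_RAM_TB, NODES)
per_core = per_node / (SOCKETS_PER_NODE * CORES_PER_SOCKET)

print(f"cores: {cores}")
print(f"RAM per node: {per_node:.0f}GB")
print(f"RAM per core: {per_core:.0f}GB")
```

Under those assumptions the numbers hang together: 6,720 cores, 128GB per node and 8GB per core, which matches the 8GB RAM per core given for the Durham cluster in the DiRAC list.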

It has 2.5PB of DDN storage with two SD12K controllers configured in fully redundant mode. The storage is served by six GPFS servers, which connect to the controllers over full FDR and use RDMA over the FDR10 network into the compute cluster. COSMA5 uses the GPFS file system with LSF as its job scheduler.

The ICC and DiRAC needed to strengthen this system and found that the Hartree Centre at Daresbury had a supercomputer it needed rid of. This HPC system was installed in April 2012 but had to go because Daresbury had newer kit.

Durham had a machine room with power and cooling that could take it. Even better, its configuration was remarkably similar to COSMA5.

So HPC, storage and data analytics integrator OCF, and server relocation and data centre migration specialist Technimove dismantled, transported, and rebuilt the machine at the ICC. The whole exercise was funded by the Science and Technology Facilities Council.

COSMA6 arrived at Durham in April 2016, and was installed and tested at the ICC. It now extends Durham's DiRAC system as part of DiRAC 2.5.

COSMA6 has:

  • 497 IBM iDataPlex dx360 M4 server compute nodes
  • 7,952 Sandy Bridge Xeon E5-2670 cores
  • More than 35TB of DDR3 DRAM
  • Mellanox FDR10 InfiniBand switches in a 2:1 blocking configuration connecting the nodes
  • DDN EXAScaler storage:
    • 2.5PB data space served by eight OSSs and two MDSs
    • 1.8PB Intel Lustre scratch space served by six OSSs and two MDSs using IP over IB and RDMA to the cluster

COSMA6 uses the Lustre filesystem, with SLURM as its job scheduler.
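As a quick sanity check on the COSMA6 list above, the node and core counts can be divided out to get cores per node, then combined with COSMA5 for a Durham total. This is only a sketch: it assumes the 420 COSMA5 nodes carry the same number of cores each as the COSMA6 nodes, which is plausible given the shared server model and CPU but is an assumption here.

```python
# Sanity check on the COSMA6 list, plus a combined Durham total.
COSMA6_NODES = 497
COSMA6_CORES = 7952
COSMA5_NODES = 420  # from the COSMA5 description

# divmod returns (quotient, remainder); a zero remainder means the
# cores divide evenly across the nodes.
cores_per_node, remainder = divmod(COSMA6_CORES, COSMA6_NODES)

# Assumption: COSMA5 nodes have the same per-node core count as COSMA6.
cosma5_cores = COSMA5_NODES * cores_per_node
combined = cosma5_cores + COSMA6_CORES

print(f"COSMA6 cores per node: {cores_per_node} (remainder {remainder})")
print(f"COSMA5 cores: {cosma5_cores}")
print(f"Combined Durham cores: {combined}")
```

The division comes out at 16 cores per node with nothing left over, and the two machines together reach 14,672 cores.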

Lydia Heck, ICC technical director, said: "While it was quite an effort to bring it to its current state, as it is the same architecture and the same network layout as our previous system, we expect this to run very well."

Durham now has both COSMA5 (6,500 cores) and COSMA6 (8,000 cores) contributing to DiRAC and available for researchers.

Find out how to access and use DiRAC here. ®
