The Register® — Biting the hand that feeds IT

Big guns turn sights on cancer-causing genes

Backup is a lifesaver

In the heart of London, researchers are splitting apart the building blocks of life and working towards a cure for cancer. The Biomedical Research Centre, run by King’s College Hospital and Guy's and St Thomas’ NHS Foundation Trust, has a genomic sequencing unit that genotypes tissue from patients with cancer and other diseases.

The unit, one of only five of its kind in the UK, generates huge amounts of data: every ten days, its DNA sequencers produce about 400GB of usable data.

But until recently, it wasn’t that good at processing or storing the results. It didn’t even have any backup capabilities. The facility was processing the data using desktop machines in addition to its central server and storing the valuable results on locally attached hard drives.

Scatter scare

“If those desktop machines lost the data, that meant wasted money and wasted samples,” says Don Lokuadassuriyage, cluster manager at King’s.

The organisation experienced data corruption that scattered sequencing results over a variety of unmanaged drives.

In conjunction with Panasas, IBM designed a bespoke server cluster and storage system to make processing more efficient, and centralised the facility’s data storage.

“One major point of the design was to be as environmentally friendly as possible,” says Julian Fielden, managing director at OCF, the systems integrator for the project.

Three sequencers from specialist medical research equipment company Illumina analyse DNA from patient tissue and mouse tissue. These sequencers can find themselves crunching increasingly large data sets very quickly.

Previously, the facility used an Illumina Pipeline Analysis Server to process the data generated by the sequencers, in addition to its local PCs and servers.

Go with the workflow

“You need to be able to scale rapidly as demands increase, to provide them with the most efficient workflow. Workflow is what counts because it is how the science gets done,” says Fielden.

In total, the cluster that OCF and IBM created consisted of 31 nodes, of which 30 were IBM iDataPlex dx360 M2 machines, each housing two quad-core 2.26GHz Intel Xeon E5520 (Nehalem) processors.

Each also featured 48GB of memory, a 250GB SATA drive and an InfiniBand adaptor from QLogic.

The other machine was a “fat” compute node: an IBM x3755 system with four 2.60GHz six-core AMD Opteron processors, along with 256GB of RAM and 300GB of storage.
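For a sense of scale, the node specifications above can be added up. The sketch below is a back-of-envelope tally, not a figure supplied by OCF or IBM: it reads the fat node's Opteron processors as six-core parts, counts physical cores only (no Hyper-Threading), and the constant and function names are made up for illustration.

```python
# Back-of-envelope totals for the cluster as described in the article.
# Assumptions (not stated in the source): the fat node's Opterons are
# six-core parts, and only physical cores are counted.

THIN_NODES = 30             # IBM iDataPlex dx360 M2 machines
THIN_SOCKETS = 2            # two Intel E5520 processors per node
THIN_CORES_PER_SOCKET = 4   # quad-core
THIN_RAM_GB = 48            # memory per thin node

FAT_SOCKETS = 4             # the single IBM x3755 "fat" node
FAT_CORES_PER_SOCKET = 6    # assumed six-core
FAT_RAM_GB = 256


def cluster_totals():
    """Return (physical cores, RAM in GB) across all 31 nodes."""
    thin_cores = THIN_NODES * THIN_SOCKETS * THIN_CORES_PER_SOCKET
    fat_cores = FAT_SOCKETS * FAT_CORES_PER_SOCKET
    ram_gb = THIN_NODES * THIN_RAM_GB + FAT_RAM_GB
    return thin_cores + fat_cores, ram_gb


if __name__ == "__main__":
    cores, ram_gb = cluster_totals()
    print(f"{cores} physical cores, {ram_gb}GB of RAM")
```

Under those assumptions the cluster comes to 264 physical cores and 1,696GB of RAM, against the four cores of the desktops it replaced.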

The IBM System x iDataPlex servers featured ultra-low latency 10Gb switching modules which connected the server cluster to the storage system.

These IBM System Networking RackSwitch G8124 switches enable high-speed and low-cost networking for high performance computing, while being very energy efficient.

The Panasas storage system consisted of five 40TB shelves, providing a total of 200TB. The shelves connected to the head nodes via the 10Gb links, and to the storage clients over a QDR InfiniBand interconnect.

Expensive operation

The facility can now also take advantage of backup capabilities, using IBM Tivoli Storage Manager to write to an IBM TS3310 Automated Tape Library with two expansion units, holding up to 225 tapes.

“Each sequencing run costs $10,000, so if you lose the data that’s expensive,” says Lokuadassuriyage.

“Backup is very important, so we keep one onsite backup and one offsite backup.”

Software-wise, the system is built mostly from generic components. It runs the CentOS Linux distribution and uses Sun Grid Engine to allocate the available computing resources to incoming jobs, along with MySQL and PostgreSQL.

“The sequencing software runs on Windows, and then we mount the cluster using Samba to enable Linux to work on the data,” says Lokuadassuriyage.

Twin mysteries

The organisation is currently using roughly 10 per cent of its storage, which gives it rather more wiggle room than it had before.
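How much wiggle room is that? The sketch below is a rough estimate only. It assumes the five 40TB shelves are all usable capacity, reads the sequencers' quoted output as 400 gigabytes every ten days, uses decimal units (1TB = 1,000GB), and ignores intermediate files, scratch space and any extra sequencers. The names are illustrative.

```python
# Rough storage headroom estimate from the figures quoted in the article.
# Assumptions: all 200TB is usable, output is 400 gigabytes per ten days,
# 1TB = 1,000GB, and nothing else is written to the system.

SHELVES = 5
SHELF_TB = 40
USED_PERCENT = 10        # "roughly 10 per cent" in use
OUTPUT_GB = 400          # usable sequencer data per window
WINDOW_DAYS = 10
GB_PER_TB = 1000


def storage_headroom():
    """Return (total TB, free TB, days until full at the quoted rate)."""
    total_tb = SHELVES * SHELF_TB
    used_tb = total_tb * USED_PERCENT / 100
    free_tb = total_tb - used_tb
    daily_gb = OUTPUT_GB / WINDOW_DAYS
    days_left = free_tb * GB_PER_TB / daily_gb
    return total_tb, free_tb, days_left


if __name__ == "__main__":
    total_tb, free_tb, days_left = storage_headroom()
    print(f"{free_tb:.0f}TB of {total_tb}TB free, "
          f"about {days_left / 365:.1f} years at the quoted rate")
```

On those numbers about 180TB is free, enough for roughly 12 years of sequencer output. Treat that as a ceiling, since real workloads also generate intermediate and scratch data.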

“It took four to five days to process a flow cell’s worth of data, using a desktop with four cores and 12GB of RAM,” says Lokuadassuriyage.

“Now it’s done in 24 hours, so people can process the data much more quickly.”
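The improvement Lokuadassuriyage describes is easy to quantify. This minimal sketch takes his two figures at face value (four to five days before, 24 hours now); the function name is illustrative.

```python
# Speed-up implied by the quoted processing times for one flow cell.

HOURS_PER_DAY = 24


def speedup(before_days, after_hours):
    """Ratio of old to new wall-clock time for the same job."""
    return before_days * HOURS_PER_DAY / after_hours


if __name__ == "__main__":
    low = speedup(4, 24)    # best case on the old desktop
    high = speedup(5, 24)   # worst case on the old desktop
    print(f"between {low:.0f}x and {high:.0f}x faster")
```

That works out to a four- to five-fold reduction in turnaround per flow cell.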

The centre is upgrading the technology, adding more storage for scratch space and ten more iDataPlex chassis. The upgrade is designed to support extended research into genes shared between twins.

As the work rolls on, the centre is making significant progress in the fight against disease.

“We have found genes that are cancer causing,” says Lokuadassuriyage, adding that the facility is also tackling ailments such as Crohn's disease and diabetes.

“The centre is looking for genetic components to see if it can isolate them and find a way to limit the gene, or ensure that it doesn’t get passed down.”

Unravelling genes may take inordinate amounts of computing power, but it is one reverse engineering problem we can all get behind. ®

Environmentally friendly???

“One major point of the design was to be as environmentally friendly as possible,” says Julian Fielden, managing director at OCF, the systems integrator for the project.

Huh? Sod that! The major point of the design should be to store terabytes (if not petabytes) of data, and then analyse that data in such a way that may bring us closer to a cure for cancer.

I'm absolutely sick and fed up to the back teeth of all this environment and carbon footprint malarky. If the system is environmentally friendly then it's a bonus, but to make that a major design goal is just crazy.

IT!

Because we are behind all scientific progress now. And we can save lives. For once, I am proud to be part of the IT world.

KCL photos

A loud-hailer icon seems only right for a PR man. You can see more photos (including the genomics sequencing machines, storage, Biomedical Research Centre, etc.) from the deployment at King's College London here, http://blog.ocf.co.uk/?p=701
