
Los Alamos lends open source hand to life sciences

Having a BLAST


Researchers at Los Alamos National Laboratory have struck computing gold once again with an open source project that could benefit genetic research.

Three scientists have tried their hand at improving the popular BLAST (Basic Local Alignment Search Tool) search algorithms. The group decided to chop up a BLAST database and spread it across a number of servers instead of throwing lots of horsepower at a single data set. In so doing, the need to run I/O requests to disk was eliminated and the researchers saw huge, super-linear performance gains.

The experiment to put little bits of a database in memory instead of on disk proved a success and has since drawn considerable attention to mpiBLAST from pharmaceutical companies, researchers and even Microsoft.

My computer is smaller than yours

Wu-chun Feng, a Los Alamos researcher, originated the idea for mpiBLAST after solving the Gelsinger Coefficient with the introduction of Green Destiny - a Transmeta processor-powered supercomputer. The 240-processor Linux cluster helped show that some scientific computing tasks will run with adequate performance and incredible reliability on a system that can fit in an average closet.

Having a new type of mini-supercomputer is of interest to life sciences researchers. A number of companies are looking for a way to pack tons of computer power in a small space and hope to do so without buying special cooling equipment or adding a new wing to their labs.

With this in mind, Feng and lead researchers Aaron Darling and Lucas Carey set to work tuning BLAST for the Green Destiny Linux cluster.

BLAST helps scientists search databases of protein and DNA sequences. There are many types of BLAST used for different areas of research, making the tool one of the most popular in the life sciences world.

Start chopping

Many bioinformatics databases have grown past the point where they can fit in the core memory of most servers, which means searches have to query a disk. To get around this, the Los Alamos researchers chopped the database up into small parts that could each reside in memory. This has led to astonishing performance gains, according to the researchers.
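The segmentation idea can be sketched in a few lines of Python. This is only an illustration of the principle, not mpiBLAST's own code: the function names, the byte counts and the round-robin split are all assumptions made for the example.

```python
def fragments_needed(db_bytes: int, mem_per_node_bytes: int) -> int:
    """Smallest number of equal-sized fragments that each fit in one node's memory.

    Hypothetical helper: assumes every node has the same amount of free memory.
    """
    if db_bytes < 0 or mem_per_node_bytes <= 0:
        raise ValueError("sizes must be non-negative and memory must be positive")
    # Integer ceiling division, with a floor of one fragment.
    return max(1, (db_bytes + mem_per_node_bytes - 1) // mem_per_node_bytes)


def split_sequences(sequences, n_fragments):
    """Deal sequences out round-robin into n_fragments lists."""
    if n_fragments < 1:
        raise ValueError("need at least one fragment")
    fragments = [[] for _ in range(n_fragments)]
    for i, seq in enumerate(sequences):
        fragments[i % n_fragments].append(seq)
    return fragments


# Illustrative numbers only: a 10-unit database and nodes with 4 units of memory.
n = fragments_needed(10, 4)
parts = split_sequences(["seq_a", "seq_b", "seq_c", "seq_d", "seq_e"], n)
```

Each worker would then search only its own fragment, which is small enough to stay in memory between queries.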

"The adverse effects of disk I/O are so significant that BLAST searches using database segmentation can exhibit super-linear speedup versus searches on a single node," they wrote in the paper.

Super-linear, you say? Aye, it's the truth.

The researchers used MPI (Message Passing Interface) to handle communications between the servers. They found that the speedups held up even with all of the intercommunication from server to server.

In one instance, the group compared a job run on a single database and one broken up across 128 systems. The "single worker" took 22.4 hours to complete a search, while the cluster crunched through the data in 8 minutes. At every processor count up to 120, the cluster showed super-linear speed-up.
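To see why those figures count as super-linear, it helps to run the arithmetic. The short Python sketch below uses only the numbers quoted above (22.4 hours, 8 minutes, 128 systems). The function names are made up for illustration, and the result is approximate because the article's timings are rounded.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the single-node run."""
    return t_single / t_parallel


def efficiency(s: float, n_workers: int) -> float:
    """Speedup per worker; anything above 1.0 is super-linear."""
    return s / n_workers


single_minutes = 22.4 * 60   # the "single worker" run, in minutes
cluster_minutes = 8.0        # the 128-system run, in minutes

s = speedup(single_minutes, cluster_minutes)   # roughly 168x
e = efficiency(s, 128)                         # roughly 1.31
is_super_linear = e > 1.0
```

A perfectly linear cluster of 128 systems would be 128 times faster. A speedup of about 168 means each system did more than its share, which is what removing the disk bottleneck makes possible.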

As the number of processors goes well beyond 140 chips, mpiBLAST tends to drift back toward reality and performance suffers.

The scientists found that "the master" will wait until all "workers" have completed their jobs before formatting the results. This means the cluster can be slowed as a result of the least efficient worker.

Solutions to this problem are proposed in their paper, but regardless of some slowdowns, mpiBLAST performs like a champ up to 120 processors.
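The cost of a master that waits on every worker can be shown with a toy model. The sketch below is an assumption-laden illustration, not the researchers' analysis: the worker times and the formatting time are invented, and the model ignores communication overhead.

```python
def job_wall_time(worker_times, format_time=0.0):
    """Total time when the master formats results only after the slowest worker."""
    return max(worker_times) + format_time


def total_idle_time(worker_times):
    """Time the faster workers spend waiting for the slowest one to finish."""
    slowest = max(worker_times)
    return sum(slowest - t for t in worker_times)


# Made-up per-worker search times in minutes; the third worker is a straggler.
times = [5.0, 5.5, 9.0]
wall = job_wall_time(times, format_time=1.0)
idle = total_idle_time(times)
```

In this example one slow worker nearly doubles the job time, and the other two sit idle for a combined 7.5 minutes.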

What needs to be done?

The researchers have some suggestions for where they think mpiBLAST needs to go.

One of the preferred additions to the code would be a fault tolerance mechanism to account for server failures. Having each server signal back to a master system should do the trick. If no alive and kicking signal comes in, the master switches the workload to another system.
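A heartbeat scheme of this kind might look something like the following Python sketch. It is a guess at the general shape, not a feature of mpiBLAST: the worker names, the timeout and the round-robin reassignment policy are all hypothetical.

```python
def find_dead_workers(last_heartbeat, now, timeout):
    """Return workers whose last "alive" signal is older than the timeout."""
    return sorted(w for w, t in last_heartbeat.items() if now - t > timeout)


def reassign_fragments(assignments, dead):
    """Hand the fragments of dead workers to the surviving ones, round-robin."""
    alive = sorted(w for w in assignments if w not in dead)
    if not alive:
        raise RuntimeError("no workers left to take over the workload")
    new_assignments = {w: list(assignments[w]) for w in alive}
    orphaned = [f for w in sorted(dead) for f in assignments.get(w, [])]
    for i, fragment in enumerate(orphaned):
        new_assignments[alive[i % len(alive)]].append(fragment)
    return new_assignments


# Invented state: timestamps in seconds, fragment IDs as integers.
heartbeats = {"w0": 100.0, "w1": 91.0, "w2": 99.0}
assignments = {"w0": [0], "w1": [1, 4], "w2": [2]}

dead = find_dead_workers(heartbeats, now=101.0, timeout=5.0)
recovered = reassign_fragments(assignments, dead)
```

Here worker w1 has been silent for ten seconds, so its two fragments are shared between w0 and w2.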

The group also hopes to develop a way to automatically determine the number of servers that should be used on a given query to get the best performance.
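One simple way to frame that choice, assuming trial timings are available, is to pick the smallest cluster that comes close to the best observed time. The Python sketch below is purely illustrative: the timings are invented, the 5 per cent tolerance is arbitrary, and nothing here describes how the Los Alamos group plans to solve the problem.

```python
def best_worker_count(timings, tolerance=0.05):
    """Smallest worker count whose time is within `tolerance` of the fastest run.

    `timings` maps a worker count to a measured search time (hypothetical data).
    """
    if not timings:
        raise ValueError("need at least one timing")
    fastest = min(timings.values())
    limit = fastest * (1 + tolerance)
    return min(n for n, t in timings.items() if t <= limit)


# Made-up trial results: minutes per search at each cluster size.
trial_timings = {32: 40.0, 64: 20.0, 128: 10.0, 256: 10.4}
chosen = best_worker_count(trial_timings)
```

With these numbers the function settles on 128 workers, since doubling to 256 buys nothing and ties up twice the hardware.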

The project caught the eye of Microsoft earlier this year. A Redmond worker contacted Los Alamos to see if a license other than the GPL could be used with mpiBLAST. Microsoft wanted to take a peek at the code but couldn't bring itself to do so with the cancerous GPL hanging over its head.

Nonetheless, mpiBLAST runs on Unix, Linux and Windows.

For those interested in helping out with mpiBLAST, there is a SourceForge project underway. ®

Related Link

Homepage for mpiBLAST
