
Evergrid ships 'little blue pill' for clusters


SC06 A small software start-up thinks it might have "the little blue pill" necessary to keep massive clusters up and running at all times.

Evergrid this week unveiled something called the Availability Management Suite, but you might as well call it cluster Viagra. The company’s software works to make sure that minor – or major – system failures don't hamper the overall completion of large computing jobs. In total, Evergrid helps you keep your cluster up, sturdy and strong for hours and even days at a time.

Okay, we'll stop the bad "jokes" for a moment and get to the point.

Evergrid is working off the premise that the high performance computing industry lacks the proper tools for restarting jobs on large clusters. Systems made up of thousands of boxes will have failures – plenty of them – and need a way to get jobs restarted automatically.

Today, administrators must keep a close eye on their clusters, break jobs down into different chunks or try running jobs on smaller systems in order to avoid the pains of system failures. Evergrid tries to sidestep all this mess by running continuous "checkpoints" that capture the state of servers and their applications. If a failure occurs, the system can roll back to the last known state and then get cranking away on jobs once again.
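To make the checkpoint idea concrete, here is a minimal, hypothetical sketch of checkpoint/restart in plain Python. It is not Evergrid's code, whose internals the company has not described. A toy job (summing integers) saves a copy of its state every `checkpoint_every` steps. When a simulated failure strikes, the job rolls back to the last saved copy instead of starting over. `SimulatedFailure`, `run_with_checkpoints` and the injected `fail_at` step are illustrative inventions.

```python
import copy


class SimulatedFailure(Exception):
    """Stands in for a node crash (hypothetical, for illustration only)."""


def run_with_checkpoints(total_steps, checkpoint_every, fail_at=None):
    """Sum 0..total_steps-1, snapshotting job state every `checkpoint_every` steps.

    If a failure is injected at step `fail_at`, the job rolls back to the last
    snapshot and resumes from there. Returns (result, steps_executed).
    """
    state = {"step": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)  # last known good state
    steps_executed = 0
    failed_once = False

    while state["step"] < total_steps:
        try:
            # Inject a single failure the first time we reach `fail_at`.
            if fail_at is not None and state["step"] == fail_at and not failed_once:
                failed_once = True
                raise SimulatedFailure()
            state["acc"] += state["step"]
            state["step"] += 1
            steps_executed += 1
            # Periodic checkpoint: copy the whole state aside.
            if state["step"] % checkpoint_every == 0:
                checkpoint = copy.deepcopy(state)
        except SimulatedFailure:
            # Roll back to the last checkpoint rather than to step 0.
            state = copy.deepcopy(checkpoint)

    return state["acc"], steps_executed
```

With 100 steps, a checkpoint every 10 and a failure at step 55, the job re-executes only steps 50 to 54. That is 105 steps in total, compared with 155 if it had to restart from scratch. The final answer is the same either way.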

Some companies and labs have developed their own checkpointing systems over the years, but Evergrid believes that the industry is begging for a standard here.

Thankfully, customers will not need to rewrite any of their own software to make Evergrid's code work. The Evergrid "abstraction layer" slides in between an OS kernel and its applications. The checkpoints can then gather information on the state of memory, file I/O and the network at desired intervals.

Evergrid CEO Dave Anderson, speaking here at the Supercomputing conference, told us that the monitoring software could chew through as much as 5 per cent of a system's resources. He pitched that, however, as an Armageddon-type scenario and claimed that more often than not administrators will not notice the Evergrid code.

Come January, Evergrid plans to GA its Availability Services software and Resource Manager. Together, these products form the "flagship" Availability Management Suite.

The Availability Services package performs the checkpoint operations, while the Resource Manager handles a broad array of tasks, such as making sure certain jobs receive a set amount of processing power and the right priority level.
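The article doesn't say how the Resource Manager makes its decisions. Purely as an illustration of "a set amount of processing power and priority", here is a simple, hypothetical allocator. Jobs are served in priority order, and each gets its full requested CPU count or waits. The `Job` fields and the "higher number means more important" convention are assumptions, not Evergrid's design.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: higher number = more important
    cpus: int      # processor count the job is guaranteed once admitted


def allocate(jobs, total_cpus):
    """Grant each job its full CPU request in priority order.

    Jobs that don't fit in the remaining pool are left waiting rather than
    given a partial share. Returns (granted, waiting, free_cpus).
    """
    granted, waiting = {}, []
    free = total_cpus
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.cpus <= free:
            granted[job.name] = job.cpus
            free -= job.cpus
        else:
            waiting.append(job.name)
    return granted, waiting, free
```

On a 16-CPU pool, a priority-5 job wanting 8 CPUs and a priority-3 job wanting 6 both run. A priority-1 job wanting 4 waits, because only 2 CPUs remain.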

We caught a demo of the Suite, and it worked as billed. An Evergrid administrator had three jobs spread across an eight-server cluster. The software detected idle systems, threw them at jobs as needed and caught a failure when we ripped out one server. It stopped the one job that was affected and then got it up and running again in a couple of seconds on the available hardware.
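The demo's failover behaviour can be modelled roughly as follows. This is a hypothetical sketch, not Evergrid's mechanism. Jobs map to sets of servers. When one server dies, only the jobs using it are touched, and each gets an idle server in place of the dead one. The job and node names and the `reschedule_on_failure` helper are invented for illustration.

```python
def reschedule_on_failure(placement, nodes, failed):
    """Move only the jobs that ran on `failed` onto an idle node.

    placement: dict of job name -> set of node names it runs on.
    nodes: all node names in the cluster.
    Returns (new_placement, restarted_jobs).
    """
    busy = set().union(*placement.values())
    # Idle nodes are those running nothing, excluding the one that just died.
    idle = sorted(n for n in nodes if n not in busy and n != failed)
    new_placement, restarted = {}, []
    for job, assigned in placement.items():
        if failed in assigned:
            if not idle:
                raise RuntimeError(f"no idle node available to restart {job}")
            replacement = idle.pop(0)
            # Swap the dead node for an idle one; the job restarts from its checkpoint.
            new_placement[job] = (assigned - {failed}) | {replacement}
            restarted.append(job)
        else:
            new_placement[job] = set(assigned)  # unaffected jobs keep running
    return new_placement, restarted
```

With three jobs on eight servers (six busy, two idle), pulling a server used by one job restarts just that job on an idle box. The other two jobs are left alone.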

By the second half of next year, Evergrid hopes to move beyond the HPC market and target smaller business clusters. It's looking, in particular, at the database and application server markets. Such customers might flock to Evergrid's tools for stopping jobs and then restarting them on new servers. This would allow a company to juggle different jobs with more flexibility than it has today.

For now, however, Evergrid will center on the HPC crowd that has enormous clusters built out of thousands of machines. Many HPC users deal with jobs that take days, weeks and sometimes months to process. They often have to restart these jobs from scratch due to system failures.

Evergrid has one patent and a handful more pending that it thinks will protect its IP from overzealous Linux coders who might come up with something similar. Some of you will be familiar with the company's CTO, Dr. Srinidhi Varadarajan, who built the massive G5 cluster at Virginia Tech and is credited with banging out much of the special "checking" sauce.

Evergrid claims two major customers at this point – an unnamed financial services company and the University of Oklahoma. The company has yet to set official pricing for its software, although Anderson guessed it will come in around "$250 per node with large volume discounts."

There's more information available here. ®
