Rowdy clusters put to the grindstone by Grid Engine 8.1

Batch number-crunching in the trendy cloud era


The advent of virtualized and cloudy infrastructure has not diminished the need for scheduling software like Grid Engine. If anything, it has made it obvious just how necessary such schedulers are for orchestrating and aggregating the capacity of server pools.

That, at least, is the view at Univa, the company that's providing a fork of the Grid Engine software still controlled (but largely ignored) by Oracle. The software engineers who used to work at Sun Microsystems and then Oracle, but now work at Univa, are crafting Grid Engine 8.1.

Univa acquired an OEM licence from Sun back in 2007 and made a living selling support and other services atop the gridware, which aggregates CPU cycles on clusters of PCs and servers so they can be used to run parallel HPC workloads on bare-metal machinery.

In 2008, Sun actually paid Univa to do some development on Grid Engine, which ended up in Oracle's lap after Big Larry bought Sun in January 2010. Univa bears Oracle no animosity, and it took a year before the company decided to fork Grid Engine and create its own open-source variant.

Since that time, Univa has put out two releases of an updated Grid Engine stack (and it can call it Grid Engine thanks to that OEM agreement with Sun) and is now working on a third, version 8.1, which will come out sometime before the end of the second quarter, according to Univa CEO Gary Tyreman.

Univa's Grid Engine roadmap

The code base for Grid Engine, version 6.2 update 5, is still out there, freeze-dried since the Oracle takeover of Sun. Univa's developers, many of whom were hired from Sun after it became clear that Oracle had other ideas about how to cluster machines and was fundamentally uninterested in traditional HPC workloads, put about 90 enhancements into the product to make its own Grid Engine 8.0, released in May 2011.

It then added another 200 enhancements for the 8.0.1 release, which came out last October. The feature count is not yet set for Grid Engine 8.1, which is due before the end of the first half of 2012, but it is on the order of four times the number of enhancements that Univa was able to get into its first release.

While customers are still able to get the Grid Engine V6.2U5 software, their grids are scaling much further, and this is an issue.

"Do you really have the confidence that everything that works at 20,000 cores will work at 100,000 cores?" asks Tyreman, adding that programs written for one scale can still run into issues when expanded across yet more iron. And plenty of Grid Engine users still working from the V6.2U5 release are going to find this out.

In January 2011, when Univa forked Grid Engine, Tyreman said he knew for a fact there were over 4 million CPUs gridded up using his company's software in more than 1,000 government, academic, and commercial establishments, and he guessed further that there were 2,000 to 10,000 organisations using the open source tools and maintaining themselves.

"Last year, when we launched Univa Grid Engine, we knew there was a bunch of people out there using Grid Engine," Tyreman tells El Reg. "Now, after doing more research, we are very confident that there are 10,000 sites actually using the software."

Getting cosy with the cloud

As grid software goes, this is a pretty large installed base. Even if most customers can get by with the open source version, a percentage of those at the upper end of scalability - who need better integration with cloud controllers like Eucalyptus or the Amazon EC2 cloud or need integration with Hadoop big data munchers - are going to be looking for help from Univa, which has done the work to make these integrations possible as well as scale up the size of the grids.

Tyreman says that Univa has been working with Cloudera and other Hadoop disties to use Grid Engine as a workload manager for Hadoop clusters, which tend to be a bit unruly and cranky. "Hadoop will splatter jobs out there, but it is indiscriminate and it doesn't have policies for sharing and driving the utilization on the server cluster as high as you can," he explains. "We have a lot of work to make the integration better, but at the end of the day, it's just APIs."

Everyone is trying to solve the same problem. This is why IBM acquired Platform Computing, the main competition to Grid Engine and also a credible threat to Hadoop, last October, snapping up the company's high-speed Java messaging and processing framework called Symphony.

The Apache Hadoop project has the Capacity Scheduler and the Fair Scheduler to drive up utilisation on Hadoop clusters and get multiple Hadoop jobs to play nice.
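
To make the idea of fair sharing a little more concrete, here is a minimal sketch of max-min fair allocation, the general technique behind getting competing jobs to play nice on a fixed pool of slots. It is an illustration only, not code from Hadoop's Fair Scheduler or from Grid Engine; the function name `max_min_fair_share` and the job names are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` slots among jobs (illustrative sketch only).

    No job receives more than it asked for, and capacity left over by
    small jobs is shared evenly among the jobs that still want more.
    """
    allocation = {job: 0.0 for job in demands}
    remaining = float(capacity)
    # Jobs that still want slots, mapped to their outstanding demand.
    unsatisfied = {job: float(d) for job, d in demands.items() if d > 0}

    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Jobs whose demand fits inside an equal share are fully satisfied.
        done = [job for job, need in unsatisfied.items() if need <= share]
        if not done:
            # Everyone left wants more than an equal share: split evenly.
            for job in unsatisfied:
                allocation[job] += share
            remaining = 0.0
            break
        for job in done:
            allocation[job] += unsatisfied[job]
            remaining -= unsatisfied.pop(job)
    return allocation


if __name__ == "__main__":
    # Three hypothetical jobs competing for 100 slots.
    print(max_min_fair_share(100, {"a": 20, "b": 50, "c": 80}))
```

The small job gets everything it asked for, and the two large jobs split what is left evenly rather than the biggest one crowding out the rest.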

OpenStack is getting a Distributed Scheduler for large clouds, too, so virty server instances are not immune from the problem. And more than a few shops are looking at the price tag for vSphere and vCloud Controller and wondering how to make Grid Engine and OpenStack or CloudStack work together nicely.

As for the 8.1 release of Grid Engine from Univa, the object of most of the development is to drive down the total cost of ownership as a grid scales up. On the performance front, Grid Engine will now have processor core and NUMA memory bindings that will allow jobs to run consistently as they are dispatched across the grid. This boosts performance by around 10 per cent for jobs and also drives up utilisation on the grid.
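
The intuition behind core and NUMA binding can be sketched in a few lines: give a job cores that all sit on one NUMA node, so its threads share local memory instead of reaching across the machine. The sketch below is an assumption-laden illustration, not Univa's implementation; the function `pick_cores`, the best-fit policy, and the core layout are all hypothetical.

```python
def pick_cores(free_cores_by_node, cores_needed):
    """Choose cores for a job from a single NUMA node (illustrative only).

    `free_cores_by_node` maps a NUMA node id to the list of free core ids
    on that node. Returns (node, cores), or None if no single node fits.
    Uses a best-fit policy (an assumption): prefer the node with the
    fewest free cores that can still hold the job, which keeps larger
    nodes free for larger jobs.
    """
    candidates = [
        (len(cores), node)
        for node, cores in free_cores_by_node.items()
        if len(cores) >= cores_needed
    ]
    if not candidates:
        return None
    _, node = min(candidates)
    chosen = sorted(free_cores_by_node[node])[:cores_needed]
    return node, chosen


if __name__ == "__main__":
    # Hypothetical two-socket host: node 0 has four free cores, node 1 has two.
    free = {0: [0, 1, 2, 3], 1: [6, 7]}
    print(pick_cores(free, 2))
    print(pick_cores(free, 3))
    print(pick_cores(free, 5))
```

On Linux, a process could then pin itself to the chosen cores with Python's `os.sched_setaffinity(0, set(chosen))`. A real scheduler would do the equivalent on the job's behalf before launching it.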

The update also includes resource maps that figure out how hardware and software resources are ordered and used in the cluster to drive up throughput. The aim is to do less of the "splattering" that even job schedulers like Grid Engine sometimes do.

The 8.1 edition also includes better job debugging and diagnostics to help system administrators figure out where things are getting bunched up on the grid, and has templates that integrate Grid Engine with popular message passing interface (MPI) cluster protocols to rein them in, too. The grid system will also play nicer with the PostgreSQL database, which handles job spooling for Grid Engine and needs to be scolded to balance the speed of submission of new jobs against the choking hazard of lots of small jobs.

Univa's Grid Engine 8.1 will start beta testing shortly. The production support contract for the software is the same price as it was last year, at $99 per core per year. If you want the UniCloud extensions that allow it to control jobs on Eucalyptus or EC2 clouds that are set up to run Grid Engine, then you have to pay $150 per core per year. If you deploy on Amazon EC2, GoGrid, or Rackspace Cloud, you obviously have to buy the capacity as well. You can also deploy UniCloud on internal clouds running VMware ESXi or Oracle Xen hypervisors. ®
