
Sun marries Hadoop to Grid Engine

DIY Googleplex


Sun Microsystems may be in a PR muzzle until sugar daddy Oracle gets permission to buy it from European antitrust regulators, but the coders who maintain Sun's myriad software products are still banging away on their keyboards in an effort to not only look useful to keep their jobs, but be useful.

They just can't engage the IT press to talk about what they are up to. Which is why Sun's blogs come in handy, in this case as a means of letting us know about an update to Sun's Grid Engine grid software.

Grid Engine 6.2 update 5 appears to have been launched last week, if you reckon by the date of a blog posting by Dan Templeton, a staff engineer who works on the grid middleware. Templeton says that with this update, Grid Engine is the first workload manager with support for applications created using the open source Hadoop programming environment hosted over at the Apache Foundation. Instead of having to set up a dedicated Hadoop cluster, you can treat Hadoop like any other application and submit jobs to a Grid Engine grid.

Hadoop, which was created by rival Yahoo! and taken open source, is an analog to the distributed programming environment used by Google. Hadoop consists of the Hadoop Distributed File System (HDFS), which is a distributed and fault-tolerant file system, and the MapReduce application parallelization and execution environment that works in conjunction with HDFS. In March 2009, Cloudera put out a commercialized version of Hadoop with enterprise-grade support, and would no doubt argue about Sun's claims. Still, the ability to submit Hadoop jobs to a Grid Engine grid and have it cope with Hadoop jobtrackers and tasktrackers is pretty cool.

The Grid Engine software, which is aware of HDFS, is able to route processing jobs to where the data is already located in the nodes, which speeds up execution of those jobs. (This is a whole lot smarter than starting up a job somewhere on the Hadoop cluster and then trying to move the data over to that node.)
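To make the idea of routing work to the data concrete, here is a minimal sketch of locality-aware placement. It is not Grid Engine's scheduler: the function `pick_node`, the block names, and the node names are all hypothetical, and the rule used (prefer the free node that already holds the most of a job's blocks) is just one simple way such a policy might work.

```python
from collections import Counter


def pick_node(job_blocks, block_locations, free_nodes):
    """Return the free node holding the most of the job's data blocks.

    job_blocks      -- block ids the job will read (hypothetical names)
    block_locations -- dict mapping block id -> set of nodes with a replica
    free_nodes      -- set of nodes that currently have capacity
    """
    counts = Counter()
    for block in job_blocks:
        for node in block_locations.get(block, ()):
            if node in free_nodes:
                counts[node] += 1
    if not counts:
        # No free node holds any of the data, so the data would have to
        # be moved: fall back to any free node, or None if there is none.
        return min(free_nodes) if free_nodes else None
    # Most local blocks wins; ties break on node name for determinism.
    return min(counts, key=lambda n: (-counts[n], n))


# Toy replica map, invented for illustration.
block_locations = {
    "blk_1": {"node-a", "node-b"},
    "blk_2": {"node-b", "node-c"},
    "blk_3": {"node-b"},
}
free_nodes = {"node-a", "node-b", "node-c"}

print(pick_node(["blk_1", "blk_2", "blk_3"], block_locations, free_nodes))
```

In this toy example node-b holds a replica of all three blocks, so the job lands there and nothing has to be copied across the network.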

With Grid Engine 6.2 update 5, the job scheduler has also been tweaked so it can allocate jobs to specific types of processors and server configurations if grid applications need certain features - high clock speeds, multiple cores, big caches, lots of main memory, and so forth - to run properly. Templeton says that, for instance, some cache-hungry applications will run in half the time if a job is plunked on four cores spread across four server sockets instead of four cores sharing a single socket.

Now Grid Engine administrators can use a feature called core binding to specify the kind of hardware resources they need, and Grid Engine can do its best to allocate a job to them when they are available in the pool.
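The difference between four cores on four sockets and four cores on one socket can be illustrated with a toy allocator. This is a sketch under stated assumptions, not Grid Engine's binding logic: `pick_cores`, its `strategy` argument, and the socket layout are hypothetical names invented for the example.

```python
def pick_cores(free_cores, wanted, strategy="spread"):
    """Choose `wanted` cores from a map of socket id -> free core ids.

    strategy="spread" takes cores round-robin across sockets, so each
    core gets a socket's cache largely to itself.
    strategy="pack" fills one socket before touching the next.
    Returns a list of (socket, core) pairs, or None if too few are free.
    """
    if sum(len(cores) for cores in free_cores.values()) < wanted:
        return None
    picked = []
    if strategy == "pack":
        # Start with the socket that has the most free cores.
        order = sorted(free_cores, key=lambda s: (-len(free_cores[s]), s))
        for socket in order:
            for core in free_cores[socket]:
                picked.append((socket, core))
                if len(picked) == wanted:
                    return picked
        return picked
    # Spread: visit sockets in turn, taking one core from each per pass.
    queues = {s: list(cores) for s, cores in sorted(free_cores.items())}
    while len(picked) < wanted:
        for socket, cores in queues.items():
            if cores and len(picked) < wanted:
                picked.append((socket, cores.pop(0)))
    return picked


# A hypothetical four-socket server with four free cores per socket.
server = {socket: [0, 1, 2, 3] for socket in range(4)}

print(pick_cores(server, 4, "spread"))  # one core on each socket
print(pick_cores(server, 4, "pack"))    # all four cores on socket 0
```

For the cache-hungry job Templeton describes, the spread placement is the one that would avoid four cores competing for a single socket's cache.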

The update also includes a feature called slotwise preemption, which is a more sophisticated way of allocating resources than just saying job queue A is always subordinate to job queue B; you can say clever things like have no more than four jobs running across queues A and B, and if there is a conflict for resources, queue B always loses.
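The queue A and queue B example can be modelled in a few lines. This is only a sketch of the policy as described above, not Grid Engine's implementation: `apply_slot_limit` and the job ids are hypothetical, and the choice of which queue B job to suspend (here, the most recently listed one) is an assumption made for simplicity.

```python
def apply_slot_limit(running, limit=4, priority=("A", "B")):
    """Split running jobs into those that keep a slot and those suspended.

    running  -- list of (job_id, queue) pairs in start order
    limit    -- maximum jobs allowed to run across all listed queues
    priority -- queues from most to least important; later queues lose
    """
    rank = {queue: i for i, queue in enumerate(priority)}
    # sorted() is stable, so start order is preserved within each queue.
    ordered = sorted(running, key=lambda job: rank[job[1]])
    return ordered[:limit], ordered[limit:]


# Five jobs compete for four slots shared by queues A and B.
running = [("j1", "B"), ("j2", "B"), ("j3", "A"), ("j4", "A"), ("j5", "A")]
keep, suspend = apply_slot_limit(running)

print(keep)     # all three A jobs plus one B job
print(suspend)  # the B job that lost its slot
```

Note that queue B is not shut out entirely: it keeps running jobs as long as the combined total stays within the limit, which is what makes this finer-grained than plain queue subordination.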

The update also includes tweaks that make it easier to integrate a Grid Engine setup with Amazon's EC2 compute cloud and to power down unused server nodes in a grid - capabilities that debuted with update 3 of the software last year but which apparently still had some rough edges.

You can plow through the release notes on Sun Grid Engine 6.2 update 5 here and download the software there. Sun offers commercial support for Grid Engine, but pricing was not available at press time.

Sun does not offer support for Hadoop as far as El Reg knows, but Cloudera certainly does for its variant. It won't be long before Oracle-Sun cook up a Cloudera-Grid Engine partnership. Oracle may even snap up Cloudera before lunch some day. ®
