Sun marries Hadoop to Grid Engine

DIY Googleplex

Sun Microsystems may be wearing a PR muzzle until European antitrust regulators give sugar daddy Oracle permission to buy it, but the coders who maintain Sun's myriad software products are still banging away on their keyboards in an effort not only to look useful and keep their jobs, but to be useful.

They just can't talk to the IT press about what they are up to, which is why Sun's blogs come in handy - in this case, as a means of letting us know about an update to Sun's Grid Engine grid software.

Grid Engine 6.2 update 5 appears to have been launched last week, going by the date of a blog posting by Dan Templeton, a staff engineer who works on the grid middleware. Templeton says that with this update, Grid Engine is the first workload manager with support for applications created using the open source Hadoop programming environment hosted over at the Apache Foundation. Instead of having to set up a dedicated Hadoop cluster, you can treat Hadoop like any other application and submit jobs to a Grid Engine grid.

Hadoop, which was created by Google rival Yahoo! and taken open source, is an analog to the distributed programming environment used by Google. Hadoop consists of the Hadoop Distributed File System (HDFS), which is a distributed and fault-tolerant file system, and the MapReduce application parallelization and execution environment that works in conjunction with HDFS. In March 2009, Cloudera put out a commercialized version of Hadoop with enterprise-grade support, and would no doubt dispute Sun's claim to be first. Still, the ability to submit Hadoop jobs to a Grid Engine grid and have it cope with Hadoop jobtrackers and tasktrackers is pretty cool.
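
For readers who have not met MapReduce, the idea can be sketched in a few lines of plain Python. This is a single-process toy and not Hadoop's API: the function names map_phase, reduce_phase, and run are made up for illustration, and a real Hadoop job would spread the map and reduce steps across many nodes reading from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    # Reduce step: collapse all the values seen for one key into a total.
    return word, sum(counts)


def run(lines):
    # Shuffle step: group the mapped values by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run(["sun grid engine", "Grid Engine grid"]))
```

The point of the split is that each map call looks at one chunk of input and each reduce call looks at one key, so both steps can be farmed out to as many machines as you have.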

Because the Grid Engine software is aware of HDFS, it can route processing jobs to the nodes where the data already resides, which speeds up execution of those jobs. (This is a whole lot smarter than starting up a job somewhere on the Hadoop cluster and then trying to move the data over to that node.)
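
A rough sketch of what data-aware placement means, in Python. This is not Grid Engine's scheduler code: pick_node, block_locations, and free_slots are hypothetical names, and the rule used here (send the job to the free node holding the most of its blocks) is an assumed simplification of whatever Grid Engine actually does.

```python
from collections import Counter


def pick_node(block_locations, job_blocks, free_slots):
    """Pick the node with a free slot that already stores most of the job's data.

    block_locations: dict mapping block id -> list of nodes holding a replica
    job_blocks:      list of block ids the job will read
    free_slots:      dict mapping node -> number of free job slots
    """
    local = Counter()
    for block in job_blocks:
        for node in block_locations.get(block, ()):
            if free_slots.get(node, 0) > 0:
                local[node] += 1
    if local:
        # Most local blocks wins; ties are broken by node name.
        return min(local, key=lambda n: (-local[n], n))
    # No free node holds any of the data: fall back to any free node.
    candidates = sorted(n for n, slots in free_slots.items() if slots > 0)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    locations = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n2"]}
    print(pick_node(locations, ["b1", "b2", "b3"], {"n1": 1, "n2": 1, "n3": 0}))
```

In the example, node n2 holds all three blocks and has a free slot, so the job goes there and no data has to cross the network.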

With Grid Engine 6.2 update 5, the job scheduler has also been tweaked so it can allocate jobs to specific types of processors and server configurations if grid applications need certain features - high clock speeds, multiple cores, big caches, lots of main memory, and so forth - to run properly. Templeton says that, for instance, some cache-hungry applications will run in half the time if a job is plunked on four cores spread across four server sockets instead of four cores sharing a single socket.

Now Grid Engine administrators can use a feature called core binding to specify the kind of hardware resources a job needs, and Grid Engine will do its best to allocate the job to those resources when they are available in the pool.
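
To make the socket example concrete, here is a small Python sketch of the two placements Templeton describes: four cores spread across four sockets versus four cores packed onto one. The function choose_cores and its topology argument are invented for illustration and say nothing about how Grid Engine expresses or implements core binding.

```python
def choose_cores(topology, count, spread=True):
    """Choose `count` free cores, returned as (socket, core) pairs.

    topology: dict mapping socket id -> list of free core ids on that socket
    spread:   True to take one core per socket in turn, False to pack one socket
    Returns None if the request cannot be met.
    """
    sockets = sorted(topology)
    if not spread:
        # Packed: use the first socket that has enough free cores by itself.
        for socket in sockets:
            if len(topology[socket]) >= count:
                return [(socket, core) for core in topology[socket][:count]]
        return None

    # Spread: round-robin over the sockets so threads share as little cache
    # as possible.
    picked = []
    depth = 0
    while len(picked) < count:
        progressed = False
        for socket in sockets:
            if depth < len(topology[socket]):
                picked.append((socket, topology[socket][depth]))
                progressed = True
                if len(picked) == count:
                    return picked
        if not progressed:
            return None  # ran out of free cores
        depth += 1
    return picked


if __name__ == "__main__":
    four_sockets = {s: [0, 1, 2, 3] for s in range(4)}
    print(choose_cores(four_sockets, 4, spread=True))
    print(choose_cores(four_sockets, 4, spread=False))
```

A cache-hungry job would want the spread placement, where each thread gets a socket's cache to itself; a job whose threads share data heavily might prefer the packed one.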

The update also includes a feature called slotwise preemption, which is a more sophisticated way of allocating resources than just saying job queue B is always subordinate to job queue A. You can set cleverer rules, such as allowing no more than four jobs to run across queues A and B combined, with queue B always losing if there is a conflict for resources.
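
The four-job rule can be sketched in Python as follows. The function jobs_to_suspend is a made-up name, and the choice to suspend queue B's most recently started jobs first is an assumption for the sake of the example, not a statement about Grid Engine's behavior.

```python
def jobs_to_suspend(running_a, running_b, max_total=4):
    """Return the queue-B jobs to suspend so A and B together stay within the cap.

    running_a, running_b: lists of job ids, oldest first
    max_total:            most jobs allowed to run across both queues
    """
    excess = len(running_a) + len(running_b) - max_total
    if excess <= 0:
        return []  # under the cap, nobody is preempted
    # Queue B always loses. Assumption: suspend its newest jobs first.
    n = min(excess, len(running_b))
    return running_b[len(running_b) - n:]


if __name__ == "__main__":
    print(jobs_to_suspend(["a1", "a2", "a3"], ["b1", "b2"]))
```

With three jobs in queue A and two in queue B, the total is one over the cap, so a single job from queue B is suspended and everything in queue A keeps running.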

The update also includes tweaks that make it easier to integrate a Grid Engine setup with Amazon's EC2 compute cloud and to power down unused server nodes in a grid - capabilities that debuted with update 3 of the software last year but which apparently still had some rough edges.

You can plow through the release notes on Sun Grid Engine 6.2 update 5 here and download the software there. Sun offers commercial support for Grid Engine, but pricing was not available at press time.

Sun does not offer support for Hadoop as far as El Reg knows, but Cloudera certainly does for its variant. It won't be long before Oracle-Sun cook up a Cloudera-Grid Engine partnership. Oracle may even snap up Cloudera before lunch some day. ®