Feeds

Sun marries Hadoop to Grid Engine

DIY Googleplex

Protecting against web application threats using SSL

Sun Microsystems may be in a PR muzzle until sugar daddy Oracle gets permission to buy it from European antitrust regulators, but the coders who maintain Sun's myriad software products are still banging away on their keyboards in an effort to not only look useful to keep their jobs, but be useful.

They just can't engage the IT press to talk about what they are up to. Which is why Sun's blogs come in handy, and in this case, as a means of letting us know about an update to Sun's Grid Engine grid software.

Grid Engine 6.2 update 5 appears to have been launched last week if you reckon the date of a blog posting by Dan Templeton, a staff engineer who works on the grid middleware. Templeton says that with this update, Grid Engine is the first workload manager with support for applications created using the open source Hadoop programming environment hosted over at the Apache Foundation. Instead of having to set up a dedicated Hadoop cluster, you can treat Hadoop like any other application and submit jobs to a Grid Engine grid.

Hadoop is an analog to the distributed programming environment used by Google that was created by rival Yahoo! and taken open source. Hadoop consists of the Hadoop Distributed File System, which is a distributed and fault-tolerant file system, and the MapReduce application parallelization and execution environment that works in conjunction with HDFS. In March 2009, Cloudera put out a commercialized version of Hadoop with enterprise-grade support, and would no doubt argue about Sun's claims. Still, the ability to submit Hadoop jobs to a Grid Engine grid and having it cope with Hadoop jobtrackers and tasktrackers is pretty cool.

The Grid Engine software, which is aware of HDFS, is able to route processing jobs to where the data is already located in the nodes, which speeds up execution of those jobs. (This is a whole lot smarter than starting up a job somewhere on the Hadoop cluster and then trying to move the data over to that node.)

With Grid Engine 6.2 update 5, the job scheduler has also been tweaked so it can allocate jobs to specific types of processors and server configurations if grid applications need certain features - high clock speeds, multiple cores, big caches, lots of main memory, and so forth - to run properly. Templeton says that, for instance, some cache-hungry applications will run in half the time if a job is plunked on four cores spread across four server sockets instead of four cores sharing a single socket.

Now Grid Engine administrators can use a feature called core binding specify the kind of hardware resources they need, and Grid Engine can do its best to allocate a job to them when they are available in the pool.

The update also includes a feature called slotwise preemption, which is a more sophisticated way of allocating resources than just saying job queue A is always subordinate to job queue B; you can say clever things like have no more than four jobs running across queues A and B, and if there is a conflict for resources, queue B always loses.

The update also includes tweaks that make it easier to integrate a Grid Engine setup with Amazon's EC2 compute cloud and to power down unused server nodes in a grid - capabilities that debuted with update 3 of the software last year but which apparently still had some rough edges.

You can plow through the release notes on Sun Grid Engine 6.2 update 5 here and download the software there. Sun offers commercial support for Grid Engine, but pricing was not available at press time.

Sun does not offer support for Hadoop as far as El Reg knows, but Cloudera certainly does for its variant. It won't be long before Oracle-Sun cook up a Cloudera-Grid Engine partnership. Oracle may even snap up Cloudera before lunch some day. ®

Choosing a cloud hosting partner with confidence

More from The Register

next story
Wanna keep your data for 1,000 YEARS? No? Hard luck, HDS wants you to anyway
Combine Blu-ray and M-DISC and you get this monster
US boffins demo 'twisted radio' mux
OAM takes wireless signals to 32 Gbps
Google+ GOING, GOING ... ? Newbie Gmailers no longer forced into mandatory ID slurp
Mountain View distances itself from lame 'network thingy'
Apple flops out 2FA for iCloud in bid to stop future nude selfie leaks
Millions of 4chan users howl with laughter as Cupertino slams stable door
Students playing with impressive racks? Yes, it's cluster comp time
The most comprehensive coverage the world has ever seen. Ever
Run little spreadsheet, run! IBM's Watson is coming to gobble you up
Big Blue's big super's big appetite for big data in big clouds for big analytics
Seagate's triple-headed Cerberus could SAVE the DISK WORLD
... and possibly bring us even more HAMR time. Yay!
prev story

Whitepapers

Secure remote control for conventional and virtual desktops
Balancing user privacy and privileged access, in accordance with compliance frameworks and legislation. Evaluating any potential remote control choice.
WIN a very cool portable ZX Spectrum
Win a one-off portable Spectrum built by legendary hardware hacker Ben Heck
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
The next step in data security
With recent increased privacy concerns and computers becoming more powerful, the chance of hackers being able to crack smaller-sized RSA keys increases.