Sun marries Hadoop to Grid Engine

DIY Googleplex

Sun Microsystems may be in a PR muzzle until sugar daddy Oracle gets permission to buy it from European antitrust regulators, but the coders who maintain Sun's myriad software products are still banging away on their keyboards in an effort to not only look useful to keep their jobs, but be useful.

They just can't engage the IT press to talk about what they are up to, which is why Sun's blogs come in handy. In this case, they are how we learned about an update to Sun's Grid Engine grid software.

Grid Engine 6.2 update 5 appears to have been launched last week, if you reckon by the date of a blog posting by Dan Templeton, a staff engineer who works on the grid middleware. Templeton says that with this update, Grid Engine is the first workload manager with support for applications created using the open source Hadoop programming environment hosted over at the Apache Foundation. Instead of having to set up a dedicated Hadoop cluster, you can treat Hadoop like any other application and submit jobs to a Grid Engine grid.

Hadoop, which was created by rival Yahoo! and taken open source, is an analog to the distributed programming environment used by Google. Hadoop consists of the Hadoop Distributed File System (HDFS), which is a distributed and fault-tolerant file system, and the MapReduce application parallelization and execution environment that works in conjunction with HDFS. In March 2009, Cloudera put out a commercialized version of Hadoop with enterprise-grade support, and would no doubt argue about Sun's claims. Still, the ability to submit Hadoop jobs to a Grid Engine grid and have it cope with Hadoop jobtrackers and tasktrackers is pretty cool.

The Grid Engine software, which is aware of HDFS, is able to route processing jobs to where the data is already located in the nodes, which speeds up execution of those jobs. (This is a whole lot smarter than starting up a job somewhere on the Hadoop cluster and then trying to move the data over to that node.)
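The idea of routing a job to the data can be sketched in a few lines of Python. This is a toy model, not Grid Engine's scheduler: the function name `pick_node`, the block-to-node map and the slot counts are all hypothetical, and the tie-breaking rule is an assumption made for the sake of the example.

```python
from collections import Counter


def pick_node(job_blocks, block_locations, free_slots):
    """Pick the node with a free slot that already holds the most input blocks.

    job_blocks:      block ids the job will read (hypothetical names)
    block_locations: block id -> set of nodes holding a replica
    free_slots:      node -> number of free job slots
    """
    # Count how many of the job's blocks each node stores locally.
    local_counts = Counter()
    for block in job_blocks:
        for node in block_locations.get(block, ()):
            local_counts[node] += 1

    # Only nodes with at least one free slot can take the job.
    candidates = [node for node, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None

    # Assumed tie-break: most local blocks, then most free slots, then name.
    return min(candidates, key=lambda n: (-local_counts[n], -free_slots[n], n))


block_locations = {
    "b1": {"node1", "node2"},
    "b2": {"node2", "node3"},
    "b3": {"node2"},
}
print(pick_node(["b1", "b2", "b3"], block_locations,
                {"node1": 2, "node2": 1, "node3": 4}))
```

In this example node2 holds all three blocks, so it is chosen even though it has the fewest free slots; only when it is full does the job fall back to a node that has to fetch some data over the network.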

With Grid Engine 6.2 update 5, the job scheduler has also been tweaked so it can allocate jobs to specific types of processors and server configurations if grid applications need certain features - high clock speeds, multiple cores, big caches, lots of main memory, and so forth - to run properly. Templeton says that, for instance, some cache-hungry applications will run in half the time if a job is plunked on four cores spread across four server sockets instead of four cores sharing a single socket.

Now Grid Engine administrators can use a feature called core binding to specify the kind of hardware resources they need, and Grid Engine can do its best to allocate a job to them when they are available in the pool.
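To make the four-sockets-versus-one-socket distinction concrete, here is a minimal Python sketch of two placement strategies. It is an illustration only: `bind_cores`, the `spread` flag and the socket-to-free-cores map are hypothetical, and none of this is Grid Engine's actual interface or algorithm.

```python
def bind_cores(free_cores, n, spread):
    """Choose n cores as (socket, core) pairs from a map of socket -> free core ids.

    spread=True  takes one core from each of n different sockets, so each
                 process gets a socket's cache to itself.
    spread=False packs all n cores onto the first socket that has room.
    Returns None when the request cannot be satisfied.
    """
    if spread:
        sockets = [s for s in sorted(free_cores) if free_cores[s]]
        if len(sockets) < n:
            return None
        return [(s, free_cores[s][0]) for s in sockets[:n]]

    for s in sorted(free_cores):
        if len(free_cores[s]) >= n:
            return [(s, c) for c in free_cores[s][:n]]
    return None


# A hypothetical four-socket server with four free cores per socket.
server = {socket: [0, 1, 2, 3] for socket in range(4)}
print(bind_cores(server, 4, spread=True))
print(bind_cores(server, 4, spread=False))
```

The spread placement is the one Templeton's cache-hungry example would favor; the packed placement leaves the other three sockets free for different jobs.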

The update also includes a feature called slotwise preemption, which is a more sophisticated way of allocating resources than just saying job queue A is always subordinate to job queue B; you can say clever things like have no more than four jobs running across queues A and B, and if there is a conflict for resources, queue B always loses.
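The four-jobs-across-two-queues rule can be modeled as follows. This is a sketch under stated assumptions, not Grid Engine's implementation: the function `jobs_to_suspend` is hypothetical, and the choice to suspend the most recently started job in the losing queue first is an assumption made for the example.

```python
def jobs_to_suspend(running, limit, subordinate):
    """Return ids of jobs to suspend so that at most `limit` jobs keep running.

    running:     list of (job_id, queue) pairs in start order
    limit:       maximum number of jobs allowed across all listed queues
    subordinate: name of the queue that loses when there is a conflict
    """
    excess = len(running) - limit
    if excess <= 0:
        return []

    # Assumption: pick victims from the subordinate queue, newest first.
    victims = [job for job, queue in reversed(running) if queue == subordinate]
    return victims[:excess]


running = [("j1", "B"), ("j2", "B"), ("j3", "A"), ("j4", "B"), ("j5", "A")]
print(jobs_to_suspend(running, limit=4, subordinate="B"))
```

With five jobs running against a limit of four, one job from queue B is suspended and queue A is left alone. If the subordinate queue has fewer jobs than the excess, the sketch simply returns all of them rather than touching the other queue.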

The update also includes tweaks that make it easier to integrate a Grid Engine setup with Amazon's EC2 compute cloud and to power down unused server nodes in a grid - capabilities that debuted with update 3 of the software last year but which apparently still had some rough edges.

The release notes for Sun Grid Engine 6.2 update 5 and the software download are both available from Sun. Sun offers commercial support for Grid Engine, but pricing was not available at press time.

Sun does not offer support for Hadoop as far as El Reg knows, but Cloudera certainly does for its variant. It won't be long before Oracle-Sun cook up a Cloudera-Grid Engine partnership. Oracle may even snap up Cloudera before lunch some day. ®
