So you want to build the next Google. Who ya gonna call? Er, Big Blue?

IBM's cluster scheduler kicks OpenStack's Nova in teeth, eyes VMware


Analysis IBM has announced a new version of its Platform Resource Scheduler (PRS), which lines up jobs and resources in mammoth OpenStack Havana environments.

In doing so, Big Blue hopes to give enterprises a shot at achieving the same levels of efficiency as Google's highly tuned servers.

Though the tech competes against VMware's Distributed Resource Scheduler, it could become a credible general-purpose job scheduler to rival Google's secretive Borg and Omega systems, and the Apache Mesos project.

A resource scheduler and workload placer is a system that takes jobs, and figures out when to run them and where to run them to maximize IT utilization. It must also leave some spare capacity, rather than consume all the available infrastructure, to ensure there's redundancy to pick up from any failures. And it must hit its deadlines.

Google's Borg system is rumored to have been so good at this task-juggling act that it saved the ad-slinger from building an entire data center.

IBM's resource scheduling tech is designed as a drop-in replacement for the scheduler within Nova, the compute component of the open-source cloud manager OpenStack. Nova's stock scheduler makes its decisions from information it stores during setup, placing each job on a compute node whose configuration passes various filters.
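
For the curious, filter-style placement can be sketched in a few lines of Python. This is an illustration only, not Nova's actual code: the `Host` and `Request` classes, the two filter functions and the node names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical snapshot of a compute node's spare capacity.
    name: str
    free_vcpus: int
    free_ram_mb: int


@dataclass
class Request:
    # Hypothetical resource request for one job or instance.
    vcpus: int
    ram_mb: int


def cpu_filter(host, req):
    """Pass hosts with enough spare virtual CPUs."""
    return host.free_vcpus >= req.vcpus


def ram_filter(host, req):
    """Pass hosts with enough spare memory."""
    return host.free_ram_mb >= req.ram_mb


def filter_hosts(hosts, req, filters):
    """Keep only the hosts that pass every filter."""
    return [h for h in hosts if all(f(h, req) for f in filters)]


hosts = [
    Host("node-a", free_vcpus=2, free_ram_mb=4096),
    Host("node-b", free_vcpus=8, free_ram_mb=16384),
    Host("node-c", free_vcpus=16, free_ram_mb=2048),
]
req = Request(vcpus=4, ram_mb=8192)
candidates = filter_hosts(hosts, req, [cpu_filter, ram_filter])
print([h.name for h in candidates])  # ['node-b']
```

The point to notice is that the decision rests on a stored snapshot of each host, not on what the machines are doing at this instant.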

PRS, by contrast, uses the distributed agent framework in Big Blue's Platform Computing products, which considers realtime "machine and hypervisor loads" among other information when making decisions. Thus, PRS can look at the available compute capacity in realtime and make ongoing judgements when placing workloads. It can shift things around as needed using the underlying hypervisor's live migration ability.
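
As a rough sketch of what load-driven re-balancing might look like, the Python below moves one VM from the busiest host to the quietest whenever the gap between them exceeds a threshold. IBM has not published how PRS does this; the function name, the threshold and the percentage loads are assumptions for illustration, and the list shuffling stands in for a real live migration.

```python
def rebalance_once(hosts, threshold=20):
    """Move one VM from the busiest host to the quietest one.

    hosts maps a host name to a list of VM loads, each an integer
    percentage of that host's capacity. Returns (vm_load, src, dst),
    or None when the hosts are already within the threshold.
    """
    load = {name: sum(vms) for name, vms in hosts.items()}
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    gap = load[src] - load[dst]
    if gap <= threshold:
        return None
    # Only a VM smaller than the gap narrows it; prefer the one
    # closest to half the gap, which evens the two hosts out best.
    movable = [vm for vm in hosts[src] if vm < gap]
    if not movable:
        return None
    vm = min(movable, key=lambda v: abs(v - gap / 2))
    hosts[src].remove(vm)   # stand-in for a hypervisor live migration
    hosts[dst].append(vm)
    return vm, src, dst


hosts = {"node-a": [40, 30, 20], "node-b": [10]}
print(rebalance_once(hosts))  # (40, 'node-a', 'node-b')
print(rebalance_once(hosts))  # None, both hosts now sit at 50 per cent
```

A real scheduler would run a pass like this continuously against live telemetry and weigh the cost of each migration, which this sketch ignores.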

"This means that as workloads and resources evolve, workload placement is automatically re-balanced," IBM marketing chap Gord Sissons told The Reg via email.

"The key benefits are: better quality of service in terms of performance and availability, because hypervisors are less likely to be over-subscribed; better utilization, since [virtual machines] can be packed more optimally while respecting service level requirements; and reduced administrator workload, since the re-balancing is automated.

"This is important as OpenStack environments get large. The real 'intellectual property' in the offering is in the pre-configured policies - the idea is that a cloud administrator can simply specify a policy like 'load balancing' or 'packing', and the scheduler will automatically seek to achieve the goal of the policy."

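The difference between the two policies Sissons names can be sketched as follows. This is a toy model, not IBM's implementation: the `place` function, the policy strings and the host figures are assumptions chosen for the example.

```python
def place(hosts, demand, policy):
    """Choose a host for a new VM under a named policy.

    hosts maps a host name to (used, capacity) in arbitrary units.
    Returns the chosen host name, or None if the VM fits nowhere.
    """
    fits = {n: (u, c) for n, (u, c) in hosts.items() if u + demand <= c}
    if not fits:
        return None

    def utilization(name):
        used, capacity = fits[name]
        return used / capacity

    if policy == "packing":
        # Fill the busiest host that still has room, leaving others empty.
        return max(fits, key=utilization)
    if policy == "load_balancing":
        # Spread work by picking the least-loaded host.
        return min(fits, key=utilization)
    raise ValueError(f"unknown policy: {policy}")


hosts = {"node-a": (6, 8), "node-b": (2, 8), "node-c": (8, 8)}
print(place(hosts, 2, "packing"))         # node-a
print(place(hosts, 2, "load_balancing"))  # node-b
```

Packing frees whole machines that can be powered down or held in reserve; load balancing keeps headroom on every host. The administrator picks the goal and the scheduler works out the placements.
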
It'll babysit your 50,000 cores. If you can afford it

It's worth noting that this system is unlikely to have the capabilities of Google's Omega system, which is believed to draw on CPU-core-level telemetry from a system named CPI2, along with other Chocolate Factory innovations.

However, by drawing on other IBM technology such as Platform Symphony, it is able to gain some advanced abilities, such as the aforementioned distributed agent-based scheduling, which (we're told) lets IBM's tech "opportunistically 'borrow' resources not in use by different tenants - loaning, borrowing and pre-emption policies are specified in flexible resource sharing plans that can vary with time."
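
To make the loaning, borrowing and pre-emption idea concrete, here is a minimal Python sketch. It is not how Platform Symphony works internally, which IBM has not detailed here; the `SharedPool` class, its methods and the slot counts are hypothetical, and time-varying sharing plans are left out.

```python
class SharedPool:
    """Toy multi-tenant pool: tenants own slots and may borrow idle ones."""

    def __init__(self, owned):
        self.owned = dict(owned)            # tenant -> slots it owns
        self.used = {t: 0 for t in owned}   # tenant -> slots it holds now

    def idle(self):
        return sum(self.owned.values()) - sum(self.used.values())

    def borrowed(self, tenant):
        """Slots a tenant holds beyond its own share."""
        return max(0, self.used[tenant] - self.owned[tenant])

    def request(self, tenant, slots):
        """Grant idle slots first, then pre-empt borrowers if the
        tenant is still below its owned share. Returns slots granted."""
        granted = min(slots, self.idle())
        self.used[tenant] += granted
        shortfall = slots - granted
        entitled = max(0, self.owned[tenant] - self.used[tenant])
        to_reclaim = min(shortfall, entitled)
        for other in self.owned:
            if to_reclaim == 0:
                break
            if other == tenant:
                continue
            take = min(to_reclaim, self.borrowed(other))
            self.used[other] -= take        # pre-empt the borrower
            self.used[tenant] += take
            granted += take
            to_reclaim -= take
        return granted


pool = SharedPool({"a": 4, "b": 4})
print(pool.request("a", 6))  # 6: tenant a borrows two idle slots
print(pool.request("b", 4))  # 4: two idle slots, two reclaimed from a
print(pool.used)             # {'a': 4, 'b': 4}
```

The owner always gets its share back on demand, while idle capacity never sits unused in the meantime.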

The whole system can also sit on top of IBM's well-regarded General Parallel File System, which gives it some capabilities more advanced than the main open-source equivalent, the Hadoop Distributed File System. Google is likely to field its own tech in this arena, but has published very little on it.

From what we understand, these capabilities mean IBM's PRS is more advanced than parts of the open-source Apache Mesos project – though at the cost of being proprietary and hence only having one major developer (IBM) driving the project.

One drawback of Big Blue's approach is its dependence on full virtualization, which adds overhead whenever information passes between two VMs on the same server. Omega and Mesos, by comparison, get kernel-level direct transfers thanks to containerization via cgroups and the like.

IBM says it already has some customers running in the range of 50,000 cores – hardly Google, but not insignificant.

Though the technology strikes this hack as being handy for the few companies out there with boisterous, instance-filled OpenStack environments not already under some kind of scheduler, it seems unlikely it can maintain feature parity with the open-source scheduler and resource placer Apache Mesos.

Mesos is already in wide use at Twitter – the company hired Benjamin Hindman, co-creator of the tech, recently – and has also been used by trendy room-renting network Airbnb. IBM argues that the Mesos project as it stands is immature – true, but with hefty resources behind it, that may not remain the case.

The prerequisites for enterprises wanting a nibble at IBM's answer to Google's most advanced system are IBM Power Systems or IBM System x (including iDataPlex) hardware, Red Hat Enterprise Linux 6.3, and IBM SmartCloud Entry V3.2.

Though many view IBM's recent OpenStack love-in as more marketing than substance, this release shows that in some parts of Big Blue's titanic organization, some very clever people are working to supercharge the open-source project – for a price. ®
