Docker ported into Hadoop as benchmarks show SCREAMING FAST performance

Code committers hope unholy union of open source tech will spawn Speedy Gonzales virtualization

The Hadoop community is working on patches that will bring the popular app-containerization technology Docker into the data management system, and independent benchmarks show the tech is substantially faster than traditional virtualization approaches.

Docker is an open source Linux containerization technology that uses underlying Linux technologies like namespaces, LXC, and cgroups to let an admin run multiple apps with all their dependencies in secure sandboxes on the same underlying Linux OS. This makes it an attractive alternative to typical virtualization, which bundles a copy of the OS with each app.

In a set of benchmarks an IBM employee released on Thursday, the company showed that Docker containerization has some huge advantages over the KVM hypervisor from a performance perspective.

Alongside this, El Reg has discovered some fascinating work by the Hadoop community to bring the tech into the eponymous data analysis and management engine.

Combined, these crumbs of news add more grist to the idea that Docker could become an eventual replacement for traditional virtualization approaches, granting organizations big benefits from an open source tech.

To start with, benchmarks conducted by IBM show that Docker has a number of performance advantages over the KVM hypervisor when running on the open source cloud infrastructure tool OpenStack.

In an informative post published on Thursday, IBM chap Boden Russell goes into further detail about the results.

"From an OpenStack Cloudy operational time perspective (boot, reboot, delete, snapshot, etc.) docker LXC outperformed KVM ranging from 1.09x (delete) to 49x (reboot)," Russell wrote. "Based on the compute node resource usage metrics during the serial VM packing test: Docker LXC CPU growth is approximately 26x lower than KVM. On this surface this indicates a 26x density potential increase from a CPU point of view using docker LXC vs a traditional hypervisor. Docker LXC memory growth is approximately 3x lower than KVM. On the surface this indicates a 3x density potential increase from a memory point of view using docker LXC vs a traditional hypervisor."

Impressive stuff, indeed.
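
Russell's two density figures pull in different directions, so it is worth spelling out the arithmetic. The sketch below takes the 26x (CPU) and 3x (memory) ratios from his post and assumes, purely for illustration, that a compute node fills up on whichever resource runs out first. That rule and the function name are our assumptions, not part of the benchmark.

```python
# Illustrative only: the ratios come from Russell's quoted results.
# The "scarcest resource wins" rule is an assumption, not part of the benchmark.
growth_reduction = {"cpu": 26.0, "memory": 3.0}


def density_ceiling(ratios):
    """Return (resource, ratio) for the resource that limits packing density."""
    # The smallest improvement is the one a node would hit first.
    resource = min(ratios, key=ratios.get)
    return resource, ratios[resource]


limiting_resource, ceiling = density_ceiling(growth_reduction)
print(f"{limiting_resource} limits density to roughly {ceiling:g}x")
```

On that reading, memory rather than CPU would cap how many more containers than VMs fit on a node, which is why the post hedges both numbers with "on the surface".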

Altiscale wants to spin a Docker YARN

Not only does Docker have desirable resource-usage characteristics, but the way it allows devs to package up applications has attracted attention from the open source Hadoop community.

Recently we learned that some people are diligently working to add Docker support into a crucial component of Apache Hadoop 2.0 named YARN, with the goal of increasing the usefulness of both techs.

YARN was introduced in version two of Apache Hadoop. It lets the software run multiple applications within Hadoop rather than purely MapReduce jobs. Thanks to this, YARN is helping to transform Hadoop from a batch processing and storage system into a more general tool for manipulating and storing data.

By combining YARN with Docker, the community hopes it can make it trivial for developers to package up an application in a Docker container, then sling it onto the YARN tech as part of a larger Hadoop installation.

Altiscale, the company behind the code contributions that make this possible, was kind enough to answer some of our questions about why this could be useful.

"As a company building a Hadoop as a Service platform, we are particularly interested in YARN as it allows Hadoop to move beyond map-reduce to a much more diverse variety of applications," explained the company's chief executive Raymie Stata to El Reg via email. "One of the key components of YARN that make this possible are containers. The existing YARN container implementation does not adequately provide all the types of isolation required to address a scenario we are noticing with our larger customers – multiple, independent groups in the same organization with different software requirements."

By adding in Docker support, Altiscale hopes it can flatten some of the barriers that lie between enterprise developers and a greater use of Hadoop.

"A common struggle for users is software dependency management," Stata explained. "Docker provides an intriguing approach to solving that problem by allowing users to upload prepackaged environments (or images) into repositories which can then easily be downloaded and run in isolation. For example, there are public repositories in the Docker community called Docker registries which provide a variety of language environments such as Java and Ruby. There is also support for private repositories where containers with more specialized environments can be placed."

Other members of the Hadoop community are keen on the addition of Docker as well.

"Where Docker makes perfect sense for YARN is that we can use Docker Images to fully describe the *entire* unix filesystem image for any YARN container," explained Arun Murthy, a founder and architect at Hortonworks, to El Reg in an email.

"This way, instead of forcing the user to deal with individual files or binaries (as today) we can allow the application to package up the *entire* Unix filesystem image it needs as Docker image and then get perfect predictability, from an environment perspective, at runtime. This is where Docker has the most amount of interest to the YARN/Hadoop community - particularly for people packaging up complex applications which need their own version of perl, python, java, libc etc. etc. ... that is hard to manage on YARN currently."
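
To make the idea concrete, here is a rough sketch of what launching one task inside a prepackaged image could look like from a scheduler's point of view. It is not the code from the YARN patches: the helper name, the image name `example/analytics-env:1.0` and the resource numbers are all hypothetical, and the snippet only builds a standard `docker run` command line rather than executing it.

```python
import shlex


def docker_run_command(image, command, memory_mb, cpu_shares):
    """Build (but do not execute) a `docker run` invocation for one task.

    Hypothetical helper for illustration; not taken from the YARN patches.
    """
    return [
        "docker", "run", "--rm",            # remove the container when the task exits
        "--memory", f"{memory_mb}m",        # cgroup memory limit
        "--cpu-shares", str(cpu_shares),    # relative CPU weight
        image,                              # the whole filesystem the task needs
        *command,                           # what to run inside it
    ]


# Hypothetical image and job; the image would carry its own Python, libc and so on.
cmd = docker_run_command("example/analytics-env:1.0", ["python", "job.py"], 2048, 512)
print(shlex.join(cmd))
```

The point is that the image name stands in for the entire environment, so the node running the task needs no particular version of perl, python or java installed.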

The addition of Docker to YARN looks potentially useful, and is another example of the enthusiasm with which Silicon Valley has adopted the young open source technology.

This follows Red Hat announcing broad support for Docker in its eponymous Linux distribution, and launching a project named "Atomic" built around the tech.

Amazon also recently added Docker support to its "Elastic Beanstalk" platform-as-a-service cloud.

These moves back up an earlier assertion by a Red Hat employee that: "Docker as a packaging tool for shipping software may be a game changer". ®
