Netflix lets free simian software for cloud chaos

Angry ape kills virtual machines at random

Streaming video provider Netflix has released Chaos Monkey, its homegrown tool that's designed to boost the resilience of cloud-based applications in the bluntest way possible: by knocking them down.

"Do you think your applications can handle a troop of mischievous monkeys loose in your infrastructure?" asks Netflix's Cory Bennett and Ariel Tseitlin in a blog post. "Now you can find out."

The way Chaos Monkey works is conceptually fairly simple. It runs as a service on Amazon Web Services (AWS), where it seeks out Auto Scaling Groups (ASGs) of virtual machine instances. When it finds one, it picks one of its virtual machines at random and terminates it.
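
To make the idea concrete, here is a minimal Python sketch of that selection step. It is not Netflix's code and makes no AWS calls: the groups are a plain dictionary, and the names `pick_victims`, `unleash`, and the `terminate` callback are hypothetical stand-ins for whatever would actually list the ASGs and kill an instance.

```python
import random
from typing import Callable, Dict, List, Optional


def pick_victims(
    groups: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Choose one instance ID at random from every non-empty group.

    `groups` maps a group name to its instance IDs. This is an assumed,
    simplified stand-in for real Auto Scaling Group data.
    """
    rng = rng or random.Random()
    return {
        name: rng.choice(instances)
        for name, instances in groups.items()
        if instances  # an empty group has nothing to terminate
    }


def unleash(
    groups: Dict[str, List[str]],
    terminate: Callable[[str], None],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Terminate one randomly chosen instance per group.

    `terminate` is a hypothetical callback; a real tool would call the
    cloud provider's API here.
    """
    victims = pick_victims(groups, rng)
    for instance_id in victims.values():
        terminate(instance_id)
    return victims


if __name__ == "__main__":
    demo_groups = {"web": ["i-web-1", "i-web-2", "i-web-3"], "api": ["i-api-1"]}
    unleash(demo_groups, lambda iid: print(f"terminating {iid}"))
```

Passing the termination step in as a callback keeps the sketch safe to run: swap in a function that only logs, and you get a dry run showing which machines the monkey would have picked.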

At first blush, this may sound like the most maddening piece of software ever, and if a hacker figured out a way to use it maliciously, it could probably cause someone some real headaches.

But Chaos Monkey is a tool, and it runs around your network like a psychopathic ape because system failures are among the most common problems that people who manage cloud services face from day to day.

The point isn't to pull the plug on virtual machines for the fun of it. The point is to ensure that even though the plug has been pulled on a server or two here and there, the overall system is resilient enough to keep running anyway.

"Failures happen and they inevitably happen when least desired or expected," the Netflix developers write. "If your application can't tolerate an instance failure would you rather find out by being paged at 3am or when you're in the office and have had your morning coffee?"

Netflix has made the source code for Chaos Monkey available on GitHub under the Apache open source license. The company says it's just the first of a family of tools it calls the "Simian Army" that it plans to release to the public.

Like Chaos Monkey, the others – including Latency Monkey, Conformity Monkey, Doctor Monkey, Janitor Monkey, Security Monkey, 10-18 Monkey, and the unnervingly-named Chaos Gorilla – are all designed to root out unseen problems in cloud architectures.

The company says Janitor Monkey, which searches for unused resources and disposes of them, is the next likely candidate for release.

But even these tools can't guarantee 100 per cent uptime for cloud-based applications. During the large-scale AWS outage in June, Netflix was knocked down along with several other customers. Still, Netflix reps say they're confident that the company's rigorous resiliency testing, using the Simian Army among other tools, is the right approach.

"We take our availability very seriously and strive to provide an uninterrupted service to all our members," Netflix developer Greg Orzell wrote in a postmortem of the outage. "We're still bullish on the cloud and continue to work hard to insulate our members from service disruptions in our infrastructure." ®
