Boffins fawn over dirt cheap server clusters

Fast array of wimpy nodes

A team of researchers at Carnegie Mellon University have been studying how they can make cheap, low-powered, and relatively unimpressive server nodes gang up and do more work than the two-socket x64 server that is the workhorse of the IT industry. They have come up with an approach called FAWN, which is short for Fast Array of Wimpy Nodes.

Last week at the ACM's Symposium on Operating Systems Principles, Carnegie Mellon researchers working with Intel Labs presented a paper (pdf) on the FAWN concept, demonstrating how a combination of wimpy server nodes built on motherboards that usually end up in inexpensive PCs or homegrown media servers can do more Web 2.0-style query work per unit of energy than more powerful boxes.

This is actually the second paper that the FAWN project has put out this year; the first one (pdf) introduced the work done by David Andersen, Jason Franklin, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan of Carnegie Mellon and Michael Kaminsky of Intel Labs.

This is by no means the first time that researchers, hyperscale data centers and IT suppliers have taken a look at clustering low-powered and relatively wimpy server nodes to aggregate large amounts of computing, memory, and I/O capacity. But the FAWN project researchers are trying to push the envelope - the power envelope, that is - and push it down as low as it can practically go.

To demonstrate what an array of FAWN machines can do, the Carnegie Mellon researchers say they have created a key-value storage cluster that is similar in concept to Amazon's Dynamo and the open source projects Memcached (which is championed these days by Facebook, among others) and Voldemort (a distributed key-value store - a database, but not a relational or object database - used by LinkedIn). The prototype FAWN machines - the project is in its third generation - are the kinds of things you could build in your living room on a fairly modest budget.
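
To give a flavour of the general idea, here is a minimal Python sketch of a hash-partitioned key-value store: a front end hashes each key onto one of a pool of wimpy back ends, and each back end keeps a small in-memory index over an append-only log, which is roughly how the paper describes FAWN-DS laying data out on flash. This is not the FAWN-KV code - the class and method names (ToyFawnKV, WimpyNode) are invented here for illustration.

```python
import hashlib

class WimpyNode:
    """Toy stand-in for a FAWN-DS back end: an append-only log (standing in
    for flash) plus a small in-memory index from key to log offset."""
    def __init__(self, name):
        self.name = name
        self.log = []       # append-only data log
        self.index = {}     # key -> offset into the log

    def put(self, key, value):
        self.index[key] = len(self.log)   # point the index at the new record
        self.log.append((key, value))     # writes are sequential appends

    def get(self, key):
        offset = self.index[key]          # one in-memory lookup...
        return self.log[offset][1]        # ...then one read from the log

class ToyFawnKV:
    """Front end that hashes each key onto one of the wimpy nodes."""
    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).put(key, value)

    def get(self, key):
        return self._node_for(key).get(key)

cluster = ToyFawnKV([WimpyNode(f"node{i}") for i in range(21)])
cluster.put("user:42", b"some 256-byte blob")
print(cluster.get("user:42"))
```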

The first-generation wimpy nodes consisted of a cluster of eight baby PCs in beige boxes networked with a cheap switch, and the second generation mounted 14 boards together on a bare frame and didn't even bother with a chassis. The picture shown is of the second-generation FAWN setup, but according to the paper published at SOSP, the latest cluster has 21 nodes, each using a 500 MHz AMD Geode processor, 256 MB of main memory, and 4 GB of CompactFlash storage.

This is not, by any stretch of the imagination, a powerful PC. In fact, this is wimpy even for a wimpy PC. But according to the CMU researchers, here is the key thing that has Google and Network Appliance kicking in money for the FAWN project alongside Intel: each one of those wimpy nodes consumes under 5 watts of juice as it runs at near-peak performance, processing queries and retrieving data from the FAWN data store (FAWN-DS).

A node can do 1,300 256-byte queries per second, according to the paper, and process 364 queries per joule of energy. This, say the techies, is two orders of magnitude better bang per joule than a regular server can deliver.
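
The units fall out neatly if you remember that a watt is a joule per second. A quick back-of-the-envelope check - the per-node power figure below is an assumption, since the article only says "under 5 watts", and the paper's 364 figure comes from measuring the whole cluster, switch included:

```python
queries_per_second = 1_300   # per-node rate for 256-byte queries, from the paper
node_power_watts = 3.6       # assumed here; the article only says "under 5 watts"

# A watt is a joule per second, so (queries/s) / (J/s) = queries per joule.
print(round(queries_per_second / node_power_watts))   # ~361, close to the reported 364
```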

The second-generation FAWN Project cluster

One of the key factors behind the wimpy nodes doing so well is that on such a node, processing, main memory and flash memory speeds are more in synch than they are on a modern x64 or RISC server. Because CPUs are revving so much faster than the I/O devices that feed them, they are often tapping their feet, waiting for data.

To compensate for this, modern CPUs have layered on all kinds of features - speculative execution, out-of-order execution, superscalar execution, simultaneous multithreading, branch prediction and the like, as well as two or three layers of cache - that try to make up for the big gap between CPUs and their I/O. The problem is, these features not only cost money, they also consume a lot of power. So you get the CPUs to do the work, but at a big cost.

Without naming any names, the Carnegie Mellon researchers say that a quad-core superscalar processor running at several gigahertz can process approximately 100 million instructions per joule, but these in-order, relatively stupid chips used in the wimpy nodes can deliver one billion instructions per joule while running at a much lower frequency.

To make some comparisons, the researchers took a single server node based on an Intel desktop quad-core Q6700 processor and put 2 GB of memory on it as well as an Mtron Mobi solid state disk. This machine was set up to run Linux with a tickless kernel (2.6.27), with its power management features optimized. The machine consumed 64 watts when idle, and from 83 watts to 90 watts when it was loaded up doing query work.

The Intel server node was able to process 4,771 random 256-byte reads per second, providing an efficiency rating of 52 queries per joule. The 21-node FAWN cluster idled at 83 watts, and peaked at 99 watts during puts and 91 watts during gets. The cluster handled around 36,000 queries per second against a 20 GB dataset, which is what gives you the 364 queries per joule (including the power drawn by the switch linking the nodes). Nodes based on the desktop mobo that Carnegie Mellon tested using disk drives instead of SSDs did awfully, as you would expect, delivering only 17 queries per joule.
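
For what it's worth, both efficiency figures can be reproduced from the numbers above; the loaded-power values used here are the ones quoted in the article, so the results land within rounding distance of the reported 52 and 364 queries per joule:

```python
# Traditional quad-core node with SSD: ~4,771 random 256-byte reads/sec at up to 90 W
print(4_771 / 90)    # ~53 queries per joule, in line with the quoted 52

# 21-node FAWN cluster: ~36,000 queries/sec at a 99 W peak, switch included
print(36_000 / 99)   # ~364 queries per joule
```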

The techies at Carnegie Mellon and Intel are not nuts. They are not suggesting that there is no place for disk, but rather that you have to have the technology reflect the dataset size and query rate you are trying to deliver, and you have to make choices.
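
One way to see that choice is to ask how many devices of each type a given workload would need, with whichever of the two constraints - capacity or query rate - bites harder setting the count. The per-device figures in this sketch are placeholders invented purely for illustration, not numbers from the FAWN paper, but the shape of the tradeoff is the same:

```python
# Placeholder per-device figures, invented for illustration only.
media = {
    #        capacity in GB, random queries/sec one device can sustain
    "disk":  dict(capacity_gb=1000, qps=300),
    "flash": dict(capacity_gb=32,   qps=10_000),
    "dram":  dict(capacity_gb=4,    qps=100_000),
}

def devices_needed(medium, dataset_gb, target_qps):
    """Whichever constraint (capacity or query rate) bites harder sets the count."""
    m = media[medium]
    by_capacity = -(-dataset_gb // m["capacity_gb"])   # ceiling division
    by_rate = -(-target_qps // m["qps"])
    return max(by_capacity, by_rate)

# Small, hot dataset (like the 20 GB, 36,000 queries/sec example): flash needs fewest devices.
for medium in media:
    print(medium, devices_needed(medium, dataset_gb=20, target_qps=36_000))

# Huge, cold dataset: disk wins on sheer capacity.
for medium in media:
    print(medium, devices_needed(medium, dataset_gb=100_000, target_qps=1_000))
```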

The tradeoff between query rate and data set size

As their research shows, there is a trade-off that seems to be unavoidable for now, because disk drives are more capacious than flash drives, which are fatter than main memory. And it is this: if you want to query large data sets, you need to use disks, and that means you can't hit the high query rates of a FAWN array using only main memory (possible if your datasets are really tiny) or flash. It also means you will pay a lot more for servers, and they will be a lot less efficient. ®
