
Atom smasher claims Hadoop cloud migration victory

Big-data love in the datacenter


Commodity servers running big CPUs with fat cores are not necessarily the best at running Hadoop. Just ask the bunch of customers who have bought Atom-smasher micro servers from SeaMicro to crunch their big-data workloads.

SeaMicro has been peddling its SM10000-64 micro server, which is based on Intel's dual-core, 64-bit Atom N570 processor and crams 256 of these chips into a 10U chassis.

The machine includes an integrated load balancer, an internal network switch that links the server nodes into a 3D torus (of the kind supercomputers use), a slew of Gigabit Ethernet or 10 Gigabit Ethernet uplinks to the outside world, and 64 disk drives for the server nodes to store data upon. The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and for loosely coupled parallel processing such as Hadoop or Memcached, or for small monolithic workloads like Web servers.
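The 3D torus is the least self-explanatory part of that list. In a torus, every node links to its immediate neighbors along three axes, and each axis wraps around at the edges, so no node sits at a "corner" with fewer links than the rest. Below is a minimal sketch of that addressing scheme, assuming a hypothetical 8 x 8 x 4 grid of 256 nodes (one per Atom chip). The article does not say how SeaMicro actually lays out its nodes, and `torus_neighbors` is a made-up helper, not part of any SeaMicro software.

```python
# Minimal sketch of neighbor addressing in a 3D torus interconnect.
# ASSUMPTION: the 8 x 8 x 4 layout is hypothetical; SeaMicro's real
# node mapping is not described in the article.

DIMS = (8, 8, 4)  # 8 * 8 * 4 = 256 nodes, matching the chip count


def torus_neighbors(x, y, z, dims):
    """Return the six nodes directly linked to (x, y, z) in a 3D torus.

    Each axis wraps around (modulo its size), so a node on one edge of
    the grid is linked to the node on the opposite edge.
    """
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


print(torus_neighbors(0, 0, 0, DIMS))
```

Calling `torus_neighbors(0, 0, 0, DIMS)` returns six distinct nodes, including the wrapped-around `(7, 0, 0)` and `(0, 0, 3)`, which is what lets traffic take a short path in either direction along each axis.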

SeaMicro is beating its chest about the fact that online matchmaker eHarmony has recently switched from running its people-matching algorithms out there on a service provider's cloud to running them on SM10000-64 machines in its own data centers. eHarmony didn't say which cloud provider it used, but according to SeaMicro co-founder and chief executive Andrew Feldman, matching the 29 different criteria in each eHarmony account against a combined user base of more than 33 million lonely people looking for love in the right place took too long on the cloud, and never ran at the same speed twice.

Image: SeaMicro SM10000 side view (caption: "SM10000-64 plus eHarmony: Love at first byte.")

The matching job, done in Hadoop, could take three to five hours, with the time varying depending on how busy the cloud was at any given moment. That unpredictability caused a logjam in the rest of eHarmony's applications, which depend on the results of these matching algorithms. Feldman was not at liberty to say how much faster the eHarmony matching algorithms run on the SM10000-64 machines, but tells El Reg that SeaMicro was able to "dramatically reduce the time it took to do the job". And by moving off the cloud, eHarmony has cut its processing costs by 74 per cent compared with what it was paying the cloud provider. Those cloud data upload charges sure do mount up, eh?

Sounds to us like it is time for someone to start a Hadoop cloud based on SeaMicro machines and with guaranteed service levels.

Feldman jokes that the eHarmony deal is the largest Hadoop implementation that SeaMicro is able to talk about, which suggests there are some government agencies with three-letter acronyms that are messing around with the micro servers.

On another Hadoop-related deal that SeaMicro won, the company can't say who the customer was, but it can talk about the benchmarking process it used to win the deal and what the results were.

At this customer site, the Hadoop job had to complete in 10 minutes and 50 seconds or less. The SeaMicro Atom-smasher was positioned against racks of Intel Xeon servers; both sets of machines ran the CentOS 5.4 clone of Red Hat Enterprise Linux and the Cloudera Hadoop distribution (CDH3 to be precise).

SeaMicro set up an SM10000-64 configuration that could do the Hadoop chew job within the allotted time, and then kept adding Xeon boxes to the Xeon cluster until it, too, got in under the limit. This benchmark ran the customer's applications against real customer data.

Power consumption was measured with Xitron 2801 power meters, and the readings from the servers were aggregated using National Instruments' LabView 7.1 graphical tool. Here's how the machines stacked up:

Chart: SeaMicro Atom vs Xeon cluster on Hadoop data chewing

To get the job done within the customer's Hadoop batch window, it took two whole SM10000-64 servers, each with 64 SATA disks and 512 cores running at 1.66GHz. In fact, the SeaMicro setup did it with 10 seconds to spare. This occupied 20U of space, a little less than half of a standard server rack, and consumed 880 watt-hours of juice during the run. Each chassis costs $140,000 at list price, so you are looking at $280,000 for this setup.

On the Xeon side, it took 76 1U rack servers, each equipped with two quad-core Xeon L5630 low-voltage processors running at 2.13GHz, to do the same Hadoop job. Each server had four SATA disks, for a total of 304 disk drives, a lot more than the 128 in the SeaMicro setup.

Hadoop servers generally have at least six drives to avoid I/O contention and customers are increasingly moving to even higher disk drive counts these days. In any event, the Xeon setup running the customer workload filled nearly two racks and consumed 3,387 watt-hours of electricity during its 10 minute and 50 second run.

The SeaMicro setup did the job in roughly one quarter of the rack space while burning roughly one quarter of the juice.
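Those "one quarter" claims are easy to check from the figures in the article. The sketch below does the arithmetic and also converts the watt-hour totals into the average power draw they imply. Two inputs are assumptions rather than quoted figures: a "standard" rack is taken to be 42U, and the SeaMicro run is taken to be 10 minutes 40 seconds (the 10:50 window minus the 10 seconds to spare).

```python
# Back-of-envelope check of the Atom vs Xeon benchmark figures.
# ASSUMPTIONS: a standard rack is 42U; the SeaMicro run lasted 10:40.

RACK_U = 42

seamicro = {"rack_u": 2 * 10, "disks": 2 * 64, "energy_wh": 880,
            "runtime_s": 10 * 60 + 40}
xeon = {"rack_u": 76 * 1, "disks": 76 * 4, "energy_wh": 3387,
        "runtime_s": 10 * 60 + 50}


def avg_power_kw(energy_wh, runtime_s):
    """Average draw implied by energy used over a run: kWh divided by hours."""
    return (energy_wh / 1000) / (runtime_s / 3600)


space_ratio = seamicro["rack_u"] / xeon["rack_u"]
energy_ratio = seamicro["energy_wh"] / xeon["energy_wh"]

print(f"Racks: SeaMicro {seamicro['rack_u'] / RACK_U:.2f}, "
      f"Xeon {xeon['rack_u'] / RACK_U:.2f}")
print(f"Disks: SeaMicro {seamicro['disks']}, Xeon {xeon['disks']}")
print(f"Space ratio {space_ratio:.2f}, energy ratio {energy_ratio:.2f}")
print(f"Average draw: SeaMicro "
      f"{avg_power_kw(seamicro['energy_wh'], seamicro['runtime_s']):.2f} kW, "
      f"Xeon {avg_power_kw(xeon['energy_wh'], xeon['runtime_s']):.2f} kW")
```

Both ratios come out at about 0.26, so "one quarter" is a fair rounding. The rack count ignores the switches the Xeon cluster would also need.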

To get a sense of what the Xeon setup would cost, I configured a ProLiant DL160 G6 server with two Xeon L5630 processors, 8GB of memory and four 500GB disks, which works out to $4,270 each. The 76 bare servers alone come to $324,520, and you still need to buy a couple of switches to lash them together. Operational costs will also favor the SeaMicro setup.
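The list-price comparison can be multiplied out directly. This sketch uses only the article's figures ($140,000 per SM10000-64 chassis, $4,270 per configured ProLiant DL160 G6) and, like the text, leaves out the switches the Xeon cluster needs as well as any operating costs.

```python
# List-price hardware comparison using the figures quoted in the article.
# Excludes networking gear and operating costs, as the text does.

SEAMICRO_LIST_PRICE = 140_000  # per SM10000-64 chassis
SEAMICRO_CHASSIS = 2
XEON_SERVER_PRICE = 4_270      # per configured ProLiant DL160 G6
XEON_SERVERS = 76

seamicro_total = SEAMICRO_CHASSIS * SEAMICRO_LIST_PRICE
xeon_total = XEON_SERVERS * XEON_SERVER_PRICE  # bare servers only

print(f"SeaMicro: ${seamicro_total:,}")
print(f"Xeon servers only: ${xeon_total:,}")
print(f"Gap before switches: ${xeon_total - seamicro_total:,}")
```

With these inputs the bare Xeon servers come out $44,520 more expensive than the two SeaMicro chassis, before any networking is added.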

Intel will be crowing that it doesn't care whether customers use Atoms or Xeons, but the funny thing about the SeaMicro architecture is that it doesn't care about what processors it uses, either. It could turn out to be ARM or Tilera chips if the Atom roadmap is not aggressive enough in the future. ®
