
Atom smasher claims Hadoop cloud migration victory

Big-data love in the datacenter


Commodity servers running big CPUs with fat cores are not necessarily the best at running Hadoop. Just ask the customers who have bought Atom-smasher micro servers from SeaMicro to crunch their big-data workloads.

SeaMicro has been peddling its SM10000-64 micro server, which is based on Intel's dual-core, 64-bit Atom N570 processor and crams 256 of these chips into a 10U chassis.

The machine includes an integrated load balancer, an internal network switch that links the server nodes into a 3D torus (like supercomputers use), a slew of Gigabit Ethernet or 10 Gigabit Ethernet uplinks to the outside world, and 64 disk drives for the server nodes to store data upon. The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and loosely coupled parallel processing, such as Hadoop or Memcached, or small monolithic workloads, like Web servers.

SeaMicro is beating its chest about the fact that online matchmaker eHarmony recently moved its people-matching algorithms off a service provider's cloud and onto SM10000-64 machines running in its own data centers. eHarmony didn't say which cloud provider it used. But according to SeaMicro co-founder and chief executive Andrew Feldman, matching the 29 different criteria in an eHarmony account against a combined user base of more than 33 million lonely people looking for love in the right place took too long on the cloud and never ran at a consistent speed.

SM10000-64 plus eHarmony: Love at first byte.

The matching job done in Hadoop could take three to five hours, with the time varying depending on how busy the cloud was at any given time. And that unpredictability caused a logjam in the rest of eHarmony's applications, which depend on the results of these matching algorithms. Feldman was not at liberty to say how much faster the eHarmony matching algorithms run on the SM10000-64 machines, but tells El Reg that SeaMicro was able to "dramatically reduce the time it took to do the job". And by moving off the cloud, eHarmony has been able to cut its processing costs by 74 per cent compared to what it was paying on the cloud. Those cloud data upload charges sure do mount up, eh?

Sounds to us like it is time for someone to start a Hadoop cloud based on SeaMicro machines and with guaranteed service levels.

Feldman jokes that the eHarmony deal is the largest Hadoop implementation that SeaMicro is able to talk about, which suggests there are some government agencies with three-letter acronyms that are messing around with the micro servers.

On another Hadoop-related deal that SeaMicro won, the company can't talk about who the customer was but can talk about the benchmarking process it used to win the deal and what the results were.

At this customer site, the Hadoop job had to complete in 10 minutes and 50 seconds or less. The SeaMicro Atom-smasher was positioned against racks of Intel Xeon servers; both sets of machines ran the CentOS 5.4 clone of Red Hat Enterprise Linux and the Cloudera Hadoop distribution (CDH3 to be precise).

SeaMicro set up an SM10000-64 configuration that could do the Hadoop chew job in the allotted time, and then kept adding Xeon boxes to the Xeon cluster until it, too, got in under the allotted time. This benchmark ran the customer's applications using real customer data.
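The sizing procedure described here amounts to a simple search: grow the cluster one node at a time until the measured job time fits inside the batch window. A minimal sketch in Python follows. The `run_job` hook and the `toy_run` timing model are hypothetical and invented for illustration; they are not SeaMicro's test harness or the customer's data.

```python
from typing import Callable

def smallest_cluster(run_job: Callable[[int], float],
                     deadline_s: float,
                     max_nodes: int = 1000) -> int:
    """Return the smallest node count whose measured run time meets the deadline.

    run_job is a hypothetical hook: it should launch the benchmark on
    `nodes` servers and return the elapsed wall-clock time in seconds.
    """
    for nodes in range(1, max_nodes + 1):
        if run_job(nodes) <= deadline_s:
            return nodes
    raise RuntimeError("deadline not met within max_nodes")

# Batch window from the article: 10 minutes and 50 seconds.
DEADLINE_S = 10 * 60 + 50

# Toy stand-in for a real benchmark run: perfectly linear scaling from an
# invented single-node time. Real Hadoop jobs do not scale this cleanly.
def toy_run(nodes: int) -> float:
    return 40_000.0 / nodes

print(smallest_cluster(toy_run, DEADLINE_S))
```

In a real test, each call to `run_job` would be a full benchmark run on the customer's data, so the loop is slow but leaves no doubt about which cluster size first clears the window.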

Power consumption was measured using Xitron 2801 power meters, with the readings from the servers aggregated using National Instruments' LabVIEW 7.1 graphical tool. Here's how the machines stacked up:

SeaMicro Atom vs Xeon cluster on Hadoop data chewing

To get the job done in the customer's Hadoop calculation batch window, it took two whole SM10000-64 servers, each with 64 SATA disks and 512 cores running at 1.66GHz. Actually, the SeaMicro setup did it with 10 seconds to spare. This occupied 20U of space, or a little less than half of a standard server rack, and consumed 880 watt-hours of juice during the run. Each chassis costs $140,000 at list price, so you are looking at $280,000 for this setup.

It took 76 1U rack servers, each equipped with two quad-core Xeon L5630 low-voltage processors running at 2.13GHz to do the Hadoop job. Each server had four SATA disks, for a total of 304 disk drives, a lot more than the 128 required for the SeaMicro machine.

Hadoop servers generally have at least six drives to avoid I/O contention and customers are increasingly moving to even higher disk drive counts these days. In any event, the Xeon setup running the customer workload filled nearly two racks and consumed 3,387 watt-hours of electricity during its 10 minute and 50 second run.

The SeaMicro machine did the job in roughly one quarter of the rack space while burning roughly one quarter of the juice.
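The "one quarter" claims can be checked directly against the figures quoted in the article. The short Python sketch below does that. It assumes the Xeon cluster's rack space is just its 76 1U servers, leaving out switches, and it derives the Xeon core count from two quad-core chips per server.

```python
# Figures quoted in the article for the two setups.
seamicro = {
    "rack_units": 2 * 10,    # two 10U chassis
    "energy_wh": 880,
    "disks": 2 * 64,
    "cores": 2 * 512,
}
xeon = {
    "rack_units": 76 * 1,    # 76 1U servers, switches not counted (assumption)
    "energy_wh": 3387,
    "disks": 76 * 4,
    "cores": 76 * 2 * 4,     # two quad-core L5630 chips per server
}

def ratio(key: str) -> float:
    """SeaMicro figure as a fraction of the Xeon figure."""
    return seamicro[key] / xeon[key]

for key in seamicro:
    print(f"{key}: {seamicro[key]} vs {xeon[key]} "
          f"({ratio(key):.0%} of the Xeon cluster)")
```

On these numbers both rack space and energy come out at about 26 per cent of the Xeon cluster, so "one quarter" is a fair rounding. The SeaMicro setup gets there with more, slower cores and fewer than half the disks.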

To get a sense of what the Xeon solution would cost, I configured a ProLiant DL160 G6 server with two Xeon L5630 processors, 8GB of memory, and four 500GB disks, and that works out to $4,270 each. Just for the bare servers, you are in for $324,520, and you need to buy a couple of switches to lash them together. The operational costs will also play in favor of the SeaMicro setup.
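The acquisition arithmetic is easy to reproduce. This Python sketch uses only the list prices quoted in the article. It leaves out the Xeon cluster's switches, any discounts, and operational costs, none of which the article puts a number on.

```python
SEAMICRO_CHASSIS_PRICE = 140_000   # list price per SM10000-64, from the article
SEAMICRO_CHASSIS_COUNT = 2
XEON_SERVER_PRICE = 4_270          # configured ProLiant DL160 G6, from the article
XEON_SERVER_COUNT = 76

seamicro_total = SEAMICRO_CHASSIS_PRICE * SEAMICRO_CHASSIS_COUNT
xeon_total = XEON_SERVER_PRICE * XEON_SERVER_COUNT
difference = xeon_total - seamicro_total

print(f"SeaMicro setup: ${seamicro_total:,}")
print(f"Xeon servers:   ${xeon_total:,}")
print(f"Difference:     ${difference:,} ({difference / xeon_total:.1%} of the Xeon bill)")
```

At list price the gap on hardware alone is modest, which is why the space and power figures carry most of the argument.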

Intel will be crowing that it doesn't care whether customers use Atoms or Xeons, but the funny thing about the SeaMicro architecture is that it doesn't care about what processors it uses, either. It could turn out to be ARM or Tilera chips if the Atom roadmap is not aggressive enough in the future. ®
