Atom smasher claims Hadoop cloud migration victory

Big-data love in the datacenter

Commodity servers running big CPUs with fat cores are not necessarily the best at running Hadoop. Just ask the bunch of customers who have bought Atom-smasher micro servers from SeaMicro to crunch their big-data workloads.

SeaMicro has been peddling its SM10000-64 micro server, which is based on Intel's dual-core, 64-bit Atom N570 processor and crams 256 of these chips into a 10U chassis.

The machine includes an integrated load balancer, an internal network switch that links the server nodes into a 3D torus (like supercomputers use), a slew of Gigabit Ethernet or 10 Gigabit Ethernet uplinks to the outside world, and 64 disk drives for the server nodes to store data upon. The SM10000-64 is not so much a micro server as a complete data center in a box, designed for low power consumption and loosely coupled parallel processing, such as Hadoop or Memcached, or small monolithic workloads, like Web servers.

SeaMicro is beating its chest about the fact that online match-maker eHarmony has recently switched from running its people-matching algorithms on a service provider's cloud to SM10000-64 machines running in its own data centers. eHarmony didn't say what cloud provider it used, but according to SeaMicro co-founder and chief executive Andrew Feldman, running the matching algorithms, which weigh the 29 different criteria in an eHarmony account, against the combined user base of over 33 million lonely people looking for love in the right place took too long and never ran at a consistent speed on the cloud.

SM10000-64 plus eHarmony: Love at first byte.

The matching job done in Hadoop could take three to five hours, with the time varying depending on how busy the cloud was at any given time. And that unpredictability caused a logjam in the rest of eHarmony's applications, which are dependent on the results of these matching algorithms. Feldman was not at liberty to say how much faster the eHarmony matching algorithms run on the SM10000-64 machines, but tells El Reg that SeaMicro was able to "dramatically reduce the time it took to do the job". And by moving off the cloud, eHarmony has been able to cut its processing costs by 74 per cent compared to what it was paying on the cloud. Those cloud data upload charges sure do mount up, eh?

Sounds to us like it is time for someone to start a Hadoop cloud based on SeaMicro machines and with guaranteed service levels.

Feldman jokes that the eHarmony deal is the largest Hadoop implementation that SeaMicro is able to talk about, which suggests there are some government agencies with three-letter acronyms that are messing around with the micro servers.

On another Hadoop-related deal that SeaMicro won, the company can't talk about who the customer was but can talk about the benchmarking process it used to win the deal and what the results were.

At this customer site, the Hadoop job had to complete in 10 minutes and 50 seconds or less. The SeaMicro Atom-smasher was positioned against racks of Intel Xeon servers; both sets of machines ran the CentOS 5.4 clone of Red Hat Enterprise Linux and the Cloudera Hadoop distribution (CDH3 to be precise).

SeaMicro set up an SM10000-64 configuration that could do the Hadoop chew job in the allotted time, and then kept adding boxes to the Xeon cluster until it also finished within the allotted time. This benchmark ran the customer's applications using real customer data.

Power consumption was measured using Xitron 2801 power meters, and the readings from the servers were aggregated using National Instruments' LabView 7.1 graphical tool. Here's how the machines stacked up:

SeaMicro Atom vs Xeon cluster on Hadoop data chewing

To get the job done in the customer's Hadoop calculation batch window, it took two whole SM10000-64 servers, each with 64 SATA disks and 512 cores running at 1.66GHz. Actually, the SeaMicro setup did it with 10 seconds to spare. This occupied 20U of space, or a little less than half of a standard server rack, and consumed 880 watt-hours of juice during the run. Each chassis costs $140,000 at list price, so you are looking at $280,000 for this setup.

It took 76 1U rack servers, each equipped with two quad-core Xeon L5630 low-voltage processors running at 2.13GHz, to do the Hadoop job. Each server had four SATA disks, for a total of 304 disk drives, a lot more than the 128 required for the SeaMicro machine.
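As a rough check on the core and spindle counts quoted above, the totals can be tallied in a few lines of Python. This is only a sketch built from the figures in the article; the variable names are illustrative, and raw core counts say nothing about per-core performance, since an Atom core and a Xeon core are very different animals.

```python
# Figures quoted in the article; variable names are illustrative only.
seamicro_chassis = 2
atom_chips_per_chassis = 256
cores_per_atom = 2            # dual-core Atom N570
disks_per_chassis = 64

xeon_servers = 76
sockets_per_server = 2
cores_per_xeon = 4            # quad-core Xeon L5630
disks_per_server = 4

seamicro_cores = seamicro_chassis * atom_chips_per_chassis * cores_per_atom
seamicro_disks = seamicro_chassis * disks_per_chassis

xeon_cores = xeon_servers * sockets_per_server * cores_per_xeon
xeon_disks = xeon_servers * disks_per_server

# Cores sharing each disk drive, a crude indicator of I/O contention.
seamicro_cores_per_disk = seamicro_cores / seamicro_disks
xeon_cores_per_disk = xeon_cores / xeon_disks

print(f"SeaMicro: {seamicro_cores} cores, {seamicro_disks} disks, "
      f"{seamicro_cores_per_disk:.0f} cores per disk")
print(f"Xeon:     {xeon_cores} cores, {xeon_disks} disks, "
      f"{xeon_cores_per_disk:.0f} cores per disk")
```

By this tally the SeaMicro setup throws more, smaller cores at the job while hanging far fewer drives off them.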

Hadoop servers generally have at least six drives to avoid I/O contention, and customers are increasingly moving to even higher disk drive counts these days. In any event, the Xeon setup running the customer workload filled nearly two racks and consumed 3,387 watt-hours of electricity during its 10-minute, 50-second run.
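The watt-hour figures can be turned into an average power draw if you are willing to make two assumptions that the article does not confirm: that each energy figure covers exactly the duration of the run, and that the SeaMicro run lasted 10 minutes and 40 seconds (the batch window minus the 10 seconds to spare). A minimal Python sketch under those assumptions:

```python
# Energy figures are from the article. The run durations are assumptions:
# the Xeon run is taken as the full 10m50s window, and the SeaMicro run
# as 10 seconds shorter.
SECONDS_PER_HOUR = 3600

xeon_run_seconds = 10 * 60 + 50               # 650 s
seamicro_run_seconds = xeon_run_seconds - 10  # 640 s (assumed)

seamicro_watt_hours = 880
xeon_watt_hours = 3387


def average_watts(watt_hours: float, run_seconds: float) -> float:
    """Convert energy used over a run into average power in watts."""
    return watt_hours * SECONDS_PER_HOUR / run_seconds


seamicro_avg_watts = average_watts(seamicro_watt_hours, seamicro_run_seconds)
xeon_avg_watts = average_watts(xeon_watt_hours, xeon_run_seconds)

print(f"SeaMicro: {seamicro_avg_watts:,.0f} W total, "
      f"{seamicro_avg_watts / 2:,.0f} W per chassis")
print(f"Xeon:     {xeon_avg_watts:,.0f} W total, "
      f"{xeon_avg_watts / 76:,.0f} W per server")
```

Treat the per-box numbers as ballpark figures only; they shift if the measured interval differs from the assumed run times.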

The SeaMicro machines did the job in about one quarter of the rack space while burning about one quarter of the juice.
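The "one quarter" claims can be checked against the numbers quoted earlier. The Python sketch below uses only figures from the article, with illustrative variable names, and it counts just the servers themselves on the Xeon side, not any switches.

```python
# Figures quoted in the article; variable names are illustrative only.
seamicro_rack_units = 2 * 10   # two 10U SM10000-64 chassis
xeon_rack_units = 76 * 1       # 76 1U servers, switches not counted

seamicro_watt_hours = 880
xeon_watt_hours = 3387

space_ratio = seamicro_rack_units / xeon_rack_units
energy_ratio = seamicro_watt_hours / xeon_watt_hours

print(f"rack space: {space_ratio:.2f} of the Xeon cluster")
print(f"energy:     {energy_ratio:.2f} of the Xeon cluster")
```

Both ratios come out at roughly 0.26, so "one quarter" is a fair rounding.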

To get a sense of what the Xeon solution would cost, I configured a ProLiant DL160 G6 server with two Xeon L5630 processors, 8GB of memory, and four 500GB disks, and that works out to $4,270 each. Just for the bare servers, you are in for $324,520, and you need to buy a couple of switches to lash them together. The operational costs will also play in favor of the SeaMicro setup.
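The acquisition cost comparison is simple arithmetic, sketched here in Python using only the list prices quoted in the article. The variable names are illustrative, and the Xeon total leaves out the switches, whose price the article does not give.

```python
# List prices quoted in the article; variable names are illustrative only.
chassis_list_price = 140_000
seamicro_chassis = 2

xeon_server_price = 4_270      # ProLiant DL160 G6 as configured in the article
xeon_servers = 76

seamicro_total = seamicro_chassis * chassis_list_price
xeon_total = xeon_servers * xeon_server_price   # bare servers, no switches

difference = xeon_total - seamicro_total

print(f"SeaMicro setup: ${seamicro_total:,}")
print(f"Xeon servers:   ${xeon_total:,}")
print(f"Difference:     ${difference:,} before switches")
```

On list prices alone, the gap is $44,520 in SeaMicro's favor before networking gear and operating costs enter the picture.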

Intel will be crowing that it doesn't care whether customers use Atoms or Xeons, but the funny thing about the SeaMicro architecture is that it doesn't care about what processors it uses, either. It could turn out to be ARM or Tilera chips if the Atom roadmap is not aggressive enough in the future. ®
