Original URL: https://www.theregister.com/2009/02/17/vmware_specweb_test/

Fake server beats real server on Web test

Careful with that server consolidation pitch

By Timothy Prickett Morgan

Posted in OSes, 17th February 2009 20:48 GMT

Server virtualization juggernaut - well, at least on x64 iron - VMware is beside itself with glee that a virtualized Linux server running atop its ESX Server hypervisor narrowly beat out real Linux boxes on a popular Web serving benchmark test.

Yes, I know. This sounds like a perpetual motion machine: getting the benefits of a hypervisor without paying any overhead. But the results that VMware has achieved on the SPECweb2005 benchmark test, administered by the Standard Performance Evaluation Corporation, may say more about the limits of that benchmark than they do about any overhead that a hypervisor like ESX Server imposes on a server.

Unfortunately for anyone trying to figure out what overhead ESX Server imposes, VMware's VMmark benchmark is explicitly designed to obfuscate any calculations you might want to make. The apples-to-applesauce-to-oranges comparisons you can make from the limited SPECweb2005 data similarly do not allow you to figure out the overhead that ESX Server imposes - and then reckon what the higher processor and memory utilization enabled by the hypervisor might yield, thereby making it all worth the trouble.

But there is a bigger problem, which I will get into in a minute. First, a gripe.

If you really wanted to test the mettle of virtual servers against physical servers, you would run a suite of applications like the mix used in the VMmark test on one server, then load ESX Server on the exact same box and run one instance of the software, just as on the physical box. Then you could see the overhead in a one-to-one comparison on identical iron.

Then, to drive up utilization, you could run two or maybe three instances of the software stack atop the hypervisor, showing at each new instance how much extra work you could get done as virtualization forces system utilization higher. You could also add and subtract main memory to show the effect it has on physical and virtual workloads, which might help with capacity planning.

And finally, you could give a price for the configuration in each case to allow customers to calculate the bang for the buck each setup yields. Maybe virtualization drives up utilization and costs at the same rate; maybe not. Can you tell? Not from the VMmark and SPECweb2005 tests.
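To put some meat on that gripe, here is the kind of arithmetic such a test would make possible. Every number below is hypothetical - a sketch of the method, not a figure from any published VMmark or SPECweb2005 run:

```python
# A sketch of the physical-versus-virtual comparison proposed above.
# Every figure here is hypothetical, not from any published benchmark run.
bare_metal_score = 10_000    # one software stack on physical iron
virtual_score_1vm = 9_200    # the same stack running alone atop a hypervisor
virtual_score_3vm = 24_000   # three copies of the stack atop the hypervisor

overhead = 1 - virtual_score_1vm / bare_metal_score
print(f"hypervisor overhead: {overhead:.1%}")                          # 8.0%
print(f"throughput at 3 VMs: {virtual_score_3vm / bare_metal_score:.2f}x")

# And the bang-for-the-buck figure neither benchmark lets you compute:
bare_metal_price, virtual_price = 25_000, 32_000   # hypothetical, too
print(f"physical: ${bare_metal_price / bare_metal_score:.2f} per unit of work")
print(f"virtual:  ${virtual_price / virtual_score_3vm:.2f} per unit of work")
```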

Having said all that, some data is always better than none, and this being the first SPECweb2005 test that has had VMware's ESX Server hypervisor as part of the configuration, it is inherently interesting.

VMware chose a Hewlett-Packard ProLiant DL585 G5 server using 2.6 GHz quad-core "Shanghai" Opteron 8382 processors. The DL585 is a quad-socket rack server, so the resulting machine had 16 cores, and the box was configured with 128 GB of main memory and four 73 GB disks to house ESX Server 3.5 Update 3. Fifteen virtual machine partitions were created on the box, each configured with Red Hat Enterprise Linux 5 Update 1 as its guest operating system, and the HotSpot JVM (from Sun Microsystems) and the Rock Web server (from Accoria Networks) were loaded on each Linux instance.

Each VM was pinned to a specific core in the box, but ESX Server could play with the extra one; a dozen VMs were set up with 8 GB of virtualized main memory, and the remaining three got 6 GB. Each of the VMs had four virtual network adapters, and VMware did some funky NUMA memory affinity settings to make it run in a balanced way (you can see the details at the bottom of the SPEC report).
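The layout is easy enough to reconstruct from the SPEC disclosure; the leftover-memory figure below is my own arithmetic, not a number from the report:

```python
# VM layout from the SPEC disclosure report; the remainder is my arithmetic.
cores, vms = 16, 15                  # one core left over for ESX Server itself
vm_memory_gb = [8] * 12 + [6] * 3    # a dozen 8 GB VMs plus three 6 GB VMs
assert len(vm_memory_gb) == vms

allocated = sum(vm_memory_gb)        # 114 GB of the 128 GB in the box
print(f"{allocated} GB handed to guests, {128 - allocated} GB left over")
```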

Anyway, the guest Linux operating systems atop ESX Server ran the SPECweb2005 benchmark suite, which simulates a mix of banking, e-commerce, and support applications, and linked out to two EMC Clariion CX3-40 Fibre Channel SAN arrays with a total of 14.6 TB of disk capacity. The DL585 had four 10 Gigabit Ethernet ports to link out to simulated end users and two Gigabit Ethernet links for hooking to the back-end simulator, or BeSim in the SPEC lingo.

With those 15 VMs flailing away, the DL585 was able to support 80,000 simultaneous banking sessions, 69,525 e-commerce sessions, and 33,941 support sessions. These results are averaged and then normalized against a reference platform, yielding a rating of 44,000 on the SPECweb2005 test.
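SPEC's precise aggregation rule lives in its run documentation, but the general shape - normalize each workload's session count against a reference result, then fold the ratios into a single figure - might look something like this sketch. The reference values and the geometric-mean step are illustrative assumptions on my part, not SPEC's published constants:

```python
from math import prod

# Illustrative only: the reference counts and the geometric-mean aggregation
# are assumptions for the sketch, not SPEC's published methodology.
sessions  = {"banking": 80_000, "ecommerce": 69_525, "support": 33_941}
reference = {"banking": 1_000, "ecommerce": 900, "support": 500}  # made up

ratios = [sessions[w] / reference[w] for w in sessions]
rating = prod(ratios) ** (1 / len(ratios))   # geometric mean of the ratios
print(f"composite rating: {rating:.0f}")
```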

As it turns out, that ESX Server-based setup beat out a real DL580 server that HP did some tests on using Intel's quad-core "Tigerton" X7350 processors running at 2.93 GHz. The DL580 is a quad-socket box, and in this case HP set it up with 64 GB of main memory, eight internal disks plus two MSA 70 disk arrays, with a total of 2.1 TB of disk capacity. Now, right here, before we go any further, that's half the memory and one-seventh the disk capacity of the ESX Server setup on similar processing iron. That difference adds up to a pretty big piece of cash.

The DL580 with Xeons was set up with Red Hat Enterprise Linux 5 - an earlier release than the Update 1 VMware used - plus the Java HotSpot JVM and the Rock Web server. Now, on the SPECweb2005 test, as in other Java-based benchmarks, these JVMs provide a kind of system partitioning that allows multiple application clones to run side-by-side, much as logical or virtual machine partitions do for real workloads.

But in the real world, where applications are not always written in Java, but could be systems software like databases that are written in C++ or god only knows what else, this JVM partitioning trick doesn't work. You are beginning to see the limits of the SPECweb2005 test, my friends, when it comes to using it as a means of demonstrating the usefulness of server virtualization.

Anyway, the real HP server could handle 71,104 banking sessions, 55,552 e-commerce sessions, and 36,032 support sessions, and that normalized down to a SPECweb2005 rating of 40,046 - lower than the 44,000 the ESX Server setup posted.

It is hard to say why because the machines are not identical. Given that this Intel machine uses the old front side bus architecture and external memory controllers out on the chipset, I think it is a safe guess that the integrated memory controllers and virtualization-assistance features of the Opteron chips more than compensated for the overhead imposed by ESX Server on the hardware resources. Having twice the main memory and two Fibre Channel disk arrays didn't hurt performance, either.

But do some math. Forget the difference in disk configuration and prices for a moment. Assume the DL580 and DL585 are roughly equivalent in price and performance. (The DL580 G6 using Intel's "Nehalem EP" Xeons probably will be.) If you buy a onesie today from HP, a barebones DL580 with four quad-core X7350 processors, 64 GB of memory, and four base disks (72 GB 2.5-inch SAS drives spinning at 15K RPM) costs $25,419 with RHEL 5 installed.

VMware's Infrastructure 3 Foundation edition costs $1,640 for two processors, or $3,280 for the four-socket DL580 and DL585 machines (for VMware, a processor is a socket, regardless of core count). Why would you pay that price plus another $3,811 for another 64 GB of memory to get an extra 9.9 per cent of performance? That's increasing the cost of the system by $7,091, or 27.9 per cent, to get that extra oomph.

It is a bad strategy, unless there are other benefits to virtualization, such as high availability enabled through VMotion virtual machine teleportation, or rapid provisioning or image standardization that you can get from using the Virtual Infrastructure tools. And if you want all the bells and whistles, then we're really talking about $7,188 per two processor sockets, and that raises the overall price of the base system to $43,606 to get at that extra performance.

That's a 71.5 per cent increase. This may be well worth it, considering the benefits. But such a decision will not be driven by performance, rest assured. And what is true about VMware's sophisticated server virtualization software in this regard is equally true of the XenServer stack from Citrix Systems and whatever Microsoft cooks up with partners for Hyper-V.
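Here is that math in one place; the dollar figures come straight from the prices quoted above, and the percentages reproduce the sums in the preceding paragraphs:

```python
# The dollar figures come from the HP and VMware prices quoted above;
# the percentages reproduce the arithmetic in the preceding paragraphs.
base_server   = 25_419       # DL580, four X7350s, 64 GB of memory, RHEL 5
extra_memory  = 3_811        # the second 64 GB of memory
vi_foundation = 2 * 1_640    # VI 3 Foundation, priced per two sockets
vi_full       = 2 * 7_188    # the full VI bundle, same two-socket pricing

perf_gain = 44_000 / 40_046 - 1   # ESX rating versus the physical rating
for label, hypervisor in (("Foundation", vi_foundation), ("Full VI", vi_full)):
    total = base_server + extra_memory + hypervisor
    print(f"{label}: ${total:,} (+{total / base_server - 1:.1%} cost) "
          f"for +{perf_gain:.1%} performance")
# Foundation: $32,510 (+27.9% cost) for +9.9% performance
# Full VI:    $43,606 (+71.5% cost) for +9.9% performance
```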

A benchmark showing the consolidation of Exchange Servers or other groupware products would perhaps be a lot more compelling, I would guess. But until someone does a standard benchmark for such a workload, it will be hard to tell.

Bootnote: There is actually a SPECweb2005 benchmark test of the DL585 G5 running RHEL 5 that is more similar to the machine that ran the VMware hypervisor and its RHEL slices. That machine used 2.3 GHz "Barcelona" quad-core processors, which ran 300 MHz slower than the Shanghai chips in the ESX Server setup. The box had 64 GB of main memory, just like the Xeon box I mentioned above, and had a slightly different disk configuration (3.7 TB of capacity on four MSA 70 arrays plus some disks inside the box). It was able to achieve a rating of 43,854 on the SPECweb2005 benchmark - basically the same performance as the ESX Server setup that had twice the memory, a lot more disks, and 13 per cent more clock speed.

A bunch of people reading this story, as well as VMware in its announcement, wanted to compare the DL585 running ESX Server to a 24-core DL580 setup using Intel's six-core "Dunnington" chips, but I didn't think that was a fair comparison: even though both machines had 128 GB of memory, the processor core counts are not close. That box was rated at 50,013 on the SPECweb2005 test, which shows those extra eight cores got more work done, but there are diminishing returns, since 50 per cent more cores only delivered 24.9 per cent more oomph on the test.
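For the record, the scaling sums in the bootnote and that last comparison work out like this, using the figures quoted above:

```python
# Scaling sums from the bootnote, using the figures quoted in the article.
print(f"Shanghai clock advantage: {2.6 / 2.3 - 1:.0%}")              # 13%

extra_cores = 24 / 16 - 1           # Dunnington box versus the 16-core DL580
extra_oomph = 50_013 / 40_046 - 1   # SPECweb2005 ratings of the two boxes
print(f"{extra_cores:.0%} more cores bought {extra_oomph:.1%} more oomph")
# 50% more cores bought 24.9% more oomph
```

®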