Nehalems make like elephants on HPC memory test
Istanbul's touch of Alzheimer's
Intel's Nehalem EP chip has significantly outperformed AMD's Istanbul on a set of memory-intensive benchmark tests.
The techies at supercomputer cluster maker Advanced Clustering Technologies are at it again, running their own benchmarks on single server nodes using popular high-performance computing tests normally used on entire clusters. This time around, ACT is putting the latest x64 chips into two-socket systems and running the Stream memory benchmark on the boxes.
By running various HPC tests on single servers, ACT is helping educate customers on the pros and cons of the new Intel quad-core 'Nehalem EP' Xeon 5500 and Advanced Micro Devices six-core 'Istanbul' Opteron 2400 processors.
On the Linpack test that ACT ran previously, one of its Pinnacle rack servers equipped with two quad-core 2.66 GHz Xeon X5550s and 12 GB of DDR3 main memory running at 1.33 GHz was able to deliver 74.03 gigaflops of sustained performance against a peak theoretical performance of 85.12 gigaflops. But a Pinnacle machine configured with two of the six-core Opteron 2435 processors running at 2.6 GHz and 16 GB of DDR2 main memory running at 800 MHz was able to deliver 99.38 gigaflops (against a peak theoretical performance of 124.8 gigaflops).
So, AMD won that one - especially when you consider that the Opteron-based Pinnacle HPC node from ACT cost $3,500 compared to the $3,800 price on the Xeon-based Pinnacle box.
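The peak figures quoted for the two nodes are consistent with a simple product of sockets, cores, clock speed and flops per clock. The sketch below assumes four double-precision flops per core per clock, a number the article does not state but which reproduces both quoted peaks. The function and dictionary names are made up for illustration.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle=4):
    """Theoretical peak, assuming each core retires flops_per_cycle per clock."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


# Figures quoted in the article for the two-socket Pinnacle nodes.
NODES = {
    "Xeon X5550": {"cores": 4, "ghz": 2.66, "linpack": 74.03, "price": 3800},
    "Opteron 2435": {"cores": 6, "ghz": 2.6, "linpack": 99.38, "price": 3500},
}


def summarize(node):
    """Derive peak, Linpack efficiency and price per sustained gigaflops."""
    peak = peak_gflops(2, node["cores"], node["ghz"])
    return {
        "peak_gflops": peak,
        "efficiency": node["linpack"] / peak,
        "dollars_per_gflops": node["price"] / node["linpack"],
    }


for name, node in NODES.items():
    s = summarize(node)
    print(
        f"{name}: peak {s['peak_gflops']:.2f} GF, "
        f"efficiency {s['efficiency']:.1%}, "
        f"${s['dollars_per_gflops']:.2f} per sustained GF"
    )
```

On these numbers the Xeon node runs closer to its theoretical peak (about 87 per cent against about 80 per cent), while the Opteron node works out to roughly $35 per sustained gigaflops against roughly $51 for the Xeon box.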
Now, with the Stream benchmark, the test is not about flops so much as memory bandwidth, and given the higher clock speed of the DDR3 main memory compared to DDR2 memory, you'd expect the Nehalem EP server node to do better than it did on the Linpack test. And indeed it did.
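To see why Stream stresses memory rather than the floating point units, it helps to look at the shape of its triad kernel. Stream itself is written in C and Fortran; what follows is only a rough Python sketch of the triad loop and of how bandwidth is counted, assuming 8-byte doubles, three arrays touched per iteration, and a megabyte of 10^6 bytes. A pure Python loop mostly measures interpreter overhead, so the timing here is for illustration and is not comparable to ACT's numbers.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def triad(a, b, c, scalar):
    """Triad kernel: one multiply and one add per element, three arrays touched."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_mb_per_sec(n, seconds):
    """Bandwidth accounting: two arrays read and one written per element."""
    bytes_moved = 3 * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e6


if __name__ == "__main__":
    n = 100_000  # illustrative size; a real run uses arrays far larger than cache
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    print(f"triad: {triad_mb_per_sec(n, elapsed):.1f} MB/sec (interpreter-bound)")
```

With only two flops for every 24 bytes moved, the processor spends most of its time waiting on memory, which is why the speed of the memory subsystem matters more than core count on this test.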
Corder's home-done Stream benchmark tests were done on exactly the same iron as the Linpack tests, and for good measure, Corder tossed in some numbers for older quad-core Xeons and Opterons to show how much better the new chips are versus the old.
The Nehalem EPs really cleaned the Istanbuls' clocks on this test. Using 1.33 GHz DDR3 memory, the server using the X5570 processors was able to deliver 37,122 MB/sec of bandwidth on the Stream test, while the machine equipped with 1.07 GHz memory modules hit 32,770 MB/sec and one using 800 MHz memory could handle 25,490 MB/sec. A Pinnacle server equipped with the earlier "Harpertown" Xeon 5400s - quad-core chips using the old frontside bus architecture and 800 MHz DDR2 main memory - could only deliver 9,776 MB/sec of bandwidth on the Stream test, and dropping down to 667 MHz memory pushed performance down to 6,102 MB/sec.
By contrast (and this is a big contrast), the Istanbul-based Pinnacle server using 800 MHz DDR2 main memory, which is as fast as it gets, topped out at 20,534 MB/sec of memory bandwidth on the Stream tests. That was actually a little bit lower than the results ACT saw with a Pinnacle server equipped with quad-core "Shanghai" Opterons, which came in at 20,687 MB/sec. A server using the older quad-core "Barcelona" Opterons and 667 MHz DDR2 main memory was able to deliver 16,965 MB/sec on Stream.
As Intel has promised, ACT confirms that the Nehalem EP chips and their new QuickPath Interconnect bus architecture deliver nearly four times the memory bandwidth of their Harpertown predecessors, and nearly double the memory performance of the current crop of AMD Opterons. And there is nothing AMD can do about it until it switches to DDR3 main memory early next year with the "Magny-Cours" and "Lisbon" kickers to the Istanbuls.
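The "nearly four times" and "nearly double" claims can be checked directly against the Stream results ACT reported. A quick sketch, with labels and helper names invented for illustration and the figures taken from the article:

```python
# Stream results in MB/sec, as quoted from ACT's tests.
STREAM_MB_S = {
    "Nehalem EP, 1.33 GHz memory": 37122,
    "Nehalem EP, 1.07 GHz memory": 32770,
    "Nehalem EP, 800 MHz memory": 25490,
    "Harpertown, 800 MHz memory": 9776,
    "Harpertown, 667 MHz memory": 6102,
    "Istanbul, 800 MHz memory": 20534,
    "Shanghai": 20687,
    "Barcelona, 667 MHz memory": 16965,
}


def speedup(fast, slow):
    """Ratio of two quoted Stream results."""
    return STREAM_MB_S[fast] / STREAM_MB_S[slow]


best_nehalem = "Nehalem EP, 1.33 GHz memory"
for rival in ("Harpertown, 800 MHz memory", "Istanbul, 800 MHz memory"):
    print(f"{best_nehalem} vs {rival}: {speedup(best_nehalem, rival):.2f}x")
```

The ratios work out to about 3.8 times the best Harpertown result and about 1.8 times the Istanbul result, in line with the rounded claims.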
AMD will be offering the G34 socket with four DDR3 memory channels per socket (up to twelve DIMMs) and the C32 socket with two channels per socket (up to four DIMMs). AMD's plan is to offer two different kinds of two-socket servers: one where memory bandwidth is key (that's the G34) and one where a lower price and floating point or integer power are more important (that's the C32). AMD has the right idea. But it really needs this architecture to be here now to blunt Intel's considerable memory bandwidth advantage. ®