Intel (finally) uncages Nehalem-EX beast
Like Itanium. But you might actually use it
Intel's switch to the Nehalem architecture was finally completed Tuesday with the launch of the Nehalem-EX Xeon 6500 and 7500 processors, the last of the Core, Xeon, and Itanium chips to get the QuickPath Interconnect and a slew of features that make Intel chips compete head-to-head with alternatives from Advanced Micro Devices. The price war at the midrange and high end of the x64 market can now get underway, while the all-out, total price war awaits the debut of AMD's Opteron 6100 processors in the second quarter.
Since the summer of 2008, Intel has been previewing its top-end, eight-core Nehalem-EX beast, which we now know as the Xeon X7560. As with prior generations of Xeons, the Nehalem-EX line is not comprised of one or two chips, but of a mix of chips with different features (clock speed, cache memory, HyperThreading, and Turbo Boost) dialed up and down to give customers chips tuned for specific workloads.
While last year's Nehalem-EP Xeon 5500 and this year's Westmere-EP Xeon 5600 processors are aimed at workstations or servers with two sockets, with the Nehalem-EX lineup, Intel has broadened the definition of its Expandable Server (this is apparently what EX is short for, while EP is supposed to be an abbreviation for Efficient Performance) to include two-socket machines as well as the four-socket and larger machines that prior generations of Xeon MP processors were designed for.
Intel, no doubt, would have preferred to keep the Xeon DP and Xeon MP product lines more distinct, and charged a hefty premium for machines that needed expanded processor sockets or memory capability. But server makers and their customers were having none of that. With the rapid adoption of server virtualization and the need for larger memory footprints even for two-socket boxes, the Nehalem-EX processors have been tweaked so they can be used to support very fat memory configurations on even two-socket workhorse servers. This will eat into the volume Xeon 5500 and 5600 market, to be sure, but it is better to sell a Xeon 6500 or 7500 server in a two-socket box than have a customer dump Intel for AMD.
The Xeon 6500 and 7500 processors will also blur some lines between Xeon processors and the former "flagship" Itanium processors, which were supposed to take over the desktop and server arena starting a decade ago, but have been relegated mostly to high-end servers from HP running HP-UX, NonStop, and OpenVMS at this point in their history. The Itaniums were distinct in many ways from the Xeons, but the main distinction they held was better reliability, availability, and serviceability (RAS) features than Xeons had, on par with mainframe, RISC, and other proprietary architectures from days gone by.
The eight-core Nehalem-EX Xeon 7500 beast
But at the launch event today in San Francisco, Kirk Skaugen, vice president of the Intel Architecture Group and general manager of its Data Center Group, made no bones about the fact that the Nehalem-EX processors and their related Boxboro chipset, which is shared with the Itanium 9300 processors launched in early February, have common RAS features.
The new chip, explained Skaugen, has 20 new RAS features, including extended page tables and virtual I/O capabilities as well as a function that is in mainframes, RISC iron, and Itaniums called machine check architecture recovery, which allows a server to have a double-bit error in main memory and cope with it without halting the system. With Windows, Solaris, and Linux supporting these RAS features, as well as VMware's ESX Server hypervisor, this makes servers based on the Xeon 7500s just as suitable a replacement for proprietary midrange and mainframe platforms and RISC/Unix servers as the formerly beloved Itaniums.
Skaugen said that the Nehalem-EX chips would allow server makers to create two-socket servers that support up to 512GB of main memory, nearly three times as much as AMD can do using 8GB DIMMs with the Magny-Cours Opteron 6100s announced yesterday. Intel will be able to support 1TB of main memory in a four-socket configuration, while the controller inside the Opteron 6100 only allows a four-socket machine using these chips to address a maximum of 512GB.
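For a rough sense of those memory ceilings, here is a back-of-the-envelope sketch in Python that uses only the capacities quoted above. The table and function names are ours, made up for illustration, and it assumes memory is spread evenly across sockets, which the vendors did not spell out.

```python
# Maximum main memory figures quoted in the article, in GB.
# The dictionary layout and names are illustrative, not from Intel or AMD.
MAX_MEMORY_GB = {
    ("Nehalem-EX", 2): 512,
    ("Nehalem-EX", 4): 1024,    # 1TB
    ("Opteron 6100", 4): 512,
}


def gb_per_socket(chip, sockets):
    """Quoted memory ceiling divided evenly across sockets (an assumption)."""
    return MAX_MEMORY_GB[(chip, sockets)] / sockets


for chip, sockets in MAX_MEMORY_GB:
    per_socket = gb_per_socket(chip, sockets)
    print(f"{chip}, {sockets} sockets: {per_socket:.0f}GB per socket")
```

On those numbers, the Intel parts work out to 256GB per socket in both configurations, twice the 128GB per socket of a four-socket Opteron 6100 box.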
Skaugen rubbed it in a little that Intel's Nehalem-EX partners had over 50 new products in rack, tower, and blade form factors, and that it had 75 per cent more four-socket designs than with any prior server chip launch in its history. A dozen OEM partners have 15 different servers in the works that will span eight or more processor sockets, and apparently some are pushing their designs up to 16, 32, or 64 sockets.
The big bad box at the Nehalem-EX launch, of course, was the Altix UV massively parallel supercomputer, which El Reg told you all about last November. The Altix UV machines allow for up to 2,048 cores (that's 256 sockets and 128 two-socket blades) to be lashed together in a shared memory system suitable for running HPC codes. The shared global memory is not the same as a more tightly coupled symmetric multiprocessing (SMP) or non-uniform memory access (NUMA) cluster used in general purpose servers for running applications and databases. But that said, the Altix UVs are very powerful machines indeed and are intended to scale to petaflops of performance.
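The socket and blade counts in that parenthesis follow from simple division, sketched here in Python. The function name is hypothetical, and the sketch assumes every socket holds an eight-core chip, which matches the top-end part but is our assumption for the full machine.

```python
CORES_PER_SOCKET = 8     # eight-core Nehalem-EX (assumed for every socket)
SOCKETS_PER_BLADE = 2    # two-socket blades, per the article


def altix_uv_layout(total_cores):
    """Return (sockets, blades) needed for a given core count."""
    sockets, leftover = divmod(total_cores, CORES_PER_SOCKET)
    if leftover:
        raise ValueError("core count is not a whole number of sockets")
    blades, leftover = divmod(sockets, SOCKETS_PER_BLADE)
    if leftover:
        raise ValueError("socket count is not a whole number of blades")
    return sockets, blades


sockets, blades = altix_uv_layout(2048)
print(f"{sockets} sockets on {blades} blades")
```

For 2,048 cores this gives 256 sockets on 128 blades, matching the figures above.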
The Boxboro chipset that Intel is shipping as a companion to the Nehalem-EX chips supports configurations with two, four, or eight sockets gluelessly. If you want more sockets than that, you have to create your own chipsets, as HP, IBM, Silicon Graphics, and Bull have done for sure and others will no doubt follow.
But you can't just plug any old Nehalem-EX chip into any old configuration. That would be too simple, and Intel likes to charge premiums for features, like most capitalists. Take a gander at the feeds and speeds of the Nehalem-EX lineup:
The Intel Nehalem-EX Xeon 7500 and 6500 processors
The first thing you will notice is that there are two different families of Nehalem-EX processors. The Xeon 7500s are aimed at general-purpose workloads and offer the most socket expandability. All of these chips can be used in two-socket or four-socket boxes, and some of them can be used in eight-socket or larger machines, too. The Xeon 6500s are cut-down versions of the chips that only work in two-socket boxes and that are specially tuned for the HPC market. These chips, explained Skaugen, were optimized to have the highest bytes per floating point operation ratio while minimizing the amount of node-to-node communication among the processors in the complex.
The top-end X7560 part has eight cores spinning at 2.26GHz, has 24MB of L3 cache on the chip, and is rated at 130 watts using Intel's thermal design point (TDP) scale. The chip supports Turbo Boost, which allows for a core to have its clock speed jacked up if other cores are shut down when they're not being used, and it also supports Intel's HyperThreading simultaneous multithreading, which virtualizes the physical pipeline in the chip so it looks like two virtual pipelines to a system's operating system and its applications. In best-case scenarios, HT can boost performance of applications by around 30 per cent. In 1,000-unit trays, the per-chip price for the X7560 is a whopping $3,692. That is exactly what Intel charged for a dual-core Montvale Itanium 2 with 24MB of L3 cache.
The X7550 drops the clocks down to 2GHz, chops the L3 cache down to 18MB, and the price comes down to $2,729, which is exactly what Intel was charging for its top-bin six-core Dunnington Xeon X7460 processor running at 2.66GHz with 16MB of L3 cache. The next part down, the X7542, jacks the clocks up to 2.66GHz, keeps the cache at 18MB, cuts out HyperThreading, and reduces the core count down to six from eight; the price drops down to $1,980.
For that same $1,980 you can get a standard 105 watt part, the E7540, running at 2GHz with six cores and that same 18MB cache. If you are willing to take lower clock speeds, you can get even cheaper standard parts, the E7530 and E7520, which cost $1,391 and $856, respectively. Intel has also cooked up two low-voltage parts, the L7555 and L7545, running at 1.86GHz and rated at 95 watts, which have eight and six cores, respectively. These are reasonably pricey chips that will no doubt be used inside Nehalem-EX blade servers where a premium is expected in exchange for extra density.
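To compare the parts on something other than sticker shock, here is a small Python sketch that works out list price per core and per core-GHz from the figures quoted above. The names are ours, the X7550's eight cores are inferred from the "six from eight" remark rather than stated outright, and core-GHz is a crude yardstick that ignores cache, HyperThreading, and Turbo Boost.

```python
# (cores, clock in GHz, 1,000-unit tray price in dollars) as quoted above.
# The X7550 core count is inferred from the text, not stated directly.
PARTS = {
    "X7560": (8, 2.26, 3692),
    "X7550": (8, 2.00, 2729),
    "X7542": (6, 2.66, 1980),
    "E7540": (6, 2.00, 1980),
}


def price_per_core(part):
    """List price divided by the number of cores."""
    cores, _, price = PARTS[part]
    return price / cores


def price_per_core_ghz(part):
    """List price divided by aggregate clock (cores x GHz), a rough metric."""
    cores, ghz, price = PARTS[part]
    return price / (cores * ghz)


for part in PARTS:
    print(f"{part}: ${price_per_core(part):,.2f} per core, "
          f"${price_per_core_ghz(part):,.2f} per core-GHz")
```

At list price, the six-core X7542 and E7540 both come to $330 per core, while the X7560 comes to $461.50 per core.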
Generally speaking, the Xeon 6500 processors are cheaper than their Xeon 7500 counterparts because they have some features and functions turned off, as El Reg predicted they would last fall. This is in keeping with the general philosophy that HPC shops are super-stingy and will not pay one extra penny for a feature they don't want and will never use.
The Nehalem-EX processors are implemented in 45 nanometer processes and have 2.3 billion transistors. ®