In the Epyc center: More Zen server CPU specs, prices sneak out of AMD
And a quick look at the chips' encrypted RAM tech
Posted in Servers, 20th June 2017 20:00 GMT
Updated Here it is: the official lineup of AMD's Epyc processors, which will go toe to toe with Intel's Xeons that utterly dominate the data center world.
Epyc. That's not a typo. AMD's desktop and laptop chips are called Ryzen, and its server-class family is called Epyc. Welcome to the new AyyyyyyyMD. Both Ryzen and Epyc are built on AMD's x86 Zen microarchitecture.
Is this AMD's big comeback; can it hope to compete against monopoly player Intel; blah, blah, blah – we'll go through all that later. For now, let's skip the opinions and talk specs: each Epyc part is a 14nm system-on-chip (SoC) processor fabricated by GlobalFoundries. There are four silicon dies in each package rather than one mega-die, an approach that is cheaper and easier to manufacture. Up to eight cores can be used per die, giving up to 32 per processor package; each core can run one or two hardware threads.
The dies are all connected together internally using AMD's Infinity Fabric, an enhanced version of HyperTransport. Infinity is also used to link the two CPUs in a dual-socket Epyc system. Each processor package can support up to 2TB of DDR4 RAM over eight channels, and has 128 PCIe lanes. When you pair two Epyc SoCs in a dual-socket system, each gives up 64 of its PCIe lanes to talk to the other using the Infinity protocol. In other words, a single-socket Epyc system has 128 PCIe lanes available, and a dual-socket Epyc system also has 128 lanes available – not 256 – because each processor gives up 64 lanes for the socket-to-socket interconnect.
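To make that lane arithmetic concrete, here's a minimal Python sketch. The constants are simply the per-socket figures quoted above, and `usable_pcie_lanes` is a hypothetical helper of our own, not anything AMD ships.

```python
# Per-socket figures as described above.
LANES_PER_SOCKET = 128
LANES_FOR_INTERCONNECT_2P = 64  # lanes each socket repurposes as Infinity links in a 2P box


def usable_pcie_lanes(sockets: int) -> int:
    """PCIe lanes left over for devices in a one- or two-socket Epyc system."""
    if sockets == 1:
        return LANES_PER_SOCKET
    if sockets == 2:
        # Each socket gives up 64 lanes to talk to its neighbour.
        return sockets * (LANES_PER_SOCKET - LANES_FOR_INTERCONNECT_2P)
    raise ValueError("Epyc supports one or two sockets")


for n in (1, 2):
    print(f"{n}-socket Epyc: {usable_pcie_lanes(n)} PCIe lanes")
```

Both configurations print 128, which is the point: the second socket doubles cores and memory channels, not device lanes.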
AMD is, by the way, really pushing Epyc as a great single-socket chip, with dual-socket capabilities if you need them.
Die four me ... inside an AMD Epyc chip (the little loops represent the Infinity fabric)
The Epyc family supports AMD's encrypted memory features [whitepaper, manual], which work in one of three modes. The first is transparent mode (AMD calls it TSME), in which everything written to RAM is encrypted, and everything read back is decrypted, using a key held in the memory controller. The key is generated at power-up by the SoC's built-in security processor, and should never leave the controller – it can't be read by any software. This cryptography happens transparently to the running operating systems, applications, hypervisors, and virtual machines.
The second mode is SME (Secure Memory Encryption), in which the underlying operating system can mark selected pages of memory as encrypted or unencrypted, and the controller again takes care of the cryptography using a key that only it knows and that is regenerated on every boot. To enable encryption for a given page, the OS just sets bit 47 (the so-called C-bit) of the physical address in that page's page-table entry.
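For the curious, here's roughly what flipping that bit looks like, sketched in Python under some loud assumptions: the helper names are hypothetical, the entry is a bare-bones x86-64 page-table entry with only the present flag set, and the C-bit position is hard-coded to 47 as described above, whereas real kernel code asks the CPU where the bit lives.

```python
# Hypothetical helpers: mark a minimal x86-64 page-table entry as SME-encrypted.
C_BIT = 47                      # C-bit position on first-gen Epyc, hard-coded here
C_BIT_MASK = 1 << C_BIT
PRESENT = 1 << 0                # standard x86-64 "present" flag
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF  # physical frame bits 12..51


def make_pte(phys_addr: int, encrypted: bool) -> int:
    """Build a bare-bones PTE for a 4KB-aligned physical address."""
    if phys_addr & 0xFFF:
        raise ValueError("physical address must be 4KB aligned")
    if phys_addr & C_BIT_MASK:
        raise ValueError("address collides with the C-bit")
    pte = phys_addr | PRESENT
    if encrypted:
        pte |= C_BIT_MASK  # ask the memory controller to encrypt this page
    return pte


def is_encrypted(pte: int) -> bool:
    return bool(pte & C_BIT_MASK)


def phys_addr_of(pte: int) -> int:
    # Strip flag bits and the C-bit to recover the real frame address.
    return pte & ADDR_MASK & ~C_BIT_MASK


pte = make_pte(0x1234000, encrypted=True)
print(hex(pte))                # 0x800001234001
print(is_encrypted(pte))       # True
print(hex(phys_addr_of(pte)))  # 0x1234000
```

The design consequence worth noticing: because the flag sits inside the physical address field, the C-bit eats one bit of addressable physical memory, which is why code has to mask it off before treating the entry as an address.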
Both of these modes are designed to stop miscreants with physical access to a box from sniffing the contents of RAM off the buses while the computer is running, and to stop thieves who make off with non-volatile RAM DIMMs from extracting the sensitive information held on them. This is for people who are paranoid that someone is going to literally break into their machines.
The third mode is fscking insane: it's SEV (Secure Encrypted Virtualization). It is AMD's courageous attempt to provide encrypted virtual machines that are protected from the hypervisor, the underlying operating system, other VMs, and any other code on the machine.
Each VM is assigned an address space ID (ASID) by the hypervisor as normal, and this ID is tied to an encryption key held in the controller. When CPU core time is given to a virtual machine, the controller takes the VM's ASID, looks up that VM's key, and uses it to encrypt and decrypt all memory accesses on the fly. The hypervisor has its own ASID – zero – and can never see the keys. Thus not even a rogue or hijacked hypervisor, let alone software running in other VMs, can make sense of a virtual machine's contents, because all the data will appear scrambled. The hypervisor and host operating system simply don't have the keys.
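To illustrate the per-VM key lookup, here's a toy Python model. It is emphatically not AMD's design: the class and method names are hypothetical, and SHA-256 in counter mode stands in for the real AES-128 engine purely so the sketch runs on the standard library. What it does show is that the same physical bytes decode correctly only under the owning VM's ASID, while the hypervisor's ASID 0 key turns them into gibberish.

```python
import hashlib
import secrets


class ToyMemoryController:
    """Toy SEV-style key selection by ASID. Not AES, not secure, not AMD's design."""

    HYPERVISOR_ASID = 0

    def __init__(self):
        # Keys live only inside the "controller"; no method hands them out.
        self._keys = {self.HYPERVISOR_ASID: secrets.token_bytes(16)}

    def launch_vm(self, asid: int) -> None:
        if asid in self._keys:
            raise ValueError("ASID already in use")
        self._keys[asid] = secrets.token_bytes(16)

    def _keystream(self, asid: int, address: int, length: int) -> bytes:
        # Toy stream cipher: SHA-256(key || address || counter), a stand-in for AES-128.
        key = self._keys[asid]
        out = bytearray()
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(
                key + address.to_bytes(8, "little") + counter.to_bytes(4, "little")
            ).digest()
            counter += 1
        return bytes(out[:length])

    def write(self, dram: dict, asid: int, address: int, data: bytes) -> None:
        # Encrypt with the key belonging to whoever is running (the ASID).
        ks = self._keystream(asid, address, len(data))
        dram[address] = bytes(a ^ b for a, b in zip(data, ks))

    def read(self, dram: dict, asid: int, address: int) -> bytes:
        # Decrypt with the current ASID's key; the wrong ASID gets garbage.
        ct = dram[address]
        ks = self._keystream(asid, address, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks))


mc = ToyMemoryController()
mc.launch_vm(1)
dram = {}  # raw DRAM as a bus sniffer would see it
mc.write(dram, 1, 0x1000, b"guest secret")
print(mc.read(dram, 1, 0x1000))  # b'guest secret'
print(mc.read(dram, 0, 0x1000))  # hypervisor's view: scrambled bytes
```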
Here's where it gets weird. SEV is designed for paranoid people who don't trust whoever is hosting their virtual machines. The technology verifies that a VM started as expected and wasn't tampered with before or during boot-up, and that the encryption system is working correctly. This involves AMD holding a database of signing keys for each platform, and yeah... we'll dig into this in detail later.
All the cryptography (it's AES-128) happens on the fly before the data leaves the SoC, adding about 7ns of latency to each memory access. That translates into a performance hit of 1.5 per cent when encryption is enabled, we're told. It works across multiple cores, and even with DMA in certain circumstances. It's all powered by an ARM Cortex coprocessor running AMD's custom firmware, all encased in the Epyc SoC – and thus it all hinges on that small chunk of hidden code not being buggy. The coprocessor also provides services such as secure boot, ensuring that only cryptographically signed operating systems start up, if required.
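How does 7ns per access square with a 1.5 per cent hit? Here's a crude back-of-the-envelope model in Python. The 7ns and 1.5 per cent figures are AMD's; the 100ns baseline DRAM latency is our own assumption, as is the simplification that every extra nanosecond of memory latency lands directly on the critical path with nothing hiding it.

```python
def implied_stall_fraction(added_ns: float, slowdown: float, base_latency_ns: float) -> float:
    """Fraction of runtime spent waiting on DRAM that would make a fixed extra
    latency produce the given overall slowdown.

    Crude linear model (an assumption): slowdown = stall_fraction * added_ns / base_latency_ns
    """
    return slowdown * base_latency_ns / added_ns


# 7ns and 1.5% come from AMD; the 100ns baseline DRAM latency is our guess.
frac = implied_stall_fraction(added_ns=7.0, slowdown=0.015, base_latency_ns=100.0)
print(f"{frac:.1%} of runtime stalled on DRAM")  # prints: 21.4% of runtime stalled on DRAM
```

In other words, AMD's figure is plausible for workloads that spend around a fifth of their time waiting on memory; heavily memory-bound code would feel it more, cache-friendly code less.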
Below are the official stats for the new Epyc parts, announced today, each paired with the Intel silicon AMD is pitching it against.
For example, according to a presentation AMD gave to analysts and journalists on Monday at its offices in Austin, Texas, the Epyc 7601 was compared to the Intel Xeon E5-2699A v4. In a SPECint_rate_base2006 integer benchmark run by AMD, the 7601 was, we're told, 47 per cent faster than the Intel part.
The "rate" in the benchmark's name means it measures throughput with many copies running at once, and "base" means it is built with a conservative, uniform set of compiler options rather than aggressively tuned "peak" settings. We're usually highly allergic to vendor-issued benchmarks, but we're publishing these to give you an idea of where AMD is trying to position itself in the data center market, and the sort of components it's gunning for. The figures aren't necessarily indicative of real-world performance, because the benchmark doesn't reflect more involved workloads – for example, virtual machines that span multiple cores, which stress other parts of the chip such as the inter-core connectivity.
No rival was suggested for the 7501, by the way, so we compared it to a Xeon E5-4669 v4 for the hell of it.
As for the headings: cores should be obvious, being the number of CPU cores per system-on-chip package; threads is the number of hardware threads; base and turbo are the normal and peak CPU clock frequencies; L3 is the last-level cache size; TDP is the thermal design power, a guide to the maximum sustained power draw; SPECint is the lead the AMD part has over its Intel rival in AMD's own aforementioned benchmark; and price is the recommended retail price. Where there are two TDP figures, the part can be configured to operate in either mode: higher power and performance, or lower power and lower performance.
AMD has split its Epyc SKUs into dual- and single-socket classes. Any of them can be used in either configuration, though, except the P-suffixed SKUs, which are single-socket only, and a couple of parts, the 7281 and 7251, appear in both classes because they straddle the two. So, for example, AMD recommends using the 7301 in a dual-socket system as an alternative to a pair of $800-plus Intel Xeon E5-2640 v4s, and the 7551P in a single-socket server versus a pair of Xeon E5-2650 v4s. Yes, AMD is pitching its single-socket-class SKUs against selected dual-socket Intel setups, claiming it can outperform them.
Dual-socket class:

| CPU SKU | Cores / threads | Base / turbo GHz | L3 (MB) | TDP | SPECint | Price |
|---|---|---|---|---|---|---|
| Epyc 7601 | 32 / 64 | 2.2 / 3.2 | 64 | 180W | +47% | $4000 |
| Xeon E5-2699A v4 | 22 / 44 | 2.4 / 3.6 | 55 | 145W | - | $4938 |
| Epyc 7551 | 32 / 64 | 2 / 3 | 64 | 180W | +44% | $3200 |
| Xeon E5-2698 v4 | 20 / 40 | 2.2 / 3.6 | 50 | 135W | - | $3226 |
| Epyc 7501 | 32 / 64 | 2 / 3 | 64 | 155/170W | N/A | Unknown |
| Xeon E5-4669 v4 | 22 / 44 | 2.2 / 3 | 55 | 135W | - | $7007 |
| Epyc 7451 | 24 / 48 | 2.3 / 3.2 | 48 | 180W | +47% | $2400 |
| Xeon E5-2695 v4 | 18 / 36 | 2.1 / 3.3 | 45 | 120W | - | $2428 |
| Epyc 7401 | 24 / 48 | 2 / 3 | 48 | 155/170W | +53% | $1700 |
| Xeon E5-2680 v4 | 14 / 28 | 2.4 / 3.3 | 35 | 120W | - | $1745 |
| Epyc 7351 | 16 / 32 | 2.4 / 2.9 | 32 | 155/170W | +63% | $1100 |
| Xeon E5-2650 v4 | 12 / 24 | 2.2 / 2.9 | 30 | 105W | - | $1171 |
| Epyc 7301 | 16 / 32 | 2.2 / 2.7 | 32 | 155/170W | +70% | $800 |
| Xeon E5-2640 v4 | 10 / 20 | 2.4 / 3.4 | 25 | 90W | - | $939 |
| Epyc 7281 | 16 / 32 | 2.1 / 2.7 | 32 | 155/170W | +60% | $600 |
| Xeon E5-2630 v4 | 10 / 20 | 2.2 / 3.1 | 25 | 85W | - | $671 |
| Epyc 7251 | 8 / 16 | 2.1 / 2.9 | 16 | 120W | +23% | $400 |
| Xeon E5-2620 v4 | 8 / 16 | 2.1 / 3 | 20 | 85W | - | $422 |

Single-socket class (Xeon figures are per chip):

| CPU SKU | Cores / threads | Base / turbo GHz | L3 (MB) | TDP | SPECint | Price |
|---|---|---|---|---|---|---|
| Epyc 7551P | 32 / 64 | 2 / 3 | 64 | 180W | +21% | $2000 |
| 2 x Xeon E5-2650 v4 | 12 / 24 | 2.2 / 2.9 | 30 | 105W | - | $1171 |
| Epyc 7401P | 24 / 48 | 2 / 3 | 48 | 155/170W | +22% | $1000 |
| 2 x Xeon E5-2630 v4 | 10 / 20 | 2.2 / 3.1 | 25 | 85W | - | $671 |
| Epyc 7351P | 16 / 32 | 2.4 / 2.9 | 32 | 155/170W | +21% | $700 |
| 2 x Xeon E5-2620 v4 | 8 / 16 | 2.1 / 3 | 20 | 85W | - | $422 |
| Epyc 7281 | 16 / 32 | 2.1 / 2.7 | 32 | 155/170W | +63% | $600 |
| 2 x Xeon E5-2609 v4 | 8 / 8 | 1.7 / 1.7 | 20 | 85W | - | $310 |
| Epyc 7251 | 8 / 16 | 2.1 / 2.9 | 16 | 120W | +38% | $400 |
| 2 x Xeon E5-2603 v4 | 6 / 6 | 1.7 / 1.7 | 15 | 85W | - | $213 |
(The above prices are list prices according to our sister site The Next Platform – AMD officially says the 7601, the 7551 and the 7501 start from $3,400; the 7451 and 7401 start from $1,850; the 7351, 7301 and 7281 start from $650; and the 7251 starts from $475. The one-socket 7551P is priced at $2,100, the 7401P at $1,075, and the 7351P at $750.)
So, here are some initial thoughts on the above. The power figures might surprise you. Also, apart from the E5-4669 v4 we picked ourselves, the above Xeons are all 14nm scale-out Broadwell E5-26xx parts from 2016, rather than beefy scale-up E7s or the full-fat Broadwell E5-46xx family. And don't forget, Intel is launching its Skylake-based Xeons this year, so we don't yet know how the fledgling Epyc will stand up against Chipzilla's next wave of server processors.
For now, AMD is comparing its Epyc products to the vast majority of server processors being bought and shipped today – scale-out workhorse Broadwells filling up data centers worldwide. Crucially, it will all come down to the price: the argument will be that you can buy an alternative to a given Xeon for less money. Ultimately, AMD has to look good on one key metric: performance per watt per dollar – it's all the big chip buyers, like Google and Facebook, care about after years of paying eye-watering prices for Intel chips. Something new has to come along to challenge Chipzilla's levies on the industry.
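To see how that metric shakes out for one pairing in the tables above, here's a quick Python calculation for the Epyc 7601 versus the Xeon E5-2699A v4. It uses AMD's own SPECint uplift and the list prices quoted, and treats TDP as a proxy for real power draw, which it isn't quite, so take the result as illustrative only. `perf_per_watt_per_dollar` is our own hypothetical helper.

```python
def perf_per_watt_per_dollar(rel_perf: float, tdp_w: float, price_usd: float) -> float:
    """Relative performance divided by (watts x dollars). TDP stands in for actual power."""
    return rel_perf / (tdp_w * price_usd)


# Figures from the first table: +47% SPECint, 180W vs 145W, $4000 vs $4938.
epyc_7601 = perf_per_watt_per_dollar(1.47, 180, 4000)
xeon_2699a = perf_per_watt_per_dollar(1.00, 145, 4938)  # baseline
print(f"Epyc 7601 advantage: {epyc_7601 / xeon_2699a:.2f}x")  # prints: Epyc 7601 advantage: 1.46x
```

The higher TDP and lower price roughly cancel out here, so nearly all of the claimed advantage rides on AMD's own benchmark figure, which is exactly why independent numbers will matter.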
Regarding power, AMD says its Epyc processors are system-on-chips: they contain the north and southbridges in the package, rather than as separate controllers, so all you have to do is add some RAM. And storage and networking and any GPUs, and so on. So some of the power that a separate chipset would normally draw is absorbed into the Epyc SoCs' figures.
For what it's worth, the above Broadwell E5-26xx v4s each have 40 PCIe lanes and support up to 1.54TB of RAM per socket. Each Epyc core has 64KB of L1 instruction cache and 32KB of L1 data cache, versus 32KB of each in the Broadwell family, plus 512KB of L2 cache versus Broadwell's 256KB. AMD says Epyc matches the Broadwells in L2 and L2 TLB latencies, and has roughly half the L3 latency of Intel's counterparts.
We understand the Epyc chips are available from today, and will start shipping in July. People we know who are testing the hardware say they expect system firmware updates within the next month or so, which they hope will iron out lingering performance issues in the launch silicon.
Finally, AMD is also talking up its Radeon graphics processors for accelerating AI software. Look out for the Radeon Instinct MI25 (Vega architecture, 16GB HBM2 RAM, 300W, dual PCIe slot) for training; the MI6 (Polaris architecture, 16GB GDDR5 RAM, 150W, single slot) for training and inference; and the MI8 (Fiji architecture, 4GB HBM1 RAM, 175W, dual slot) for inference.
We're told each MI25 can hit 12.3TFLOPS using 32-bit floating-point math, or 24.6TFLOPS using 16-bit FP, and has a memory bandwidth of 484GB/s. The MI6 can top 5.7TFLOPS using 16- or 32-bit FP, with a memory bandwidth of 224GB/s. The MI8 can reach 8.2TFLOPS using 16- or 32-bit FP, and has a memory bandwidth of 512GB/s. They're all due to start shipping to "technology partners" in the third quarter of this year.
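One way to read those TFLOPS and bandwidth figures together is the so-called roofline ridge point: the number of floating-point operations a workload must perform per byte fetched from memory before peak compute, rather than bandwidth, becomes the limit. Here's a quick Python sketch using the 32-bit figures quoted above; `ridge_point` is our own hypothetical helper, and real workloads rarely hit either peak.

```python
def ridge_point(tflops: float, gb_per_s: float) -> float:
    """FLOPs per byte needed before peak compute, not memory bandwidth, is the bottleneck."""
    return (tflops * 1e12) / (gb_per_s * 1e9)


# Peak 32-bit TFLOPS and memory bandwidth (GB/s) as quoted by AMD.
cards = {
    "MI25": (12.3, 484),
    "MI6": (5.7, 224),
    "MI8": (8.2, 512),
}

for name, (tflops, bandwidth) in cards.items():
    print(f"{name}: ~{ridge_point(tflops, bandwidth):.1f} FLOPs per byte")
```

By this crude measure the MI8's HBM lets it reach peak on less arithmetic-heavy code (around 16 FLOPs per byte) than the MI25 or MI6 (around 25), which fits AMD positioning it for inference.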
Check back later this week for a full dive into Epyc and Zen's architecture, with a roundup of AMD's latest desktop, server and GPU accelerator offerings, once we've escaped the sweltering Texas climate. ®