The Register® — Biting the hand that feeds IT


ARM cranks up cache and memory designs for servers

Gearing up for the x86-ARM war


ARM Holdings wants chip makers to bring more cores and cache to bear as they craft server chips based on its Cortex family of system-on-chip (SoC) designs, and to that end the company is boosting the on-chip caching and main memory controllers of its current ARMv7 and future ARMv8 designs to make them better able to compete against x86 systems.

The new CoreLink CCN-504 that ARM is showing off today at the Linley Tech Processor Conference in San Jose is a cache coherent network that lashes up to four quad-core Cortex-A15 processors and from 8MB to 16MB of L3 cache into a fully coherent, single system image. This network is at the very heart of the SoC and is what links the cores to each other, to the cache, to main memory controllers, and to peripheral controllers that are not resident on the processors.

In many cases, chip makers will take the updated CoreLink cache network and use it to link to controllers that they put on the die, and there is even a chance that some vendors will use the cache coherent network to link ARM processor cores and GPU or other kinds of coprocessors into a single complex that does hybrid ceepie-geepie computing on a single die.

This heterogeneous computing, with CPUs and GPUs sharing L3 cache, reduces the need to pass data from CPU to GPU and back again and to go out to main memory, which saves both time and energy in the supercomputing uses that ARM Holdings and its enthusiasts see in their future. ARM's own brand for heterogeneous processing is big.LITTLE, though strictly speaking that name covers pairing hefty Cortex-A15 cores with frugal Cortex-A7 cores rather than pairing CPUs with GPUs. As you might expect, the first thing we need to do is give that feature a proper name that doesn't look like a ransom note.

ARM envisions that companies will be interested in plunking DSPs and other kinds of accelerators onto a SoC, and one interesting possibility might be the Epiphany line of RISC coprocessors created by Adapteva. Imagine putting two quad-core ARM chips and two of the 64-core Epiphany coprocessors into a single SoC.

ARM's new CoreLink CCN-504 cache coherent network


The updated cache coherency network supports double the cores of the current generation used in the Cortex-A9 chips, and it is compatible with the quad-core Cortex-A15 reference design from ARM Holdings and its derivatives as well as with the impending 64-bit ARMv8 designs that are expected to start rolling out next year from a variety of vendors.

Applied Micro Circuits very much wants to be first to deliver an ARMv8-based server chip with its X-Gene processor, but we'll see. It is not clear that Applied Micro is using CoreLink to lash together cores and caches. At the moment, storage chip and array maker LSI and ARM server chip upstart Calxeda (which just raised $55m in funding in its second round this week) are the first licensees for the new cache circuits.

The cache coherent network has a bandwidth of around 1Tb/sec and can run at up to the CPU clock frequency. It has a 128-bit bus and an integrated snoop directory, which keeps track of which computing elements hold copies of which data so that coherency messages go only to those elements instead of being broadcast to all of them to keep everything coherent across the L3 cache.
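To see why a snoop directory cuts down on broadcasting, here is a toy model in Python. It is a minimal sketch under loudly stated assumptions, not ARM's design: the class `SnoopDirectory` and its methods are hypothetical, it assumes four processor clusters (matching the four quad-core Cortex-A15s described above), and it ignores details such as forwarding dirty data on reads. The point it shows is that a write only needs to snoop the clusters recorded as sharers, whereas a directory-less design would have to snoop every other cluster.

```python
# Toy snoop directory (snoop filter). Hypothetical model, NOT ARM's CCN-504 logic.
# Assumption: four processor clusters, as in the four quad-core Cortex-A15 setup.
NUM_CLUSTERS = 4


class SnoopDirectory:
    def __init__(self, num_clusters: int = NUM_CLUSTERS) -> None:
        self.num_clusters = num_clusters
        # Maps a cache-line address to the set of clusters that may hold a copy.
        self.sharers: dict[int, set[int]] = {}

    def read(self, cluster: int, line: int) -> list[int]:
        """Record a read. This simplified model sends no snoops on reads
        (a real protocol may need to fetch dirty data from an owner)."""
        self.sharers.setdefault(line, set()).add(cluster)
        return []

    def write(self, cluster: int, line: int) -> list[int]:
        """Return only the clusters that must be snooped (invalidated)."""
        holders = self.sharers.get(line, set())
        targets = sorted(c for c in holders if c != cluster)
        # After the write, the writer holds the only valid copy.
        self.sharers[line] = {cluster}
        return targets

    def broadcast_targets(self, cluster: int) -> list[int]:
        """What a design without a directory would snoop: every other cluster."""
        return [c for c in range(self.num_clusters) if c != cluster]


if __name__ == "__main__":
    d = SnoopDirectory()
    d.read(0, 0x40)
    d.read(2, 0x40)
    print("directory snoops:", d.write(0, 0x40))
    print("broadcast would snoop:", d.broadcast_targets(0))
```

In this example only cluster 2 shared the line, so the directory sends one snoop where a broadcast scheme would send three; the saving grows with the number of clusters hanging off the network.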

The CCN-504 design also has clock gating on processors and L3 cache segments, allowing either to be switched off piecemeal to save power. If the workload allows it, you can completely power down the L3 cache after flushing its contents back to main memory and run the cores on their L2 caches alone. The cache network also supports up to 18 AMBA 4 AXI4 or ACE-Lite peripheral ports in addition to the processor ports and two memory controller ports.
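The "back it up to memory, then power it down" step can be illustrated with a toy write-back cache. This is a hedged sketch, not ARM's actual power-down sequence: the class `WriteBackL3` and its methods are hypothetical, the cache never allocates on reads, and main memory is modelled as a plain dictionary. It shows the essential rule: dirty lines must be written back before the cache loses power, after which accesses go straight to DRAM.

```python
# Toy write-back L3 model. Hypothetical, NOT ARM's actual power-down sequence.
# Main memory is modelled as a dict of address -> value.


class WriteBackL3:
    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory
        # address -> (value, dirty flag); no allocation on reads in this model
        self.lines: dict[int, tuple[int, bool]] = {}
        self.powered = True

    def write(self, addr: int, value: int) -> None:
        if self.powered:
            # Write-back policy: keep the new value in cache, mark it dirty.
            self.lines[addr] = (value, True)
        else:
            # L3 is off, so writes go straight to main memory.
            self.memory[addr] = value

    def read(self, addr: int) -> int:
        if self.powered and addr in self.lines:
            return self.lines[addr][0]
        return self.memory.get(addr, 0)

    def power_down(self) -> int:
        """Flush dirty lines to memory, then drop all contents.
        Returns the number of lines written back."""
        flushed = 0
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value
                flushed += 1
        self.lines.clear()
        self.powered = False
        return flushed


if __name__ == "__main__":
    mem: dict[int, int] = {}
    l3 = WriteBackL3(mem)
    l3.write(0x100, 7)
    l3.write(0x140, 9)
    print("flushed:", l3.power_down(), "memory:", mem)
```

Skipping the flush would silently lose every dirty line, which is why the cache can only be powered off after its contents are backed up.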

Incidentally, there is also a new memory controller, called the CoreLink DMC-520, that is designed to work with the CCN-504. It supports DDR3, low-voltage DDR3, and DDR4 memory sticks. The DDR4 spec was just published by the JEDEC Solid State Technology Association; DDR4 memory runs at 1.2 volts, compared with 1.5 volts for standard DDR3 and 1.35 volts for low-voltage DDR3, and uses memory chips ranging in capacity from 2Gb to 16Gb.
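Why does a 0.3-volt drop matter? A rough back-of-envelope sketch uses the first-order CMOS rule that dynamic power scales with capacitance times voltage squared times frequency. The assumptions here are loud ones: capacitance and frequency are held fixed, and static leakage, I/O termination and everything else that really determines DIMM power are ignored. The function name `relative_dynamic_power` is hypothetical.

```python
# First-order CMOS estimate: dynamic power ~ C * V^2 * f.
# Assumption: C and f held constant; leakage and I/O power ignored.


def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new versus v_old (same C and f)."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    for label, v_old in [("standard DDR3 (1.5 V)", 1.5),
                         ("low-voltage DDR3 (1.35 V)", 1.35)]:
        ratio = relative_dynamic_power(1.2, v_old)
        print(f"DDR4 at 1.2 V vs {label}: {ratio:.2f}x dynamic power")
```

Under those assumptions the ratios come out at about 0.64 and 0.79, so the voltage change alone would trim roughly a third of dynamic power versus standard DDR3; real-world savings will differ.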

DDR4 is not expected to ramp up until 2015, but should start trickling into systems in 2014. ARM should be on the front end of that transition rather than the back end if it wants to get some leverage, with DDR4 memory running at up to 3.2 gigatransfers per second, twice the data rate of common DDR3-1600 parts, and using less power, too.
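The "twice as fast" claim is easiest to see as peak bandwidth per memory channel, which is simply transfers per second multiplied by bytes per transfer. The sketch below assumes a standard 64-bit (8-byte) DIMM data bus and ignores ECC bits; the function name `peak_bandwidth_gb_s` is hypothetical. Note that DDR4-3200 moves 3,200 million transfers per second on a 1.6GHz I/O clock, because double data rate memory transfers on both clock edges, which is why "3.2GHz" is only a loose way of describing it.

```python
# Peak theoretical bandwidth of one 64-bit memory channel.
# Assumption: 8-byte data bus, ECC bits ignored; sustained bandwidth is lower.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_s(mega_transfers_per_sec: int) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second) for one channel."""
    return mega_transfers_per_sec * 1_000_000 * BUS_WIDTH_BYTES / 1e9


if __name__ == "__main__":
    print("DDR3-1600:", peak_bandwidth_gb_s(1600), "GB/s")
    print("DDR4-3200:", peak_bandwidth_gb_s(3200), "GB/s")
```

That works out to 12.8GB/sec per channel for DDR3-1600 against 25.6GB/sec for DDR4-3200, and a controller such as the DMC-520 multiplies that by however many channels the SoC designer wires up.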

ARM also says that the CCN-504 cache coherent network is just the first in what will eventually be a family of designs, so do not think for a second that this will be the limit of scalability on ARM SoCs aimed at servers. Lead licensees of ARM intellectual property can get the CCN-504 and DMC-520 designs today, and ARM expects partners' products using the designs to start sampling next year. ®
