Original URL: https://www.theregister.co.uk/2013/02/19/tilera_tile_gx72_processor/

Tilera etches '*ss-kicking' 72-core system-on-chip for network gear

Current Tile-Gx Plan B fits market needs better than Plan A

By Timothy Prickett Morgan

Posted in Servers, 19th February 2013 14:03 GMT

It is not just difficult to design and manufacture a chip for workloads that will be run many years in the future, it is damned near impossible. This is because so many shifting alternative technologies will materialize between the time you make your plan and when it is executed. Any chipmaker has to be both flexible and patient - an equally difficult feat for both upstart processor vendors and incumbents. Tilera, still very much in startup mode nine years after its founding, is getting traction with its many-cored Tile-Gx system-on-chips and is rolling out a new model with 72 cores on a single die.

Tilera has tweaked the Tile-Gx SoC lineup a bit from its plans nearly three years ago

"We are very confident that the GX72 will kick ass," says Bob Doud, director of marketing at Tilera. He told El Reg that the 72-core chip splits the difference between the two high-end chips that were expected to follow the 36-core variant of the Tile-Gx family - code-named "Greylock" - to market.

The 64-core and 100-core variants are not coming. Instead of the 100-core chip that was expected late last year, a 9-core variant was quietly put into the field for embedded system users who need very modest but radically low-powered RISC processors that can run Linux and that have everything you need in a server or embedded device but on a memory stick.

The Tile-Gx family of chips is the third generation of SoCs to come out of Tilera and is implemented in a 40 nanometer process from Taiwan Semiconductor Manufacturing Corp. The first two generations – the Tile and TilePro SoCs – used an ancient 90 nanometer process that is cheap and well understood. TSMC had its share of issues with 40 nanometer processes, which gave Tilera some grief, but so did the fact that Tilera has created a radically different processor architecture that takes a little time to sell.

Tilera must have been hoping that it would be able to break into the server racket, but that is a tough market. That said, Taiwanese ODM Quanta is an investor in Tilera and has been quietly peddling custom servers based on Tile processors for several years. Doud tells El Reg that three of the top 20 hyperscale data center operators use servers based on Tile SoCs in some capacity. Moreover, Tilera has sold "many tens of thousands" of the Tile-Gx36 chip that was launched late in 2011 and that shipped in the summer of 2012, displacing Freescale PowerPC chips in a number of router designs, to name one case, he adds. The Tile-Gx family has over 100 design wins in various kinds of network and server devices about a year after it became available, up from 20 design wins a year ago.

Sure, the Tile-Gx36 was originally expected at the end of 2010 and didn't start sampling until a year later. But Tilera has nearly doubled its sales force and "revenues are rising dramatically," according to Doud. And the chips are finding their way into switches, routers, network adapter cards, and various kinds of special network devices like load balancers or video transcoders where an x86 or Power processor costs too much and throws off too much heat for the work it does. This is pretty good in a world where ARM processors are sucking up more and more oxygen in the data center processor conversation.

The Tilera Tile-Gx72 system-on-chip

Tilera may have trimmed back on its high-end Tile-Gx chip plans by doing a 72-core variant instead of two chips with 64 and 100 cores, but the basic design of the SoC has not changed. The premise of the design, which is a spinout of a US Defense Advanced Research Projects Agency (DARPA) effort at MIT called Raw, is to put lots of relatively simple cores on a chip and link them with a mesh network with enough coherency that a single copy of the operating system can run across those cores.

In the modern lingo, it is a large number of wimpy cores that look like a massively multithreaded single brawny core as far as Linux is concerned.

For the Tile-Gx chips, the 64-bit cores have three instruction threads and also sport additional SIMD instructions that make use of a four multiplier-accumulator (MAC) per cycle unit that can deliver 600 billion MACs per second. (Which means it can whip the ass of a digital signal processor.) The Tile-Gx cores have floating point math instructions that allow a floating point operation to be done in five cycles instead of the hundreds of cycles it takes when done in software. Each one of those Tile cores burns something on the order of 400 milliwatts when implemented in the 40 nanometer process and spins at 1GHz or 1.2GHz.

The Tile-Gx core has 32KB of L1 data cache, 32KB of L1 instruction cache, and 256KB of L2 cache; the mesh network across the cores is used to link those L1 and L2 caches into a single, coherent L3 cache shared by all the cores on the chip. The Tile-Gx72 announced today has a total of 23MB of cache memory across its die.
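
That total can be roughly sanity-checked from the per-core figures. The Python sketch below is our own back-of-the-envelope arithmetic, not anything from Tilera: it assumes all 72 cores carry the caches listed above and that nothing else on the die counts toward the total. The function name and unit constants are made up for illustration.

```python
# Back-of-the-envelope cache total for a tiled chip.
# Assumption: every core has identical L1D, L1I and L2 caches,
# and no other on-die memory is counted.

KIB = 1024
MIB = 1024 * KIB


def total_cache_bytes(cores, l1d_kib=32, l1i_kib=32, l2_kib=256):
    """Sum the per-core caches across all cores, in bytes."""
    per_core_kib = l1d_kib + l1i_kib + l2_kib
    return cores * per_core_kib * KIB


gx72_total = total_cache_bytes(72)
print(gx72_total / MIB)          # 22.5 (binary megabytes)
print(gx72_total / 1_000_000)    # about 23.6 (decimal megabytes)
```

Either way you count megabytes, the sum lands close to the 23MB that Tilera quotes.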

Block diagram of the Tile-Gx72 chip

Depending on the Tile-Gx make and model, physical main memory is set at either 39-bits (for a maximum of 512GB) or 40-bits (for a maximum of 1TB). The Tile-Gx72 has four memory controllers that have an aggregate of 60GB/sec of memory bandwidth, and the iMesh network that links cores to each other and to peripheral and network ports on the die has more than 100Tb/sec of bandwidth.
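
The memory ceilings follow directly from the address widths: each extra physical address bit doubles the addressable space. A minimal Python sketch of that arithmetic, assuming byte-addressable memory (the helper name is ours, purely for illustration):

```python
# Maximum physical memory reachable with a given number of address bits.
# Assumption: byte-addressable memory, one byte per address.

GIB = 1 << 30
TIB = 1 << 40


def max_physical_memory_bytes(address_bits):
    """Return the size of the address space in bytes."""
    return 1 << address_bits


print(max_physical_memory_bytes(39) // GIB)  # 512 (GB)
print(max_physical_memory_bytes(40) // TIB)  # 1 (TB)
```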

The chips also sport two Multistream iMesh Crypto Accelerator (MiCA) engines and are able to deliver 40Gb/sec of bandwidth on cryptographic work and 20Gb/sec on compression and decompression. The Tile-Gx chip also includes a packet processing accelerator called mPIPE (short for multicore programmable intelligent packet engine), which sits between the cores and the on-chip network interfaces and does load balancing between them. The chip has four PCI-Express 2.0 controllers (two x4 and two x8) and virtualized network interfaces that can implement eight Ethernet ports running at 10Gb/sec or 32 running at 1Gb/sec.

Powering up a quad-port NIC with a Tile-Gx72 processor

Because it has its own instruction set (inspired by minimalist designs like the MIPS architecture), Tilera has to cook up its own Linux variant. In this case, Tilera has a variant of the CentOS clone of Red Hat Enterprise Linux (based on the Linux 3.0 kernel) with over 2,000 packages running on it. ANSI-compliant C and C++ as well as PHP and Java are supported on the chip with this Linux; Erlang support was in the works, but it is not clear if it is done.

Looking ahead, Tilera is still working on its "Stratton" kickers to the Greylock chips, but is not making any announcements about when they might be due or what they may look like. The plan, nearly three years ago, was for the Stratton chips to come out in 2013 and sport as many as 200 cores on a die with a shrink to 28 nanometers at TSMC. That would allow for servers to be made such that you could cram as many as 40,000 cores into a single rack.
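
For a sense of the density that roadmap implied, the rack figure can be turned back into a chip count. The Python sketch below is our own arithmetic on the two numbers quoted above; it assumes every socket in the rack holds a fully populated 200-core part, and the function name is hypothetical.

```python
# How many chips does a given rack-level core count imply?
# Assumption: every chip in the rack has the same number of cores.


def chips_per_rack(cores_per_rack, cores_per_chip):
    """Return the number of whole chips needed to reach the core count."""
    # Ceiling division, in case the core count is not an exact multiple.
    return -(-cores_per_rack // cores_per_chip)


print(chips_per_rack(40_000, 200))  # 200 chips in a single rack
```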

Doud says the Tile-Gx72 will offer anywhere from 1.6X to 2X the performance of competing x86, ARM, Power, or DSP platforms for networking or server jobs. And because of this, Tilera is going to take a little more time with the Stratton generation. "We have seen people have problems with 28 nanometer, and we are perfectly happy to let them clean the pipes. We will focus on our future architecture and current sales and let Nvidia do the spins."

Tilera didn't release pricing on the Tile-Gx72 processor. ®