Tilera throws gauntlet at Intel's feet
36-core Tile-Gx server chip crosses Sandy Bridge
Upstart mega-multicore chip maker Tilera has not yet started sampling its future Tile-Gx 3000 series of server processors, but companies have already locked in orders for the chips.
That is how eagerly hyperscale data center operators are anticipating some alternative to power-hungry Xeon processors from Intel and Opteron processors from Advanced Micro Devices.
Ahead of the Structure 2011 conference in San Francisco tomorrow, Tilera is lifting the veil a bit more on its Tile-Gx 3000 series of server chips, which will sport 36, 64, or 100 cores in a single socket. The Gx 3000 series of chips is the second prong in a three-prong attack that Tilera is making on the IT market.
The company already launched its Tile-Gx 8000 series in early May; those chips are aimed at network equipment makers and have encryption and zippy I/O capabilities that are not required for server workloads.
The company will eventually offer Tile-Gx 5000 processors, with hefty compute capabilities aimed at video encoding and streaming workloads.
With the first-generation Tile64 and second-generation TilePro processors from Tilera, the company had one chip design regardless of the intended workload.
MIPS for chips?
Tilera has not confirmed this – just like it won't confirm that its processor cores are based on a variant of the MIPS architecture. But it seems likely that instead of making three different chip designs with three different numbers of cores, Tilera is doing a deep sort on three different processors.
This would see it tweak clock speeds and deactivate features not needed on chips for specific workloads to aim at the server, streaming, and networking markets. This makes a lot more sense than having nine unique processor designs – three each for the 3000, 5000, and 8000 series.
The Tilera chips, regardless of generation, have the same basic idea: use simple cores, put lots of them on a chip, and link them together using a mesh network.
Each core on the chip has three instruction threads and has 32KB of L1 data cache and 32KB of L1 instruction cache, and also has a 256KB L2 cache; the mesh network is used to link those L1 and L2 caches into a single, coherent L3 cache shared by all the cores on the chip. (So the top-end, 100-core variant of the Tile-Gx chip has 32MB of total cache.)
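The cache total is simple multiplication of the per-core figures. Here is a back-of-the-envelope check in Python, assuming every core carries the same 32KB + 32KB + 256KB and that the quoted megabyte totals are rounded; the function name is ours, not Tilera's:

```python
# Per-core cache sizes quoted in the article, in KB.
L1_DATA_KB = 32
L1_INSTR_KB = 32
L2_KB = 256


def total_cache_kb(cores: int) -> int:
    """Aggregate L1 + L2 cache across all cores, in KB (hypothetical helper)."""
    return cores * (L1_DATA_KB + L1_INSTR_KB + L2_KB)


for cores in (36, 64, 100):
    kb = total_cache_kb(cores)
    print(f"{cores:3d} cores: {kb} KB (about {kb / 1024:.2f} MB)")
```

The 100-core case works out to 32,000KB, which is where a figure of roughly 32MB would come from.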
The Tile-Gx chips have 64-bit processing on their cores, and include floating point math instructions that allow a floating point operation to be done in five cycles instead of the hundreds of cycles it takes when done in software.
This is, believe it or not, important for PHP support, Ihab Bishara, director of cloud computing applications at Tilera, tells El Reg.
The Tile-Gx chips might support 64-bit processing, but physical memory addressing on the chips is either 39-bit or 40-bit, which works out to either 512GB or 1TB of maximum main memory. Each core burns less than a half watt of power.
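The memory ceilings follow directly from the address widths: each extra physical address bit doubles the number of bytes the chip can reach. A minimal sketch, assuming byte-addressable memory and binary units (the helper name is illustrative):

```python
def max_addressable_gib(address_bits: int) -> int:
    """Bytes reachable with the given physical address width, expressed in GiB."""
    return 2 ** address_bits // 2 ** 30


for bits in (39, 40):
    print(f"{bits}-bit physical addressing: {max_addressable_gib(bits)} GiB")
```

With 39 bits that is 512GB, and with 40 bits it is 1,024GB, or 1TB.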
Here's what the block diagram of the 36-core Tile-Gx 3036 looks like:
Block diagram of Tilera's Tile-Gx 3036 processor
Although the Tile-Gx 8000 network processor has Multistream iMesh Crypto Accelerators, and is able to deliver 40 Gb/sec of bandwidth on cryptographic work and 20 Gb/sec on compression and decompression jobs, these seem to be deactivated on the Tile-Gx 3000 chips for servers.
The server and network processors both have packet processing accelerators, originally called the multicore programmable intelligent packet engine, or mPIPE for short, which Tilera is calling the Smart NIC hardware this time around.
The DDR3 main memory controllers are on the chip, as are PCI-Express and network interfaces, so there is no need for a northbridge or southbridge chipset. You grab this chip, put some memory and peripheral slots on the board, and away you go.
The Tile-Gx 3000 and 8000 series of chips are being fabbed by Taiwan Semiconductor Manufacturing Corp using its 40 nanometer processes – the same ones that AMD uses to make its GPUs. Prior generations of Tilera chips were made with 90 nanometer processes by TSMC.
Here's how the Tile-Gx 3000 server processors stack up against the Tile-Gx 8000 network processors:
Bishara says that the company is thinking about offering a variant of the Tile-Gx 3000 running at 1.2GHz, but at the moment the plan is to offer 1GHz and 1.5GHz clock speeds.
Sampling in July
The Tile-Gx 3036 chip will be sampling in July – and yes, Chinese server maker Quanta is one of the early OEMs getting access to these chips. This 36-core chip will have a total of 12MB of cache on the die, with 66Tb/sec of iMesh bandwidth across the 6x6 grid of cores and 200 Gb/sec of memory bandwidth across its two DDR3 memory controllers.
The Tile-Gx 3036 supports up to 512GB of memory, and delivers 48Gb/sec of bandwidth across its two PCI-Express 2.0 ports (one x8 and one x4). Bishara says that Tilera expects to see Tile-Gx 3036 processors appear in products by the end of the year.
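The 48Gb/sec figure is consistent with standard PCI-Express 2.0 arithmetic: 5 GT/sec per lane with 8b/10b encoding leaves 4Gb/sec of usable bandwidth per lane in each direction. The sketch below assumes the quoted number is the per-direction total across both ports, which Tilera has not spelled out:

```python
PCIE2_GT_PER_SEC = 5   # PCI-Express 2.0 raw signalling rate per lane
ENCODED_BITS = 10      # 8b/10b encoding: 10 bits on the wire...
PAYLOAD_BITS = 8       # ...carry 8 bits of data


def pcie2_gbps(lanes: int) -> float:
    """Usable bandwidth in Gb/sec, one direction, for a PCIe 2.0 port."""
    return lanes * PCIE2_GT_PER_SEC * PAYLOAD_BITS / ENCODED_BITS


# The two ports described in the article: one x8 and one x4.
ports = {"x8": 8, "x4": 4}
total = sum(pcie2_gbps(lanes) for lanes in ports.values())
print(f"Total across {len(ports)} ports: {total:.0f} Gb/sec")
```

The x8 port contributes 32Gb/sec and the x4 port 16Gb/sec under these assumptions.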
The 64-core and 100-core variants of the Tile-Gx 3000 series implement their cores in 8 x 8 and 10 x 10 grids and scale up the DDR3 memory controllers and PCI-Express 2.0 lanes accordingly. The Tile-Gx 3064 and 3100 processors will sample in early 2012 and will appear in products about six months later if all goes according to plan.
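Bigger grids mean longer trips across the mesh. As a rough illustration only, the sketch below counts hops between tiles, assuming a plain square 2D mesh with no wraparound links and shortest-path routing; the article does not describe how iMesh actually routes traffic, and the function names are ours:

```python
def max_hops(side: int) -> int:
    """Worst-case hop count, corner to opposite corner, in a side x side mesh."""
    return 2 * (side - 1)


def average_hops(side: int) -> float:
    """Mean shortest-path hop count over all ordered pairs of distinct tiles."""
    tiles = [(x, y) for x in range(side) for y in range(side)]
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for ax, ay in tiles
        for bx, by in tiles
    )
    pairs = len(tiles) * (len(tiles) - 1)
    return total / pairs


for side in (6, 8, 10):
    print(f"{side}x{side}: max {max_hops(side)} hops, "
          f"average {average_hops(side):.2f} hops")
```

Under those assumptions the worst-case trip grows from 10 hops on the 6x6 part to 18 hops on the 10x10 part.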
Because it is using a non-x64 instruction set, the Tilera chip requires its own variant of the Linux stack, which Tilera has put together. Specifically, Tilera has taken a CentOS-compatible Linux stack with the Linux 2.6.36 kernel and over 2,000 RPM packages to create its own Linux. (Yes, that is an authentic replica of an exact duplicate.)
The current development tool chain includes ANSI-compliant C and C++ as well as PHP, and Tilera's software engineers are putting the final touches on tuning up the Just-In-Time (JIT) compiler for its Java stack, which is important for customers wanting to run Hadoop and other big-data crunchers on Tilera-based machines.
The stack will also support the Erlang language created by Ericsson, which is significant because the Couchbase NoSQL database is written in a combination of C and Erlang. Couchbase running on Tilera servers will be demonstrated at Structure this week, and the company already showed off memcached (written in C and C++) and Hadoop (written in Java) back in March.
Tilera vs Xeon
So how does the Tile-Gx 3000 stack up?
To give a sense of that, Tilera ran a merge/sort benchmark on the three variants of its Tile-Gx 3000 processors and also on an as-yet-unannounced eight-core "Sandy Bridge" Xeon E5 processor.
This benchmark was done at the behest of an unnamed storage company that is pitting the Tilera chips against Xeons. The Tile-Gx 3000s stack up pretty well in terms of performance and performance per watt, as you can see:
How the Tile-Gx 3000s stack up against Intel's Sandy Bridge
The 36-core chip stood toe-to-toe with the Xeon E5 running at 2.8GHz on the merge/sort benchmark, and the 64-core version did about 1.75 times the work and the 100-core version could do about 2.5 times the work. (This chart shows the old names of the Tile-Gx processors before they were broken into the 3000, 5000, and 8000 series.)
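Those multiples say something about how well the merge/sort job scales with core count. A rough calculation, treating the 36-core chip as the 1.0 baseline and taking the "about" figures at face value; it ignores any clock speed differences between the parts, which the benchmark description does not give:

```python
BASELINE_CORES = 36

# Relative work from the article, with the 36-core chip taken as 1.0.
relative_work = {36: 1.0, 64: 1.75, 100: 2.5}


def scaling_efficiency(cores: int, speedup: float) -> float:
    """Measured speedup divided by the ideal (linear) speedup over the baseline."""
    ideal = cores / BASELINE_CORES
    return speedup / ideal


for cores, speedup in relative_work.items():
    eff = scaling_efficiency(cores, speedup)
    print(f"{cores:3d} cores: {speedup:.2f}x the work, {eff:.0%} of linear scaling")
```

On those numbers the 64-core part keeps roughly 98 per cent of linear scaling and the 100-core part roughly 90 per cent.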
Perhaps more significantly, the Tilera chips just put the Xeon processors to shame on thermal efficiency, doing each unit of work ten times as efficiently.
As you might imagine, Tilera wants to charge a premium for that efficiency, but it is prevented from getting too big for its britches because its Tile processors are not compatible with the x64 instruction set and require companies to port and certify their software to run upon them.
When pressed about Tilera's future pricing strategy, Bishara says that the Tile-Gx 3036 "will be competitive with the eight-core Sandy Bridge chip," and adds that Tilera "needs to be competitive with Intel on price so that means we need to at least match them". ®