Tilera to stuff 200 cores onto single chip
Plus memory, controllers, mesh network...
Multicore chip upstart Tilera has announced an ambitious product roadmap for its TileGX systems-on-a-chip that will see the company plunk up to 200 cores – plus their memory and peripheral controllers and a mesh network linking the cores – onto a single die within the next few years. The company is also trotting out a new server partner and investor – PC and server maker Quanta – which is using Tilera's current TilePro64 processors to jam 512 cores into a 2U rack server.
Yup, that's a factor of five better in core density than the SM10000 Atom-based server announced last week by upstart server maker SeaMicro, which puts 512 Atom Z530 cores into a 10U form factor, including switching and storage for the server nodes. The SeaMicro server can run Windows, Linux, or anything else with the appropriate drivers that runs atop the x64 architecture.
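That "factor of five" is simple arithmetic on the figures quoted above; here's a quick sketch (the helper function is ours, not a vendor tool):

```python
# Core-density comparison implied by the article's figures.

def cores_per_rack_unit(cores: int, rack_units: int) -> float:
    """Cores packed per 1U of rack space."""
    return cores / rack_units

seamicro = cores_per_rack_unit(512, 10)  # SeaMicro SM10000: 512 Atom cores in 10U
quanta = cores_per_rack_unit(512, 2)     # Quanta/Tilera box: 512 cores in 2U

print(seamicro)                  # 51.2 cores per U
print(quanta)                    # 256.0 cores per U
print(round(quanta / seamicro))  # 5 -- the "factor of five"
```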
The 32-bit RISC cores on the Tilera chips have an undisclosed architecture (that everyone suspects is a variant of the MIPS architecture) and are restricted to a home-grown variant of Linux cooked up by Tilera. So SeaMicro has the advantage on compatibility with existing x64 systems and Tilera has the advantage on density and, depending on how this Quanta server is eventually priced, perhaps on bang for the buck and performance per watt.
The service providers that Tilera and Quanta hope to sell the Tile-based servers to are not afraid to recompile a Linux software stack on a new architecture if it gives them an edge, any more than they mind using various Unix and Linux systems for specific jobs today. So the lack of compatibility with the x64 architecture should not be an issue for the cloudy niche that Tilera and Quanta are chasing, just as it has not been an issue for the nearly 50 networking and security appliance design wins that have adopted Tilera SoCs for future products.
Tilera and Quanta are not revealing all that much about the future SQ2 servers, but they did put out some details. The server uses so-called "twin" half-width server boards, and according to Ihab Bishara, director of cloud computing applications at Tilera, it is based roughly on an x64 mobo design that has been rejiggered to support the TilePro64 processor. The TilePro64 chip, announced in September 2008, is made by Taiwan Semiconductor Manufacturing Corp using a 90 nanometer process.
It has 64 cores on a single die (in an 8x8 grid), with 16 KB of L1 cache per core and 5.6 MB of combined L2/L3 cache across the chip. The distributed L2 caches are kept coherent by the iMesh mesh interconnect and function both as segmented L2 caches for each core and as a shared L3. Wrapped around the Tile cores are four DDR2 main memory controllers, two Gigabit Ethernet ports, two PCI-Express controllers, two 10 Gb/sec XAUI interfaces, and two flexible I/O interfaces to support peripherals such as compact flash memory or disk drives.
The iMesh network on the chip is actually five separate networks that handle memory access, streaming packet transfers, user data transfers, cache misses, and interprocess communication. The iMesh also allows a Linux instance to span multiple cores, SMP-style, to scale up performance as needed for a single Linux workload. The exact limits of this iMesh SMP capability have not been divulged.
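SMP-style spanning means a single Linux image schedules threads across many cores; how far Tilera's variant scales is undisclosed, but on any SMP Linux the standard affinity calls show which cores one OS image spans. A generic sketch (nothing here is Tilera-specific, and the core counts depend on the machine it runs on):

```python
# Generic Linux sketch: under an SMP kernel, one OS image schedules
# threads across all visible cores; the affinity syscalls let software
# inspect or restrict that set (Linux-only os functions).
import os

visible = os.sched_getaffinity(0)        # cores this process may run on
print(len(visible))                      # e.g. 64 on one 64-core node

os.sched_setaffinity(0, {min(visible)})  # pin this process to one core
print(len(os.sched_getaffinity(0)))      # 1
```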
For the SQ2 server, Quanta opted for the TilePro64 running at 900 MHz, which is a bit faster than the standard 700 MHz and 866 MHz parts that started shipping in October 2009. Each twin mobo in the Quanta SQ2 server has two TilePro64 processors on it, plus eight DDR2 memory slots per processor.
As Bishara points out, there is a lot of air on that board, and therefore plenty of opportunity to cram more components onto it while staying in the twin mobo form factor. The SQ2 server will support 4 GB DDR2 memory sticks running at 667 MHz (with ECC), for a maximum capacity of 64 GB per board. Each TilePro64 processor acts as a separate node – there is no symmetric multiprocessing linking their caches or main memories – so each node has 64 cores and up to 32 GB of memory.
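The capacity figures follow directly from the slot counts quoted above; a quick check (constant names are ours):

```python
# Memory-capacity arithmetic from the figures in the text:
# 4 GB DDR2 sticks, eight slots per processor, two processors per twin board.
GB_PER_STICK = 4
SLOTS_PER_PROCESSOR = 8
PROCESSORS_PER_BOARD = 2

per_node = GB_PER_STICK * SLOTS_PER_PROCESSOR  # GB per 64-core node
per_board = per_node * PROCESSORS_PER_BOARD    # GB per twin board

print(per_node, per_board)  # 32 64
```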
The Quanta system board has four Gigabit Ethernet ports, four 10 Gigabit Ethernet ports with SFP+ connectors, and two 10/100 Mbit Ethernet ports for IPMI 2.0 remote management, as cloudy infrastructure users like. (If you have clusters with failover built into the applications, as cloud workloads do, you don't need a full-blown service processor for the server.)
The board also has two 10/100 Mbit Ethernet ports for plugging in management consoles. The TilePro64 chips would allow as many as sixteen Gigabit Ethernet and eight 10 Gigabit Ethernet ports to be put on the board without adding any auxiliary chips to the mobo. (Half of each of these port counts is driven by each TilePro64 processor.)