Tilera to stuff 200 cores onto single chip
Plus memory, controllers, mesh network...
Multicore chip upstart Tilera has announced an ambitious product roadmap for its Tile-Gx systems-on-a-chip that will see the company plunk up to 200 cores – plus their memory and peripheral controllers and a mesh network linking the cores – onto a single die within the next few years. The company is also trotting out a new server partner and investor – PC and server maker Quanta – which is using Tilera's current TilePro64 processors in a server that jam-packs 512 cores into a 2U rack form factor.
Yup, that's a factor of five better than the SM10000 Atom-based server announced last week by upstart server maker SeaMicro, which is putting 512 Atom Z530 cores into a 10U form factor, including switching and storage for the server nodes. The SeaMicro server can run Windows, Linux, or anything else with the appropriate drivers that run atop the x64 architecture.
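That factor of five falls straight out of the rack math. Here's a quick back-of-the-envelope check in Python, using only the core counts and chassis sizes quoted above:

```python
# Density comparison: Quanta SQ2 (512 cores in 2U) vs SeaMicro SM10000
# (512 Atom cores in 10U). Figures are from the article.
quanta_cores, quanta_units = 512, 2
seamicro_cores, seamicro_units = 512, 10

quanta_density = quanta_cores / quanta_units        # 256 cores per rack unit
seamicro_density = seamicro_cores / seamicro_units  # 51.2 cores per rack unit

print(quanta_density / seamicro_density)  # -> 5.0, the "factor of five"
```

With equal core counts, the ratio reduces to the chassis heights: 10U over 2U.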
The 32-bit RISC cores on the Tilera chips have an undisclosed architecture (that everyone suspects is a variant of the MIPS architecture) and are restricted to a home-grown variant of Linux cooked up by Tilera. So SeaMicro has the advantage on compatibility with existing x64 systems and Tilera has the advantage on density and, depending on how this Quanta server is eventually priced, perhaps on bang for the buck and performance per watt.
The service providers that Tilera and Quanta are hoping to sell the Tile-based servers to are not afraid to recompile a Linux software stack on a new architecture if it gives them an edge, any more than they care about using various Unix and Linux systems for specific jobs today. So the lack of compatibility with the x64 architecture should not be an issue for the cloudy niche that Tilera and Quanta are chasing, just as it has not been an issue for the nearly 50 design wins for networking and security appliances that have adopted Tilera SoCs for future products.
Tilera and Quanta are not revealing all that much about the future SQ2 servers, but they did put out some details. The server uses so-called "twin" half-width server boards, and according to Ihab Bishara, director of cloud computing applications at Tilera, it's based roughly on an x64 mobo design that has been rejiggered to support the TilePro64 processor. The TilePro64 chip is made by Taiwan Semiconductor Manufacturing Co using a 90 nanometer process and was announced in September 2008.
It has 64 cores on a single die (in an 8x8 grid), with 16 KB of L1 cache per core and 5.6 MB of L2/L3 cache across the chip. The per-core L2 caches are made coherent by the iMesh interconnect and collectively function like a shared L3 as well as segmented L2 caches for each core. Wrapped around the Tile cores are four DDR2 main memory controllers, two Gigabit Ethernet ports, two PCI Express controllers, two 10 Gb/sec XAUI interfaces, and two flexible I/O interfaces to support peripherals such as compact flash memory or disk drives.
The iMesh network on the chip is actually five separate networks to handle memory access, streaming packet transfers, user data, cache misses, and interprocess communications. That iMesh also allows a Linux instance to span multiple cores, SMP-style, to scale up performance as needed for a single Linux workload. The exact limits of this iMesh SMP capability have not been divulged.
For the SQ2 server, Quanta opted for the TilePro64 running at 900 MHz, which is a bit faster than the standard 700 MHz and 866 MHz parts that started shipping in October 2009. Each twin mobo in the Quanta SQ2 server has two of the TilePro64 processors on it, plus eight DDR2 memory slots per processor, as you can see below:
As Bishara points out, there is a lot of air on that board and therefore plenty of opportunity to cram more components onto it and still stay in the twin mobo form factor. The SQ2 server will support 4 GB DDR2 memory sticks running at 667 MHz (with ECC), for a maximum capacity of 64 GB per module. Each TilePro64 processor acts as a single node – there is no symmetric multiprocessing linking the two chips' caches or main memories – so each node has 64 cores and up to 32 GB of memory.
The Quanta system board has four Gigabit Ethernet ports, four 10 Gigabit Ethernet ports with SFP+ connectors, and two 10/100 Mbit Ethernet ports for IPMI 2.0 remote management, as cloudy infrastructure users like. (If you have clusters with failover for applications built in, as cloud workloads do, you don't need a full-blown service processor for the server.)
Those two 10/100 Mbit Ethernet ports can also be used for plugging in management consoles. The TilePro64 chip would allow as many as sixteen Gigabit Ethernet and eight 10 Gigabit Ethernet ports to be put on the board without adding any auxiliary chips to the mobo. (Half of each of these components is dedicated to each TilePro64 processor.)
The SQ2 system boards are hot-pluggable too, which means you can yank one out and replace it without having to power down the entire box – something cloudy infrastructure users want very much. The Quanta SQ2 server design slides four of these modules into a 2U twin-style chassis, for a total of eight nodes. The chassis has two dozen 2.5-inch drive bays in the front of the unit, which can have SAS or SATA disk drives or solid state drives hot-plugged into them.
Here's a very grainy set of images of the Quanta SQ2 box:
Each TilePro64 chip inside the SQ2 box burns between 35 and 50 watts running typical workloads, and running the open source memcached caching program full-out it burns 35 to 40 watts. (Web and data caching is one of the cloudy workloads that Quanta is targeting with the SQ2 machine.) The four-module box burns about 400 watts under heavy load, according to Bishara. (That does not include the draw from disks.) With 512 cores in the box, the 2U server delivers an aggregate of 1.3 trillion integer operations per second of oomph and 176 Gb/sec of I/O bandwidth.
Bishara says that based on these feeds and speeds, a single Quanta SQ2 server can replace about eight dual-socket Xeon 5600-class servers running cloudy workloads. The Quanta SQ2 design allows for 10,752 cores to be crammed into a single 42U rack, while burning less than 8 kilowatts of juice.
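The rack-level figures are easy to reproduce from the per-chassis numbers above. A quick Python sketch of the arithmetic (all inputs from the article):

```python
# Rack-level arithmetic behind the 10,752-core figure.
rack_units = 42
chassis_units = 2        # each SQ2 chassis is 2U
cores_per_chassis = 512  # 8 nodes x 64 cores
watts_per_chassis = 400  # measured under heavy load, disks excluded

chassis_per_rack = rack_units // chassis_units  # 21 chassis per 42U rack
cores_per_rack = chassis_per_rack * cores_per_chassis
print(cores_per_rack)  # -> 10752

# Note that 21 chassis at the 400 W heavy-load figure would be 8.4 kW,
# so the sub-8 kW rack claim implies a lighter average draw per chassis.
print(chassis_per_rack * watts_per_chassis)  # -> 8400
```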
The Quanta SQ2 server will be available in limited quantities starting in September and will be generally available during the fourth quarter. Pricing for the box was not divulged.
Coming soon: 40,000-core rack
As El Reg has previously reported, supercomputer maker Silicon Graphics has committed to putting a petaflops of computing into a single rack under its Project Mojo effort, and has mumbled a bit about supporting various architectures, including Tilera processors, as part of that effort. It would be very interesting to see SGI pair Tilera processors with GPU co-processors to get there.
And it looks like this could work. Bishara says that the long-awaited third generation of Tile processors, announced last October, are making their way through qualification. These Tile-Gx processors are being implemented in a 40 nanometer process at TSMC's wafer bakers and will sport 64-bit cores and floating point math.
The Tile-Gx family will run at between 1 GHz and 1.5 GHz, thanks to the process shrink, and will come in versions with 16, 36, 64, and 100 cores. The Tile-Gx16 and Tile-Gx36 processors will have 16 and 36 cores, as their names suggest, and will start sampling to customers in the fourth quarter with production volumes one or two quarters later, according to Bishara. The larger Tile-Gx64 and Tile-Gx100 processors will sample in the second quarter of 2011 and will be in production in either the third or fourth quarter if all goes according to plan.
The Tile-Gx100 processor will have four DDR3 memory controllers on its SoC grid (a 10x10 layout) and will be able to address 1 TB of main memory. This will represent a big bump up in performance – about 8X the oomph compared to the TilePro64 being used by Quanta in the SQ2 server – and a 32X increase in main memory. The Tile-Gx100 SoC will have three PCI-Express controllers (with a total of 20 lanes) running down the left side of the chip, plus controllers for other I/O devices. The right side of the chip will have two network I/O controllers, each capable of supporting sixteen Gigabit Ethernet ports, four 10 Gigabit Ethernet ports, or a single 40 Gigabit Ethernet port. Here's the block diagram of the Tile-Gx100:
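The 32X memory figure checks out against the per-node limit of the current SQ2 boxes. A trivial sanity check, treating 1 TB as 1,024 GB:

```python
# Memory scaling: TilePro64 node in the SQ2 (up to 32 GB) vs the
# Tile-Gx100's addressable 1 TB. Figures are from the article.
tilepro64_node_gb = 32
tilegx100_gb = 1024  # 1 TB

print(tilegx100_gb // tilepro64_node_gb)  # -> 32, the claimed 32X jump
```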
With the Tile-Gx generation of chips, Tilera says that its server partners will be able to cram around 20,000 cores into a rack (with more performance than the current cores), and with a fourth generation of chips code-named "Stratton," due in 2013 using a 28 nanometer process from TSMC, it will be possible to double that again to around 40,000 cores. Bishara did not say what other architectural changes were coming with the fourth generation, but you can bet it will be something interesting. ®