More on that monster Cerebras AI chip, Xilinx touts 'world's largest' FPGA, and more

Including another round of 10th-gen Intel Cores

Hot Chips: Now that the Hot Chips conference is over until next year, let's bring you up to speed quickly on developments from and around the Silicon Valley event.

The world's biggest chip is a pain in the butt to cool

A real crowd-pleaser this year was Cerebras, an AI hardware startup, which unveiled what is claimed to be the largest ever chip for training artificial intelligence software. Founder and chief hardware architect Sean Lie flashed the bronze-colored iPad-sized processor on stage, and said the thing was “the largest square that could be carved from a single wafer.”

The bonkers 46,000 mm² silicon die, it is claimed, packs 1.2 trillion TSMC-fabbed 16nm transistors, 400,000 cores laid out in a 2D mesh with a fabric bandwidth of 100 petabits per second, 18GB of on-chip memory, and a memory bandwidth of 9PB/s. That may sound impressive, though don’t get your hopes up too high: we don’t know how it performs in production yet. Nor how much it costs. Nor whether or not there will be volume production for anyone outside the usual gang of tier-one cloud giants.
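For a sense of scale, the claimed figures can be divided out per core and per square millimeter. What follows is a back-of-the-envelope Python sketch, not anything Cerebras has published: it assumes decimal units (1GB is 10^9 bytes, 1PB is 10^15 bytes) and that memory, bandwidth, and transistors are spread evenly across all cores, which is our simplification. The variable names are ours too.

```python
# Back-of-the-envelope arithmetic on the claimed figures quoted above.
# Assumptions (ours, not Cerebras'): decimal units, resources split evenly per core.
DIE_AREA_MM2 = 46_000
TRANSISTORS = 1_200_000_000_000        # 1.2 trillion
CORES = 400_000
ON_CHIP_MEMORY_BYTES = 18 * 10**9      # 18GB
MEMORY_BANDWIDTH_BPS = 9 * 10**15      # 9PB/s, in bytes per second

memory_per_core = ON_CHIP_MEMORY_BYTES // CORES        # bytes
bandwidth_per_core = MEMORY_BANDWIDTH_BPS // CORES     # bytes per second
transistors_per_core = TRANSISTORS // CORES
transistors_per_mm2 = TRANSISTORS / DIE_AREA_MM2

print(f"Memory per core:       {memory_per_core / 1e3:.0f} KB")
print(f"Bandwidth per core:    {bandwidth_per_core / 1e9:.1f} GB/s")
print(f"Transistors per core:  {transistors_per_core / 1e6:.0f} million")
print(f"Transistors per mm^2:  {transistors_per_mm2 / 1e6:.1f} million")
```

If the claims hold, that works out to roughly 45KB of memory and 22.5GB/s of memory bandwidth per core, and about 26 million transistors per square millimeter.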

Lie said that Cerebras isn’t disclosing any performance specs yet. Hardware geeks discussing the chip estimated that it could suck up anywhere from 14 to 15 kilowatts of power, and that performance in the range of petaflops wouldn’t be too surprising. The processor needs its own custom housing and liquid cooling: you won't be dropping this into a standard box. The mesh is designed with a grid of holes allowing the liquid cooling to flow through, it appears. Air cooling won't be enough.

A rough size comparison between Cerebras' chip and the largest GPU

Cerebras announced that this chip is now “running customer workloads,” though it isn’t available on the market yet. It’ll be interesting to see who its first customers are. This isn't a general compute part: think of it as an ultra math coprocessor for AI workloads, assuming it works as advertised outside of the labs.

A new non-profit MLCommons will manage MLPerf benchmarking efforts

The folks over at MLPerf, an industry group trying to benchmark the training and inference performance of machine-learning hardware and software platforms, are creating a non-profit called MLCommons.

The goal of MLCommons is to “accelerate ML innovation and increase its positive impact on society,” said Peter Mattson, general chair of MLPerf and a Google engineer. With all the different chips, frameworks and AI models available, it’s a nightmare to figure out what combination is best for a specific task, whether that’s building a recommendation or computer vision system.

MLPerf was set up to help the machine learning community figure that out, but the results have been lackluster so far. The competition is dominated by Nvidia and Google, and few other companies bother listing their training and inference performance publicly, to avoid looking bad next to the pair of giants' GPUs and TPUs.

Aside from encouraging more companies to submit results, MLCommons will also compile large public datasets that anyone can use to train AI models, and plans to expand its outreach to draw other specialists in.

Intel's Spring Hill, aka NNP-I 1000 inference chip

Intel revealed the details of Spring Crest, or the NNP-T, its ASIC geared towards handling heavy workloads in AI training, and Spring Hill, or the NNP-I 1000, used for inference.

We covered Spring Crest, previously known as the NNP-L, on the first day of Hot Chips, so we’ll focus on Spring Hill here. The chip will be fabricated using Intel’s 10nm process node, and comes with 10 to 12 inference compute engine cores (ICE). These inference chips require less memory and precision than processors used in training: Spring Hill's job is to ingest input data and quickly make decisions and predictions, such as where to steer a car, or who to shoot on a battlefield, using previously trained models.

Spring Hill supports INT8 precision, has a maximum performance of 92 TOPS, a thermal design power of up to 50W, and packs 75MB of on-die memory with a bandwidth of 68GB/s. Intel also claims the chip has the best performance per watt of any inference chip, at 4.8 TOPS per watt.
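The peak throughput and maximum power figures, taken together, imply a lower efficiency than the headline number, which suggests the efficiency claim refers to a different operating point. Here is a quick Python sketch of that arithmetic. It assumes peak throughput and maximum power occur together, which is our assumption: Intel may rate them under different conditions, and the figures quoted here don't say.

```python
# Sanity check on the Spring Hill figures quoted above.
# Assumption (ours): the 92 TOPS peak and the 50W maximum occur at the same time.
PEAK_TOPS = 92
MAX_POWER_W = 50
CLAIMED_TOPS_PER_W = 4.8

# Efficiency if the chip hits peak throughput at maximum power
efficiency_at_peak = PEAK_TOPS / MAX_POWER_W

# Power budget needed to reach peak throughput at the claimed efficiency
power_for_peak_at_claim = PEAK_TOPS / CLAIMED_TOPS_PER_W

# Throughput the claimed efficiency would imply at maximum power
tops_at_max_power_at_claim = CLAIMED_TOPS_PER_W * MAX_POWER_W

print(f"Peak TOPS / max power:        {efficiency_at_peak:.2f} TOPS/W")
print(f"Power for 92 TOPS at 4.8/W:   {power_for_peak_at_claim:.1f} W")
print(f"TOPS at 50W if 4.8/W held:    {tops_at_max_power_at_claim:.0f}")
```

At peak throughput and maximum power the chip would manage about 1.84 TOPS per watt. The 4.8 figure would require 92 TOPS from roughly 19W, or 240 TOPS at 50W, so it presumably applies at some lower-power configuration.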

Intel's Hot Chips slide on Spring Hill

Inside each ICE sits another unit known as a deep-learning compute grid. This is where the deep-learning algorithm instructions are executed to process input data held in the SRAM.

Intel didn’t mention when Spring Hill chips will arrive, though it did say it was already working on designing the next two generations. Ofri Wechsler, an Intel fellow, said more details would be teased in public, and that it hoped to submit results for the next MLPerf inference test, so keep your eyes peeled.

Bonus non-Hot Chips news

In more Intel developments, Chipzilla announced another eight 14nm 10th Gen Intel Core i3, i5, and i7 processors for laptops. Here are the base specifications:

  • Up to six CPU cores
  • Up to 4.9GHz max turbo clock frequency
  • Up to 12MB on-die cache
  • Up to 1.15GHz built-in GPU clock frequency
  • LPDDR4x, LPDDR3, and DDR4 memory support, with speeds increased to 2666 MT/s

If you want more details about each processor, they’re listed right here.

Also, Xilinx has a beefy FPGA out – the "world's largest," it is claimed, in fact. The Virtex UltraScale+ VU19P is a giant 16nm-node FPGA that contains the “highest logic density and I/O count on a single device ever built.” Here are the brief specs: the VU19P has 8,938,000 logic cells, 3,840 DSP slices, 224Mb of RAM, and 2,072 IO lines, and supports up to eight PCIe 3 x16 or PCIe 4 x8 links as well as CCIX. You can find out more over here.

Finally, Huawei touted its Ascend 910 AI processor, the latest addition in its Ascend-Max series.

The chip was announced in 2018, and its specs have changed a little since then. It delivers 256TFLOPS at FP16 precision, which doubles to 512TOPS at INT8, and its power consumption has been reduced to 310W from 350W. The Ascend 910 is geared towards training models rather than performing inference.
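To put the power reduction in perspective, here is a rough Python sketch of the Ascend 910's throughput per watt at both power figures. It assumes the throughput numbers are the same at 350W and 310W, which is our assumption and not something Huawei has stated here, and the names are ours.

```python
# Rough throughput-per-watt arithmetic for the Ascend 910 figures quoted above.
# Assumption (ours): throughput is identical at the old and new power figures.
FP16_THROUGHPUT = 256    # trillions of operations per second
INT8_THROUGHPUT = 512    # trillions of operations per second
OLD_POWER_W = 350
NEW_POWER_W = 310

fp16_per_watt_old = FP16_THROUGHPUT / OLD_POWER_W
fp16_per_watt_new = FP16_THROUGHPUT / NEW_POWER_W
int8_per_watt_new = INT8_THROUGHPUT / NEW_POWER_W

# Relative efficiency gain from the lower power figure alone
efficiency_gain_pct = (OLD_POWER_W / NEW_POWER_W - 1) * 100

print(f"FP16 per watt at 350W: {fp16_per_watt_old:.2f}")
print(f"FP16 per watt at 310W: {fp16_per_watt_new:.2f}")
print(f"INT8 per watt at 310W: {int8_per_watt_new:.2f}")
print(f"Efficiency gain:       {efficiency_gain_pct:.1f}%")
```

On those assumptions, the drop from 350W to 310W is worth about a 13 per cent improvement in throughput per watt.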

Huawei also touted MindSpore, a software framework that works hand in hand with its hardware to optimize and accelerate machine-learning workloads across devices, edge and the cloud. “MindSpore will go open source in the first quarter of 2020. We want to drive broader AI adoption and help developers do what they do best,” said Eric Xu, Huawei’s rotating chairman. ®




Biting the hand that feeds IT © 1998–2019