Ex-Cray supercomputer interconnect guru Scott leaves Nvidia for Google

Working on new systems at the Chocolate Factory

Steve Scott, the system interconnect expert who was the lead designer for the three most recent generations of node-lashing routers and server interconnect interfaces for Cray supercomputers, has a new gig at hyperscale data center operator Google. For the past two years, Scott was the chief technology officer for Nvidia's Tesla GPU coprocessor business.

A spokesperson at Google confirmed that Scott has indeed joined the Chocolate Factory "team", but said that Google is not able to comment further. Contacted by El Reg about what he would be doing in Mountain View, Scott replied thus:

"I'll be working on new Google systems. Great work, but not so interesting to the outside world."

To which your systems desk hack replied:

"Don't kid yourself, man. People love this stuff. What do you mean, 'working on new Google systems'? Can you be even a little more precise?"

And then the Google static cut in and put an end to information exchange.

Scott spent 19 years designing systems and interconnects at three different incarnations of Cray after getting his BS in electrical and computer engineering, his MS in computer science, and his PhD in computer architecture at the University of Wisconsin.

At the time he left Cray, Scott held 27 US patents in the areas of interconnection networks, processor microarchitecture, cache coherence, synchronization mechanisms, and scalable parallel architectures. He was the lead designer on Cray's X1 parallel vector machine, and was one of the key designers for the "SeaStar" interconnect used in the "Red Storm" super created by Cray for Sandia National Laboratories and commercialized in the XT line of machines. He was also one of the key designers for the follow-on "Gemini" and "Aries" interconnects, funded by the US Defense Advanced Research Projects Agency (DARPA) and used in the XE and XC series of machines, respectively.

Scott left Cray in early August 2011, and with the hindsight of history, we know why: Cray was going to get out of the interconnect business.

In April 2012, Intel bought the intellectual property for the Gemini and Aries interconnects from Cray for $140m, and brought on board the people who worked on them as well. (Excepting Scott, who had already left the building.) At the moment, the plan is for Intel to fund the development of the components used in a follow-on system, code-named "Shasta", which is set to employ a kicker interconnect called "Pisces" slated for delivery in 2016 or so. (Cray originally thought it could get the Pisces interconnect into the field in 2015.)

A spokesperson for Nvidia said that Scott left the GPU chip maker two weeks ago, and was unsure when he started at Google. Nvidia has started a search for a new CTO for the Tesla GPU coprocessor line, and in the meantime has Bill Dally, the guy from Stanford University who literally wrote the book on networking, backfilling alongside his role as chief scientist at Nvidia. Jonah Alben, who is the GPU architect at Nvidia, Ian Buck, who is the general manager of GPU computing software, and Sumit Gupta, who is general manager of the Tesla Accelerated Computing business unit, will all be kicking in to keep the Tesla roadmap on track.

By the way, the other architect of the Aries interconnect, Mike Parker, is a senior research scientist at Nvidia, and Dally had a hand in the Aries design even though he didn't work for Cray at the time and was a professor at Stanford. And El Reg has contended that a market wanting cheap floating point computing systems might push Nvidia into creating its own dragonfly-style interconnect.

So what does Google want with Scott?

In an interview with El Reg published back in April, we chatted with Scott about the future Project Denver ARM processors from Nvidia and how they might be used in future supercomputers or hyperscale data centers, lashed together with a dragonfly interconnect much like the Aries that Intel now controls or the "Echelon" interconnect that was to be part of a DARPA effort to create exascale systems.

Echelon was interesting in that it had a dragonfly point-to-point topology to link all server nodes directly to all other server nodes, but it also added a global address space that synchronized data as it moved between cores inside one processor or across multiple processors in the system.
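
For readers unfamiliar with the topology, the back-of-the-envelope sketch below shows how a balanced dragonfly is sized. It is written in Python, follows the parameter names used in the published dragonfly work (p node ports, a routers per group, h global ports per router, with a = 2p = 2h in the balanced case), and its numbers are purely illustrative, not drawn from Echelon, Aries, or any Google design.

```python
# Back-of-the-envelope sizing of a balanced dragonfly network.
# Parameter names follow the published dragonfly papers; the figures
# below are illustrative only, not taken from any real machine.

def dragonfly_size(h: int) -> dict:
    p = h               # node ports per router (balanced: p = h)
    a = 2 * h           # routers per group, wired all-to-all inside the group
    groups = a * h + 1  # one global link from each group to every other group
    routers = groups * a
    return {
        "routers_per_group": a,
        "groups": groups,
        "routers": routers,
        "nodes": routers * p,
    }

# With only four global ports per router the machine already spans 33 groups
# and more than a thousand nodes, and a minimal route is at most three hops:
# local, global, local.
print(dragonfly_size(4))
```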

When we asked about the applicability of future HPC systems to general purpose systems, this is what Scott had to say, and it is telling. In fact, it lays out the case for why he ended up at Google:

I would be delighted if the stuff that we are creating for HPC will be useful for general purpose data centers. We definitely think about such things. And from a networking perspective, a lot of things that are good for HPC are also good for scale-out data centers.

The systems that Google, Amazon, Facebook, and others are fielding are bigger than HPC supercomputers. And as they get rid of disks and start trying to run everything in memory, all of a sudden the network latency is starting to matter in a way that it didn't.

They care about global congestion and communication, with jobs like MapReduce, and the amount of bandwidth per node is small compared to HPC. But if you build a network right, it is sliced anyway. Both networks will have good global congestion control, good separation for different types of jobs, and good global adaptive routing – all with low latency.
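
To make the "good global adaptive routing" remark a little more concrete, here is a heavily simplified Python sketch of the per-packet decision such a network makes, loosely in the spirit of the UGAL heuristic associated with dragonfly designs: take the minimal path unless its queues look congested, in which case detour through a randomly chosen intermediate group. The queue depths and hop counts are invented for illustration, not measured from any Cray, Nvidia, or Google network.

```python
# Toy illustration of globally adaptive routing: prefer the minimal path,
# but take a Valiant-style detour through an intermediate group when the
# minimal path's queues suggest congestion. All numbers are made up.

def choose_route(min_queue: int, min_hops: int,
                 detour_queue: int, detour_hops: int) -> str:
    """Pick the path with the lower estimated delay (queue depth x hop count)."""
    if min_queue * min_hops <= detour_queue * detour_hops:
        return "minimal path"
    return "non-minimal detour via an intermediate group"

# Lightly loaded network: the shorter path wins.
print(choose_route(min_queue=1, min_hops=3, detour_queue=1, detour_hops=5))
# A hot spot on the minimal path: the longer detour is now the faster bet.
print(choose_route(min_queue=20, min_hops=3, detour_queue=1, detour_hops=5))
```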

Google already builds its own servers and plain vanilla Ethernet switches. Maybe Mountain View has its eye on a homegrown interconnect that, in the long run, might end up being commercialized by vendors such as Nvidia as a competitive alternative to the Aries and Pisces interconnects from Intel.

That's a big maybe, of course, but Google is not averse to investing in its own hardware designs when doing so gives it better data centers. And it has a guaranteed customer – and possibly more – if it does it right.

It is a very intriguing thought. ®
