Ex-Cray supercomputer interconnect guru Scott leaves Nvidia for Google

Working on new systems at the Chocolate Factory

Steve Scott, the system interconnect expert who was the lead designer for the three most recent generations of node-lashing routers and server interconnect interfaces for Cray supercomputers, has a new gig at hyperscale data center operator Google. For the past two years, Scott has been the chief technology officer for Nvidia's Tesla GPU coprocessor business.

A spokesperson at Google confirmed that Scott has indeed joined the Chocolate Factory "team", but said that Google is not able to comment further. Contacted by El Reg about what he would be doing in Mountain View, Scott replied thus:

"I'll be working on new Google systems. Great work, but not so interesting to the outside world."

To which your systems desk hack replied:

"Don't kid yourself, man. People love this stuff. What do you mean, 'working on new Google systems'? Can you be even a little more precise?"

And then the Google static cut in and put an end to information exchange.

Scott spent 19 years designing systems and interconnects at three different incarnations of Cray after getting his BS in electrical and computer engineering, his MS in computer science, and his PhD in computer architecture at the University of Wisconsin.

At the time he left Cray, Scott held 27 US patents in the areas of interconnection networks, processor microarchitecture, cache coherence, synchronization mechanisms, and scalable parallel architectures. He was the lead designer on Cray's X1 parallel vector machine, and was one of the key designers for the "SeaStar" interconnect used in the "Red Storm" super created by Cray for Sandia National Laboratories and commercialized in the XT line of machines. He was also one of the key designers for the follow-on "Gemini" and "Aries" interconnects funded by the US Defense Advanced Research Projects Agency and used in the XE and XC series of machines, respectively.

Scott left Cray in early August 2011, and with the hindsight of history, we know why: Cray was going to get out of the interconnect business.

In April 2012, Intel bought the intellectual property for the Gemini and Aries interconnects from Cray for $140m, and brought on board the people who worked on them as well (excepting Scott, who had already left the building). At the moment, the plan is for Intel to fund the development of the components used in a follow-on system, code-named "Shasta", which is set to employ a kicker interconnect called "Pisces", slated for delivery in 2016 or so. (Cray originally thought it could get the Pisces interconnect into the field in 2015.)

A spokesperson for Nvidia said that Scott left the GPU chip maker two weeks ago, but was unsure when he started at Google. Nvidia has started a search for a new CTO for the Tesla GPU coprocessor line, and in the meantime has Bill Dally, the guy from Stanford University who literally wrote the book on networking, backfilling alongside his role as chief scientist at Nvidia. Jonah Alben, who is the GPU architect at Nvidia, Ian Buck, who is the general manager of GPU computing software, and Sumit Gupta, who is general manager of the Tesla Accelerated Computing business unit, will all be kicking in to keep the Tesla roadmap on track.

By the way, the other architect of the Aries interconnect, Mike Parker, is a senior research scientist at Nvidia, and Dally had a hand in the Aries design even though he didn't work for Cray at the time and was a professor at Stanford. And El Reg has contended that a market wanting cheap floating point computing systems might push Nvidia into creating its own dragonfly-style interconnect.

So what does Google want with Scott?

In an interview with El Reg published back in April, Scott talked about Nvidia's future Project Denver ARM processors and how they might be used in future supercomputers or hyperscale data centers, linked by a dragonfly interconnect much like the Aries design that Intel now controls or the "Echelon" interconnect that was to be part of a US Defense Advanced Research Projects Agency effort to create exascale systems.

Echelon was interesting in that it had a dragonfly point-to-point topology to link all server nodes directly to all other server nodes, but it also added a global address space that kept data synchronized as it moved between cores inside one processor or across multiple processors in the system.
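For readers who want a feel for why dragonfly topologies keep cropping up in these designs, here is a minimal, purely illustrative Python sketch of the canonical "balanced" dragonfly sizing rule. The parameter values are hypothetical and are not Echelon's or Aries' actual configuration:

```python
# Back-of-the-envelope scaling for a "balanced" dragonfly topology.
# The numbers below are illustrative only -- not Echelon's or Aries' real configuration.
#
#   p = compute nodes attached to each router
#   a = routers per group (all-to-all connected inside the group)
#   h = global (inter-group) links per router
#
# In the canonical balanced design the number of groups is g = a*h + 1, so every
# pair of groups is joined by exactly one global link, and any node can reach any
# other in at most three router-to-router hops (local, global, local) under
# minimal routing.

def dragonfly_size(p: int, a: int, h: int) -> dict:
    g = a * h + 1        # number of groups
    routers = a * g      # total routers in the system
    nodes = p * routers  # total compute nodes attached
    return {"groups": g, "routers": routers, "nodes": nodes}

if __name__ == "__main__":
    # Example: 8 nodes per router, 16 routers per group, 8 global links per router
    print(dragonfly_size(p=8, a=16, h=8))  # {'groups': 129, 'routers': 2064, 'nodes': 16512}
```

The attraction, as the original dragonfly work from Dally and company argued, is that a modest number of expensive long-reach global cables buys you a very low hop count across tens of thousands of nodes, which matters as much to a hyperscale data center as it does to a supercomputer.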

When asked about the applicability of future HPC systems to general purpose systems, this is what Scott had to say, and it is telling. In fact, it lays out the case for why Scott ended up at Google:

I would be delighted if the stuff that we are creating for HPC will be useful for general purpose data centers. We definitely think about such things. And from a networking perspective, a lot of things that are good for HPC are also good for scale-out data centers.

The systems that Google, Amazon, Facebook, and others are fielding are bigger than HPC supercomputers. And as they get rid of disks and start trying to run everything in memory, all of a sudden the network latency is starting to matter in a way that it didn't.

They care about global congestion and communication, with jobs like MapReduce, and the amount of bandwidth per node is small compared to HPC. But if you build a network right, it is sliced anyway. Both networks will have good global congestion control, good separation for different types of jobs, and good global adaptive routing – all with low latency.

Google already builds its own servers and plain vanilla Ethernet switches. Maybe Mountain View has its eye on a homegrown interconnect that, in the long run, might end up being commercialized by vendors such as Nvidia as a competitive alternative to the Aries and Pisces interconnects from Intel.

That's a big maybe, of course, but Google is not averse to investing in its own hardware designs when doing so ends up giving it better data centers. And it has a guaranteed customer – and possibly more – if it does it right.

It is a very intriguing thought. ®
