
Ex-Cray supercomputer interconnect guru Scott leaves Nvidia for Google

Working on new systems at the Chocolate Factory


Steve Scott, the system interconnect expert who was the lead designer for the three most recent generations of node-lashing routers and server interconnect interfaces for Cray supercomputers, has a new gig at hyperscale data center operator Google. For the past two years, Scott had been the chief technology officer for Nvidia's Tesla GPU coprocessor business.

A spokesperson at Google confirmed that Scott has indeed joined the Chocolate Factory "team", but said that Google is not able to comment further. Contacted by El Reg about what he would be doing in Mountain View, Scott replied thus:

"I'll be working on new Google systems. Great work, but not so interesting to the outside world."

To which your systems desk hack replied:

"Don't kid yourself, man. People love this stuff. What do you mean, 'working on new Google systems'? Can you be even a little more precise?"

And then the Google static cut in and put an end to information exchange.

Scott spent 19 years designing systems and interconnects at three different incarnations of Cray after getting his BS in electrical and computer engineering, his MS in computer science, and his PhD in computer architecture at the University of Wisconsin.

At the time he left Cray, Scott held 27 US patents in the areas of interconnection networks, processor microarchitecture, cache coherence, synchronization mechanisms, and scalable parallel architectures. He was the lead designer on Cray's X1 parallel vector machine, and was one of the key designers for the "SeaStar" interconnect used in the "Red Storm" super created by Cray for Sandia National Laboratories and commercialized in the XT line of machines. He was also one of the key designers for the follow-on "Gemini" and "Aries" interconnects funded by the US Defense Advanced Research Projects Agency and used in the XE and XC series of machines, respectively.

Scott left Cray in early August 2011, and with the hindsight of history, we know why: Cray was going to get out of the interconnect business.

In April 2012, Intel bought the intellectual property for the Gemini and Aries interconnects from Cray for $140m, and brought on board the people who worked on them as well. (Excepting Scott, who had already left the building.) At the moment, the plan is for Intel to fund the development of the components used in a follow-on system, code-named "Shasta", that is set to employ a kicker interconnect called "Pisces", slated for delivery in 2016 or so. (Cray originally thought it could get the Pisces interconnect into the field in 2015.)

A spokesperson for Nvidia said that Scott left the GPU chip maker two weeks ago, but could not say when he started at Google. Nvidia has started a search for a new CTO for the Tesla GPU coprocessor line, and in the meantime has Bill Dally, the guy from Stanford University who literally wrote the book on networking, backfilling alongside his role as chief scientist at Nvidia. Jonah Alben, who is the GPU architect at Nvidia, Ian Buck, who is the general manager of GPU computing software, and Sumit Gupta, who is general manager of the Tesla Accelerated Computing business unit, will all be kicking in to keep the Tesla roadmap on track.

By the way, the other architect of the Aries interconnect, Mike Parker, is a senior research scientist at Nvidia, and Dally had a hand in the Aries design even though he didn't work for Cray at the time and was a professor at Stanford. And El Reg has contended that a market wanting cheap floating point computing systems might push Nvidia into creating its own dragonfly-style interconnect.

So what does Google want with Scott?

In an interview with El Reg published back in April, we chatted with Scott about the future Project Denver ARM processors from Nvidia and how they might be used in future supercomputers or hyperscale data centers with a dragonfly interconnect much like the Aries that Intel controls or the "Echelon" interconnect that was to be part of a US Defense Advanced Research Projects Agency effort to create exascale systems.

Echelon was interesting in that it had a dragonfly point-to-point topology to link all server nodes directly to all other server nodes, but it also added a global address space that synchronized data as it moved on cores inside one processor or across multiple processors in the system.

When asked about the applicability of future HPC systems to general purpose systems, this is what Scott had to say, and it is telling. In fact, it lays out the case for why Scott ended up at Google:

"I would be delighted if the stuff that we are creating for HPC will be useful for general purpose data centers. We definitely think about such things. And from a networking perspective, a lot of things that are good for HPC are also good for scale-out data centers.

"The systems that Google, Amazon, Facebook, and others are fielding are bigger than HPC supercomputers. And as they get rid of disks and start trying to run everything in memory, all of a sudden the network latency is starting to matter in a way that it didn't.

"They care about global congestion and communication, with jobs like MapReduce, and the amount of bandwidth per node is small compared to HPC. But if you build a network right, it is sliced anyway. Both networks will have good global congestion control, good separation for different types of jobs, and good global adaptive routing – all with low latency."

Google already builds its own servers and plain vanilla Ethernet switches. Maybe Mountain View has its eye on a homegrown interconnect that, in the long run, might end up being commercialized by vendors such as Nvidia as a competitive alternative to the Aries and Pisces interconnects from Intel.

That's a big maybe, of course, but Google is not averse to investing in its own hardware designs when it ends up with better data centers. And it has a guaranteed customer – and possibly more – if it does it right.

It is a very intriguing thought. ®
