Nvidia Tesla bigwig: Why you REALLY won't need x86 chips soon

Find out where Intel, AMD, ARM stand in GPU giant's roadmap

What's the difference between Tesla and Tegra?

TPM: For the HPC community, what will be the practical difference between Tegra and Tesla? What is to keep supercomputer shops from trying to build Tegra supercomputers out of those future Parker chips, or even Logan?

Scott: Tegra will never have a good network interface because it doesn't need one, and Tegra will not have the same amount of memory or bandwidth into the memory subsystem that Tesla has. At some point, you might have stacked memory in Tegra, like we are planning to do with the "Volta" Tesla chips, but it will be much smaller.

Tesla will never have high enough volume to justify the bulk of the engineering work that it takes to do a full solution. But it will have enough volume to justify the incremental engineering work that is necessary to take the consumer parts and make supercomputers.

TPM: How beefy will you make those Denver cores used in the Tesla products? Will that be enough to get rid of the X86 entirely?

Scott: That's the goal.

In terms of what you can do with it, there is really no difference between an ARM ISA and an X86 ISA. The ARM ISA is a little cleaner in the RISC sense, but an X86 processor is really just a RISC processor with a wrapper around it that translates X86 instructions into a RISC-like internal ISA. I am very happy that most of the world thinks that ARM provides a great efficiency advantage over X86. The truth is, it really doesn't. It is a small second-order effect. The reality is, if you squint, there is no power advantage to ARM.

The advantages that are important are that ARM is open, ARM is much higher volume, many more people are using it, and there is a lot of opportunity and ability to innovate. Historically speaking, that tends to win. It is the classic Innovator's Dilemma. I expect ARM to do to Intel what Intel did to RISC and mainframes. You can't say for sure that it will happen, but again, volume (which means you can get by with lower margins) plus openness and lots of people innovating should win.

TPM: Can you plunk a future InfiniBand port onto the Tesla package as well? Is that desirable? Can you put a ConnectX adapter or a whole switch, or a piece of a distributed switch like Calxeda is doing with its ARM server chips, down there?

Scott: You certainly could.

I think you get a lot of benefit from just doing the NICs. There are some pros and cons to trying to integrate the router as well, and it has to do with the ability to build different strengths of networks. It is easier to build fat or skinny networks if the ratio of processor to router silicon is not baked in. If you integrate the router, you also get pass-through traffic: you are using processor pins to route packets that come into the processor and then go out again, but are not terminating or sourcing at the processor. You are burning pins on your processor, which is fine if you always know what your configuration is going to be.
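To put rough numbers on that pin-budget tradeoff, here is a back-of-the-envelope Python sketch; every figure in it is invented for illustration and reflects no real Nvidia, Cray, or Intel part:

```python
# Hypothetical numbers only: contrast an integrated router, where the
# processor-to-router ratio is baked in, with a NIC-only design, where
# fabric strength is set by how many external router chips you deploy.

pin_bw_gbps = 500.0        # hypothetical total off-chip bandwidth per processor
injection_share = 0.25     # share used by traffic this node sources or sinks

# Integrated router: the remaining pins forward other nodes' packets.
print(f"injection   : {pin_bw_gbps * injection_share:.0f} Gb/s")
print(f"pass-through: {pin_bw_gbps * (1 - injection_share):.0f} Gb/s")

# NIC-only: the same processor can anchor a fat or a skinny network
# simply by changing the number of router chips per group of nodes.
router_bw_gbps = 1000.0    # hypothetical bandwidth of one external router chip
for routers_per_16_nodes in (1, 2, 4):
    per_node = routers_per_16_nodes * router_bw_gbps / 16
    print(f"{routers_per_16_nodes} router(s) per 16 nodes -> "
          f"{per_node:.0f} Gb/s of fabric bandwidth per node")
```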

I am not going to tell you precisely what we are going to do, but it is a story of integration. We are looking out, later on in this decade, at a world where the third party HPC network ecosystem may go away.

If you look at the current Top 500, you have a bunch of Ethernet systems that are all at the bottom of the list and they have crappy efficiency, even on Linpack. Of the credible HPC systems, you have InfiniBand and custom networks from Cray, SGI, IBM, and a few others. Looking at that landscape going forward, QLogic is off the table and Cray's network is off the table because Intel has that and Cray is getting out of networks. BlueGene is off the table, from everything I understand, and there is no future BlueGene roadmap. The Tofu network on the big K machine from Fujitsu doesn't really have a commercial future. SGI is currently doing a custom network, but I don't know how long they will be able to continue to do that.

So what is left? It is Mellanox. And what does Mellanox do? Basically, it hooks up Xeon servers into clusters. And if Intel goes to a proprietary integrated network fabric, that doesn't mean that Mellanox's days are numbered – I don't want to give people the wrong impression here – but there is certainly a threat.

TPM: That's how I see it, too. And have said as much. Intel is not just buying up Fulcrum Microsystems, QLogic, and the Cray interconnects as defensive maneuvers to keep them out of enemy hands; it wants to build something and drive it down into the chips and into the switch ASICs that it clearly wants to sell to hit its 2016 target of $20bn in sales for the Data Center and Connected Systems group.

(I didn't think of it at the time during this interview, but maybe Nvidia should buy Mellanox and get it over with, keeping it in neutral territory?)

Scott: We are looking at a landscape where there may not be good third party networks available to interconnect GPUs. We are also looking at selling processors, not just being an accelerator for other processors, so we need a network story.

TPM: Something like the Echelon development project you did for DARPA? That design looked an awful lot like the Cray "Aries" interconnect.

Scott: The Echelon plan was to have an integrated NIC and a lot of bandwidth coming off the processor supporting a global address space across the machine – native loads and stores everywhere, and all of the synchronization that works between cores on a chip works seamlessly between cores on different chips. So you have got this very tightly integrated network fabric. And yes, our vision is a dragonfly network, same as Aries. The details are different, but it is a dragonfly topology.
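As a rough illustration of the global address space idea – remote access as a native load or store rather than an explicit message – here is a toy Python sketch. The 16/48-bit split is an assumption for illustration, not anything Nvidia has described:

```python
# Toy model: a global address splits into a node number and a local
# offset, so touching remote memory is just a load/store to an address.

NODE_BITS = 16     # hypothetical: high bits select the node
OFFSET_BITS = 48   # hypothetical: low bits select memory within that node

def decode(global_addr):
    node = global_addr >> OFFSET_BITS
    offset = global_addr & ((1 << OFFSET_BITS) - 1)
    assert node < (1 << NODE_BITS), "address beyond the machine"
    return node, offset

addr = (7 << OFFSET_BITS) | 0x1000   # byte 0x1000 on node 7
print(decode(addr))                  # -> (7, 4096)
```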

Block diagram of Nvidia's Echelon exascale system
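Scott's "dragonfly topology, same as Aries" remark is concrete enough to sketch. Below is a minimal Python construction of a balanced dragonfly; the parameters (four terminals per router) are illustrative, not Echelon's or Aries' actual numbers:

```python
# Balanced dragonfly: groups of routers, all-to-all local links within a
# group, and one global link between every pair of groups.
from itertools import combinations

p = 4              # terminals (compute nodes) per router
a = 2 * p          # routers per group (balanced dragonfly: a = 2p)
h = p              # global links per router (balanced: h = p)
g = a * h + 1      # groups, with one global link between each pair

def router_id(group, local):
    return group * a + local

# All-to-all local links inside each group.
local_links = [(router_id(grp, r1), router_id(grp, r2))
               for grp in range(g)
               for r1, r2 in combinations(range(a), 2)]

def port_owner(src, dst):
    # Which router in group src owns the global link toward group dst:
    # each destination gets a unique offset 0..g-2, spread h per router.
    return (((dst - src) % g) - 1) // h

# One global link between every pair of groups.
global_links = [(router_id(g1, port_owner(g1, g2)),
                 router_id(g2, port_owner(g2, g1)))
                for g1, g2 in combinations(range(g), 2)]

print(f"groups={g}, routers={g * a}, terminals={g * a * p}")
print(f"local links={len(local_links)}, global links={len(global_links)}")
# Minimal routing needs at most three router hops: local, global, local.
```

The appeal of the topology is that low diameter: with direct group-to-group links, a minimal route never needs more than a local hop, a global hop, and a local hop.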

TPM: I keep coming back to this, but do you have to do this yourself or pray that someone else figures it out?

Scott: That's a great question. At this point, we are just starting to think hard about this. When I was at Cray, I worked with Bill Dally of Stanford University on Aries, and we are both here at Nvidia now. There were two architects of the Aries router, myself and Mike Parker, who is now a senior research scientist at Nvidia. We clearly have the ability.

The question is, what do we do about it? It is not something we have figured out completely. We have ideas, and it is not something we are talking about yet, but we will make sure that there are good network solutions for future processors, and you can imagine that there will be tighter coupling to the processor. We are talking to potential partners, too, since we really don't have any aspirations to be a standalone systems company.

TPM: And yet, Nvidia just announced the Visual Computing Appliance, so I don't necessarily buy that. (laughs)

Scott: Well, that's a specific appliance...

TPM: But seriously, sometimes you don't have the choice. Look at the choice that Cisco Systems faced when all of the server vendors started snapping up networking companies to expand their TAM and because they saw convergence and network virtualization coming. Cisco could either leverage its networking business to move into servers or lose market share. So, because the world is a tough place, there may come a day when Nvidia is told it can either be an HPC systems company or not be invited to the party.

The trick will be to make whatever you do up there applicable to enterprise computing and to hyperscale cloud operators like Facebook and Google. Don't pull an IBM and create BlueGene/Q and then not realize that, with a big price cut and some re-engineering so it fits in standard racks – or, I guess, Open Compute racks – it could be a killer microserver for running Hadoop jobs. (Laughter)

Scott: I would be delighted if the stuff that we are creating for HPC will be useful for general purpose data centers. We definitely think about such things. And from a networking perspective, a lot of things that are good for HPC are also good for scale-out data centers.

The systems that Google, Amazon, Facebook, and others are fielding are bigger than HPC supercomputers. And as they get rid of disks and start trying to run everything in memory, all of a sudden the network latency is starting to matter in a way that it didn't. They care about global congestion and communication, with jobs like MapReduce, and the amount of bandwidth per node is small compared to HPC.
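The arithmetic behind that latency point is easy to sketch. The figures below are illustrative orders of magnitude, not measurements from any real system:

```python
# Once data moves from disk to memory, the network round trip stops being
# noise and becomes the dominant cost of a remote access. Numbers are
# rough, typical orders of magnitude chosen for illustration.

disk_seek_us = 5_000.0   # ~5 ms disk seek
network_rtt_us = 10.0    # ~10 us fabric round trip
dram_access_us = 0.1     # ~100 ns local DRAM access

remote_disk = disk_seek_us + network_rtt_us
remote_memory = dram_access_us + network_rtt_us

print(f"remote disk read: {remote_disk:>8.1f} us "
      f"(network is {network_rtt_us / remote_disk:.1%} of it)")
print(f"remote DRAM read: {remote_memory:>8.1f} us "
      f"(network is {network_rtt_us / remote_memory:.1%} of it)")
```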

But if you build a network right, it is sliced anyway. Both networks will have good global congestion control, good separation for different types of jobs, and good global adaptive routing – all with low latency. ®
