The Register® — Biting the hand that feeds IT


Nvidia ditches homegrown C/C++ compiler for LLVM

Revs CUDA tools to 4.1


Graphics processor and SoC chip maker Nvidia is hosting its GTC Asia conference in Beijing this week, and with the next-generation Kepler GPUs being pushed out to early next year, there isn't any new chippery to salivate over. But Nvidia has some new compilers and a revved up CUDA development kit to make things interesting just the same.

The big news coming out of GTC is that Nvidia is replacing its own C and C++ compilers for its GPU coprocessors and moving to the open source Low Level Virtual Machine (LLVM) toolchain.

Sumit Gupta, senior product manager of the Tesla line at Nvidia, tells El Reg that coders working with the CUDA CPU and GPU development kit and kicking out C and C++ programs will see about a 10 per cent performance boost compared to the homegrown compilers Nvidia cooked up for its GPUs. "Most users won't even know that the compilers have changed excepting this," says Gupta.

The CUDA development environment takes programs written in high-level languages such as C, C++ or Fortran and compiles them into an intermediate language called Parallel Thread Execution, or PTX for short, an assembly-like language that is then turned into binary code for a specific Nvidia GPU so it can be executed there.
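To make that pipeline concrete, here is a minimal sketch of the kind of source that goes in at the top. It is not taken from Nvidia's documentation or from Gupta: the file name `saxpy.cu`, the `HOST_DEVICE` macro, and the functions `saxpy_element`, `saxpy_kernel` and `saxpy_host` are all hypothetical names chosen for illustration. The kernel is guarded by `__CUDACC__` so the same file also builds with an ordinary C++ compiler, in which case only the CPU loop is compiled.

```cpp
// saxpy.cu -- illustrative sketch only; every name in this file is hypothetical.
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__  // nvcc: build for both CPU and GPU
#else
#define HOST_DEVICE                      // plain C++: an ordinary function
#endif

// One element of y = a*x + y, shared by the GPU kernel and the CPU loop.
HOST_DEVICE inline float saxpy_element(float a, float x, float y) {
    return a * x + y;
}

#ifdef __CUDACC__
// Device code. This is the part the CUDA compiler lowers to PTX, which is
// later turned into machine code for a particular GPU.
__global__ void saxpy_kernel(std::size_t n, float a, const float* x, float* y) {
    // Each GPU thread handles one array element.
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = saxpy_element(a, x[i], y[i]);
    }
}
#endif

// Host reference loop, usable when no GPU toolchain is present.
inline void saxpy_host(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
        y[i] = saxpy_element(a, x[i], y[i]);
    }
}
```

Assuming the CUDA toolkit is installed, something like `nvcc -ptx saxpy.cu -o saxpy.ptx` should stop after the PTX stage and leave the intermediate code in a text file for inspection. The exact PTX emitted will vary with the toolkit version and target GPU.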

A number of people have been experimenting with making C, C++ and Fortran CUDA programs more amenable to the LLVM toolchain: Helge Rhodin of Saarland University in Germany did a thesis on creating a PTX code generator for LLVM.

There is an interesting project called Ocelot that takes programs compiled using CUDA tools to the PTX layer for Nvidia GPUs and then allows them to be run in emulation mode or through LLVM translation on x86 processors, or on either Nvidia Fermi or Advanced Micro Devices Cypress GPUs – and to be run without recompilation. Georgia Tech, with backing from IBM, Intel, Nvidia, the National Science Foundation, and LogicBlox, has done a lot of the work on Ocelot.

Nvidia has not gone that far with its changes to its compilers. Rather, says Gupta, Nvidia has embraced the C and C++ compilers in the LLVM framework – presumably he means the Clang C and C++ compilers, but he didn't say – and has put hooks into them for the CUDA parallel development environment.

Nvidia is not open sourcing the new C and C++ compilers, which are simply branded CUDA C and CUDA C++, but will offer the source code on a free but restricted basis to academic researchers and application development tool vendors.

Figure: Nvidia adds CUDA to LLVM

While the change in compilers is interesting and useful, the really important part, said Gupta, is that by merging CUDA and LLVM, Nvidia is positioning itself to snap in support for new programming languages atop the combined LLVM-CUDA and for new processors underneath it.

It's not a coincidence that Apple has loads of experience using LLVM tools on its ARM-based devices. Nvidia, of course, sells ARM-based Tegra processors for smartphones and tablets and is working on ARM processors for PCs and servers under its Project Denver.

The new LLVM compilers will be bundled in the CUDA 4.1 development kit, which is being announced at the GTC event this week. Along with the new compilers, CUDA 4.1 sports over 1,000 new functions for image processing and includes an "expert system" to help programmers parallelize their code.

"Nobody wants to read the manual," says Gupta with a laugh. And so this expert system has a redesigned visual code profiler that shows bottlenecks in the code, offers hints on how to fix them, and automagically finds the right portions of the CUDA manual to help fix the problem. For instance, the code profiler can show coders how to better use the memory hierarchy in CPU-GPU hybrids, which is a tricky bit of programming. ®


No, Gupta gets it. You don't.

The GNU compiler collection is a monolithic POS. The new LLVM paradigm is different, maybe not from a strict computer science point of view, but from a practical point of view it's like night and day.

Clang is a nicely-written C/C++/ObjC compiler, written in a modern language, using modern techniques. There are detailed instructions for adding your own keywords. If you've tried to modify gcc you'll know that gcc is ... not like that.

LLVM is a very cool piece of middleware that takes a universal IL and either interprets, JITs or compiles it onto a wide variety of platforms. There are directions for retargeting it. Once again, doing the same kind of job in gcc is a lot harder. Even though in principle gcc has the same kind of flexible architecture, in practice it's highly monolithic.

Also, Clang outperforms gcc by 3x (compile-time), and LLVM outperforms gcc by 10-20% (runtime).

The GNU toolchain is on its way out, and for good reason. All hail nVidia for speeding up the process.

(PS: Hand-massaging assembler is neither cost-effective nor maintainable. I can't remember the last time I saw someone tweak the assembler output of a tool - maybe 1995? There is some assembler coding going on still, but pros are mostly optimizing things at the memory hierarchy level because the compilers have been good enough at the instruction level since gcc 4, and saving a cycle to lose it again on a stalled cache read is a non-win. I haven't written or modified any assembler for years, despite working directly in the low-level code optimization space.)


I think it might be a step forward

but why not FLOSS the lot?

By not making stuff fully open you merely restrict the ability of your friends to help you by denying them the knowledge your competitors will have obtained the first time your device/software was made available.


>>Also, Clang outperforms gcc by 3x (compile-time), and LLVM outperforms gcc by 10-20% (runtime).

Did you do the benchmarking yourself? Or are you taking the Clang developers' POV? As far as I know (from Phoronix, e.g.) the compilation is not as much faster as you and the Clang people are trying to say. Just several percent ahead sometimes. I also heard that the memory footprint of gcc is worse than that of Clang.

However, the optimizations of the two compilers are beyond comparison right now. Clang produces much slower binaries. I doubt that Gupta meant a 10% gain in compilation time; it was the performance of the binaries to be executed on their GPUs, right?

