The Register® — Biting the hand that feeds IT


Nvidia, Continuum team up to sling Python at GPU coprocessors

Teaching snakes to speak CUDA with forked tongue, but not forked code


GTC 2013 The Tesla GPU coprocessor and its peers inside your Nvidia graphics cards will soon speak with a forked tongue. Continuum Analytics has been working with the GPU-maker to create the NumbaPro Python-to-GPU compiler.

We all call it the LAMP stack, but it should really be called LAMPPP or LAMP3 or some such because it is Linux, Apache, MySQL, Perl, PHP, and Python. And as such, given the popularity of Python, the ability to offload sorting and calculation work from CPUs to GPU coprocessors is a big deal. (If I were going to learn one programming language today, it would be Python because of its utility as both a scripting language and a nuts-and-bolts language for creating real applications. And when I find more time, I will learn it.)

For those of you who don't know the history of the language, back in December 1989, coder Guido van Rossum of the Netherlands was bored over the Christmas holidays, so he hacked together a descendant of the ABC scripting language to run on Unix machines. He called it Python in honour of the much-loved comedy series Monty Python's Flying Circus.

Python has been controlled by various organizations throughout its history, but Van Rossum, fondly known as Benevolent Dictator For Life, or BDFL, was the spiritual and technical leader of the project until he created the Python Software Foundation in 2001. At that time, Van Rossum and his cohort at PythonLabs were finishing up Python 2.0 and were also getting jobs in the commercial software field.

A decade ago, the Python Software Foundation estimated that there were somewhere between 170,000 and 200,000 Python programmers in the world, about half of them in Europe. Sumit Gupta, general manager of the Tesla Accelerated Computing business unit at Nvidia, tells El Reg that the company's best estimates now peg the global number of Python programmers at a whopping 3.5 million.

According to CodeEval.com, code samples show Python to be more popular than Java


Nvidia asked CodeEval.com, which does programming projects and contests, for some sense of what hackers prefer, and the chart above shows what programming languages were in use across more than 100,000 code samples. As you can see, Python came out ahead of Java, which has nearly three times the programmers (supposedly). The conventional wisdom is that there are around 10 million Java programmers in the world.

Nvidia did not do the Python integration with its CUDA programming environment for its Tesla GPU coprocessors and various video cards. But it helped in a way when it ditched its own C and C++ compilers for its GPUs and moved to the Low Level Virtual Machine (LLVM) toolchain back in December 2011.

The new C and C++ LLVM compilers were added to the CUDA 4.1 development kit, and gave about a 10 per cent performance boost over Nvidia's own compilers (which Nvidia has kept under closed-source wraps, except for some restricted academic licensees).

One of the purposes of putting the LLVM toolchain at the heart of the CUDA environment and tossing out its own Parallel Thread Execution, or PTX, toolchain was to get more languages supporting processing directly on GPUs. The Portland Group (PGI) Fortran compilers, which were originally done with the PTX toolchain when they came out in 2009, have been shuffled to LLVM, and now Continuum has done the work to make its Python stack hook into LLVM and speak proper GPU.

The NumbaPro tool is part of Continuum's Accelerate add-on for its commercial-grade Anaconda Python distribution. The Anaconda tool is completely free and runs on 32-bit and 64-bit Linux and Windows distributions and 64-bit Mac OS operating systems running on Intel-based Apple gear.

The Python 2.6, 2.7, and 3.3 engines are all supported in Anaconda. Accelerate costs $129, and a separate feature called IOPro - which is a fast interface into databases, NoSQL data stores, and Amazon S3 files - costs $79. Accelerate doesn't just work on GPUs, but is also used to make multicore/multithreaded x86 processors do a better job ripping through Python routines.

One of the things that native Python support for GPUs will allow is for companies to throw hardware at their software problems. Vijay Pande, a professor of chemistry at Stanford University, was cited by Nvidia in its announcement of Python support for the CUDA environment as saying that coders in the chem labs prototype applications in Python and then recode them in C or C++ to get a performance speed-up.

Now they can just say the hell with it and leave it in Python, which they say is easier to maintain than C or C++. As long as the money you spend on GPUs is less than the money you spend on recoding and the performance is better, this sounds like a win.
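For a rough idea of what "leave it in Python" looks like, here is a minimal sketch. It assumes the open source Numba compiler as a stand-in, since NumbaPro's own API is not described here. The function name harmonic_sum is made up for illustration, and this version targets the CPU through LLVM rather than a GPU. If Numba is not installed, the decorator falls back to a no-op and the routine runs as ordinary interpreted Python.

```python
# Assumption: open source Numba stands in for NumbaPro in this sketch.
try:
    from numba import njit  # LLVM-based just-in-time compiler for the CPU
except Exception:
    def njit(func):
        # Fallback when Numba is unavailable: leave the function untouched.
        return func


@njit
def harmonic_sum(n):
    # Hypothetical number-crunching loop of the sort a lab might prototype.
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


if __name__ == "__main__":
    print(harmonic_sum(1000000))
```

The loop body is the same plain Python either way; only the decorator changes how it gets executed.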

Gupta is not making any promises about the next programming language to be supported in CUDA, and frankly, he won't know anyway. "Once we moved to LLVM, it is pretty easy for programming tool makers to go out and do it on their own," he said.

The R stats language is probably next, however, and Nvidia has caught wind of projects at Stanford and the University of Michigan in the United States that are working on exactly this. ®


Noise

Unless someone funds some proper market research (and why would they) we will never really know what languages people really use. Assuming of course you can agree what counts as programming. Does using IF(logical_test,value_if_true,value_if_false) in Excel count?

Most of these measures are measures of people jumping up and down squeaking more than anything. Clearly if a, say, Fortran programmer buys one book every ten years, changes jobs every twenty and doesn't blog or post to forums, they aren't going to show up, whereas a 22 year old using Python on Pails and Javascript in Jails tweeting every commit to the tosspothub version control repository is.


Python and the NumPy platform

Python, along with NumPy and SciPy, has been used for numerical coding applications for years. NumbaPro allows users to make NumPy code even faster. Here are some links for further information:

http://en.wikipedia.org/wiki/NumPy

http://continuum.io/blog/simple-wave-simulation-with-numba-and-pygame

http://continuum.io/blog/the-python-and-the-complied-python


Re: Noise

It could well be due to that, and you got an upvote from me for the chuckles :)

I know quite a lot of "22 year old using Python on Pails and Javascript in Jails tweeting every commit" people, and boy do they annoy me. I see them more as fad chasers than actual programmers. I remember a few years ago they were banging on about Ruby.

I will admit though, I have been a Python programmer since the early 2000s, and I do love the language. The whitespace thing is odd, but it's not the end of the world for me, especially as I find development so fast and easy in it.

I personally have been using pyCUDA, which provides pretty good integration, minus the fact that the actual CUDA GPU code must be written in C, so I guess this is the natural progression of the technology.

You can even use the language for FPGA programming (using myhdl), and when I have some free time I will see how that works.

Fanboyism aside, it does seem like a very flexible and useful language, which also retains easy readability (especially when multiple people work on a codebase).

