AMD, Oracle tag-team on GPU acceleration for Java apps

OpenJDK meets OpenCL with Project Sumatra


OpenWorld 2012 The offloading of calculations from CPUs to external accelerators such as GPU coprocessors is not something restricted to supercomputer applications. Anything with lots of calculations that can exploit parallelism is a candidate for acceleration, and that means Java applications, not just Fortran or C++ code.

There are a number of different ways that Java applications and the Java virtual machine can be tweaked to exploit the parallelism inherent in GPU coprocessors, such as those based on FirePro GPUs from Advanced Micro Devices or Tesla GPUs from Nvidia, and potentially parallel x86 Xeon Phi coprocessors from Intel.

And as part of the new Project Sumatra, announced today at the JavaOne community event hosted by Oracle in San Francisco, Larry & Company is teaming up with AMD to put the offload functionality inside the Java virtual machine itself, rather than using the two-step conversion and dispatch process that AMD has worked on until now with its own Project Aparapi.

Gary Frost, the technical lead at AMD for Project Aparapi, explained to El Reg in early 2010 that the company wanted to make it easier for Java applications to take advantage of the enormous calculation capabilities of GPUs without having to become OpenCL programmers themselves.

Coincidentally, the Aparapi project was founded just after Oracle had bought Sun Microsystems and had taken control of the stewardship of the Java programming language. The source code for Aparapi was open sourced in September 2011, and as Frost explained it at the time, offloading code to a GPU using OpenCL was not a natural act at all.

"At the time we were beginning to see Java bindings for OpenCL and CUDA (JOCL, JOpenCL and JCUDA), but most of these provided JNI wrappers around the original OpenCL or CUDA C based APIs and tended to force Java developers to do very un-Java-like things to their code," Frost wrote.

"Furthermore, coding a simple data parallel code fragment using these bindings involved creating a Kernel (in a somewhat alien C99 based syntax; exposing pointers, vector types and scary memory models) and then writing a slew of Java code to initialize the device, create data buffers, compile the OpenCL code, bind arguments to the compiled code, explicitly send buffers to the device, execute the code, and explicitly transfer buffers back again."

Performance boost

You program in Java to get away from all that hardware, so it kind of defeats the purpose. Project Aparapi put hints into applications where data parallelism exists, and then took Java bytecodes and converted them at runtime to OpenCL routines so they could automagically be dispatched to an AMD or Nvidia GPU that was speaking OpenCL.
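To make the contrast with raw OpenCL bindings concrete, here is a minimal sketch of the Aparapi programming model. The `Kernel` class below is a simplified, CPU-only stand-in for Aparapi's real `com.amd.aparapi.Kernel` (which translates the bytecode of `run()` into OpenCL at runtime); only the shape of the API is real, the implementation here just loops so the sketch stays self-contained.

```java
import java.util.Arrays;

// Simplified stand-in for Aparapi's com.amd.aparapi.Kernel: the real class
// converts the bytecode of run() to OpenCL and dispatches it to a GPU;
// this one just loops sequentially so the example runs anywhere.
abstract class Kernel {
    private int globalId;
    protected int getGlobalId() { return globalId; }
    public abstract void run();
    public void execute(int range) {
        for (globalId = 0; globalId < range; globalId++) {
            run();  // each iteration stands in for one OpenCL work-item
        }
    }
}

public class VectorAdd {
    static float[] add(final float[] a, final float[] b) {
        final float[] sum = new float[a.length];
        Kernel k = new Kernel() {
            @Override public void run() {
                int i = getGlobalId();  // work-item index, as in Aparapi
                sum[i] = a[i] + b[i];   // the kernel body stays plain Java
            }
        };
        k.execute(sum.length);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(add(
                new float[]{1, 2, 3, 4}, new float[]{10, 20, 30, 40})));
        // prints [11.0, 22.0, 33.0, 44.0]
    }
}
```

The developer writes only the kernel body; the buffer creation, compilation, and transfers that Frost describes are hidden behind `execute()`.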

Pacific Northwest National Labs, one of the big US Department of Energy supercomputer facilities, was able to get on the order of a 60X performance boost on certain Java codes when a GPU was present, so the benefits were pretty substantial.

With Project Sumatra, Oracle and AMD want to do away with having an external library and conversion process between Java and OpenCL, Frost tells El Reg. Instead, the idea is to take advantage of the data structures within the OpenJDK implementation of the Java tools and let the Java virtual machine generate and compile the OpenCL code itself based on hints in the code.

This is akin to how CUDA lets programmers tell compilers when they might be able to exploit parallelism for Tesla GPU coprocessors, and how Intel's compilers do the same for its Xeon Phi coprocessors when compiling Fortran or C++ applications.

If not now, when?

Project Sumatra begins the process of having Oracle, AMD, and other interested Java contributors figure out how this might be accomplished, and at what point in the OpenJDK release schedule. This is not something that will be determined by AMD alone, which is committing programmers and any smarts and code it gleaned from Project Aparapi to the cause.

What the company most wants is to ensure that its on-chip and discrete GPUs are able to accelerate Java applications, and it is particularly interesting to contemplate "Llano" and "Trinity" Fusion APU chips being plunked into low-powered Java servers. But ultimately, the whole point is to make this transparent to users.

"The HotSpot compiler will now have the capability to compile code for the GPU," explains Frost. "We don't have to target a particular device because the JVM is making the decision at runtime."

If there isn't a coprocessor present that can accelerate the code, then the JVM knows to throw it at the CPU. The beauty is that you don't have to keep two different sets of code or do bytecode conversions. Well, that's the theory of Project Sumatra. The code is not even started yet, much less done.
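The runtime fallback Frost describes might look something like the following sketch. To be clear, every name here is invented for illustration; the real decision would be made inside HotSpot, not in user code. The point is one code path, with the runtime probing for an accelerator and quietly falling back to the CPU.

```java
import java.util.Arrays;

// Hypothetical illustration only -- all names are invented. Sumatra's goal is
// that the JVM itself makes this device choice at runtime, transparently.
public class Dispatch {
    interface Backend {
        float[] square(float[] in);
    }

    // CPU fallback path: always available.
    static final Backend CPU = in -> {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i] * in[i];
        return out;
    };

    static Backend select() {
        boolean gpuPresent = false;  // a real runtime would probe for OpenCL devices here
        return gpuPresent ? CPU /* a GPU-backed implementation would go here */ : CPU;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(select().square(new float[]{1, 2, 3})));
        // prints [1.0, 4.0, 9.0]
    }
}
```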

The data parallelism hints that will be added to Java, which are being developed under Project Lambda for multicore central processors, are expected to be used to extend parallelism out to GPUs (and maybe Xeon Phis) through OpenJDK.
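On the Project Lambda side, a Java 8 parallel stream is the kind of data-parallel expression in question. Today the runtime fans the work out across CPU cores; the Sumatra idea is that the same expression could one day be compiled for a GPU when one is present. A minimal example of the idiom:

```java
import java.util.stream.IntStream;

public class LambdaParallel {
    // The .parallel() call is the data-parallelism hint: it tells the runtime
    // the per-element work is independent and may be split across workers.
    static long sumOfSquares(int n) {
        return IntStream.range(0, n)
                        .parallel()
                        .mapToLong(i -> (long) i * i)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 285
    }
}
```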

Java 8 is expected around the middle of next year, according to Frost, and so this GPU offload functionality will probably not make it there. But it could come out with Java 9, or be an update to Java 8 at some point in between. That's really up to the OpenJDK community, which is working on a completely open source and GPL-licensed implementation of Java. ®
