AMD, Oracle tag-team on GPU acceleration for Java apps
OpenJDK meets OpenCL with Project Sumatra
OpenWorld 2012 Offloading calculations from CPUs to external accelerators such as GPU coprocessors is not restricted to supercomputer applications. Any workload with lots of calculation that can exploit parallelism is a candidate for acceleration, and that means Java applications, not just Fortran or C++ code.
There are a number of ways that Java applications and the Java virtual machine can be tweaked to exploit the parallelism inherent in GPU coprocessors, such as those based on FirePro GPUs from Advanced Micro Devices or Tesla GPUs from Nvidia, and potentially in Intel's Xeon Phi coprocessors, which are built from many x86 cores.
As part of the new Project Sumatra, announced today at the JavaOne community event hosted by Oracle in San Francisco, Larry & Company is teaming up with AMD to build the offload functionality into the Java virtual machine itself, rather than relying on the two-step conversion and dispatch process that AMD has used until now with its own Project Aparapi.
Gary Frost, the technical lead at AMD for Project Aparapi, explained to El Reg in early 2010 that the company wanted to make it easier for Java programmers to tap the enormous calculating capacity of GPUs without having to become OpenCL programmers themselves.
Coincidentally, the Aparapi project was founded just after Oracle had bought Sun Microsystems and taken over stewardship of the Java programming language. Aparapi was open sourced in September 2011, and as Frost explained at the time, offloading code to a GPU using OpenCL was not a natural act at all.
"At the time we were beginning to see Java bindings for OpenCL and CUDA (JOCL, JOpenCL and JCUDA), but most of these provided JNI wrappers around the original OpenCL or CUDA C based APIs and tended to force Java developers to do very un-Java-like things to their code," Frost wrote.
"Furthermore, coding a simple data parallel code fragment using these bindings involved creating a Kernel (in a somewhat alien C99 based syntax; exposing pointers, vector types and scary memory models) and then writing a slew of Java code to initialize the device, create data buffers, compile the OpenCL code, bind arguments to the compiled code, explicitly send buffers to the device, execute the code, and explicitly transfer buffers back again."
You program in Java to get away from all that hardware, so it kind of defeats the purpose. Aparapi instead had developers mark where data parallelism exists in their applications, then took the Java bytecodes and converted them at runtime into OpenCL routines that could automagically be dispatched to any AMD or Nvidia GPU that spoke OpenCL.
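To make the contrast with the boilerplate Frost describes concrete, here is a minimal sketch of an Aparapi-style programming model in plain Java: the developer writes only the per-element body, indexed by a global work-item ID, and a base class takes care of dispatch. In Aparapi itself the developer subclasses its `Kernel` class, overrides `run()`, reads the index with `getGlobalId()` and calls `execute()` with the range size. The `MiniKernel` class below is a hypothetical stand-in written for illustration; it simply spreads the range across CPU threads and does no bytecode-to-OpenCL translation.

```java
import java.util.Arrays;

// Hypothetical stand-in for an Aparapi-style kernel (not Aparapi's code).
// Subclasses supply only the per-element body in run(); this class handles
// dispatch. Here the "device" is a handful of plain CPU threads, not a GPU.
abstract class MiniKernel {
    private final ThreadLocal<Integer> globalId = new ThreadLocal<>();

    // Index of the work item the current thread is processing.
    protected int getGlobalId() {
        return globalId.get();
    }

    // Per-element body supplied by the subclass.
    public abstract void run();

    // Calls run() once for every index in [0, range).
    public void execute(int range) throws InterruptedException {
        int threads = Math.max(1, Math.min(range, Runtime.getRuntime().availableProcessors()));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int start = t;
            workers[t] = new Thread(() -> {
                // Each worker takes a strided slice of the index range.
                for (int i = start; i < range; i += threads) {
                    globalId.set(i);
                    run();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // join() also makes the workers' writes visible here
        }
    }
}

public class VectorAdd {
    static float[] add(float[] a, float[] b) throws InterruptedException {
        float[] sum = new float[a.length];
        MiniKernel kernel = new MiniKernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                sum[i] = a[i] + b[i];
            }
        };
        kernel.execute(sum.length);
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        float[] result = add(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f});
        System.out.println(Arrays.toString(result)); // [11.0, 22.0, 33.0]
    }
}
```

There is no device initialization, buffer management or C99 kernel source in the user's code; all of that would live behind `execute()`.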
Pacific Northwest National Laboratory, one of the big US Department of Energy supercomputer facilities, got on the order of a 60X performance boost on certain Java codes when a GPU was present, so the benefits were pretty substantial.
With Project Sumatra, Oracle and AMD want to do away with having an external library and conversion process between Java and OpenCL, Frost tells El Reg. Instead, the idea is to take advantage of the data structures within the OpenJDK implementation of the Java tools and let the Java virtual machine generate and compile the OpenCL code itself based on hints in the code.
This is much the same way that CUDA tells compilers where they might be able to exploit parallelism on Tesla GPU coprocessors, and the way Intel's compilers are told where they can offload work to Xeon Phi coprocessors when compiling Fortran or C++ applications.
If not now, when?
Project Sumatra begins the process of Oracle, AMD, and other interested Java contributors figuring out how this might be accomplished, and at what point in the OpenJDK release schedule it could land. That is not for AMD to decide, although the company is committing programmers, along with whatever smarts and code it gained from Project Aparapi, to the cause.
What AMD wants most is to ensure that its on-chip and discrete GPUs are able to accelerate Java applications, and it is particularly interesting to contemplate "Llano" and "Trinity" Fusion APU chips being plunked into low-powered Java servers. But ultimately, the whole point is to make the acceleration transparent to users.
"The HotSpot compiler will now have the capability to compile code for the GPU," explains Frost. "We don't have to target a particular device because the JVM is making the decision at runtime."
If there isn't a coprocessor present that can accelerate the code, the JVM knows to throw it at the CPU. The beauty is that you don't have to maintain two different sets of code or do bytecode conversions. Well, that's the theory of Project Sumatra. Coding has not even started yet, much less finished.
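The "one set of code, device chosen at runtime" idea can be sketched as a simple dispatch with a CPU fallback. Everything here is hypothetical and for illustration only: the `Device` interface, the `detectAccelerator()` probe (which in this sketch always reports that no accelerator is present) and `selectDevice()` are invented names, not part of HotSpot, OpenJDK or any Sumatra code, which, as noted, had not been written.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.IntUnaryOperator;

public class RuntimeDispatch {
    // Hypothetical abstraction for something that can run a data-parallel kernel.
    interface Device {
        String name();
        int[] map(int[] input, IntUnaryOperator kernel);
    }

    // CPU path: always available, used as the fallback.
    static final class CpuDevice implements Device {
        public String name() {
            return "cpu";
        }

        public int[] map(int[] input, IntUnaryOperator kernel) {
            int[] out = new int[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = kernel.applyAsInt(input[i]);
            }
            return out;
        }
    }

    // Hypothetical probe. A real runtime would query the installed OpenCL
    // platforms here; this sketch assumes no accelerator is present.
    static Optional<Device> detectAccelerator() {
        return Optional.empty();
    }

    // The decision is made at runtime; the kernel itself is written once.
    static Device selectDevice() {
        return detectAccelerator().orElseGet(CpuDevice::new);
    }

    public static void main(String[] args) {
        Device device = selectDevice();
        int[] squares = device.map(new int[] {1, 2, 3, 4}, x -> x * x);
        System.out.println(device.name() + " " + Arrays.toString(squares)); // cpu [1, 4, 9, 16]
    }
}
```

The caller never names a device: if the probe ever returned a GPU-backed `Device`, the same `x -> x * x` kernel would be handed to it unchanged.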
The data-parallelism hints being developed for multicore CPUs under Project Lambda, which will be added to Java, are expected to be used to extend parallelism out to GPUs (and maybe Xeon Phis) through OpenJDK.
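The kind of hint Project Lambda introduced can be illustrated with the `java.util.stream` API that shipped in Java 8: calling `parallel()` on a stream tells the runtime that the iterations are independent and may be split up. In the standard JDK such a pipeline runs on CPU threads only; whether a pipeline like this could be handed to a GPU is the question Sumatra set out to explore. The `LambdaHint` class and `multiply()` method are illustrative names.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class LambdaHint {
    // Element-wise product of two arrays. parallel() is the data-parallelism
    // hint: each index is independent, so the JDK may split the range across
    // the worker threads of its common fork/join pool.
    static double[] multiply(double[] a, double[] b) {
        double[] out = new double[a.length];
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(i -> out[i] = a[i] * b[i]);
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(Arrays.toString(multiply(a, b))); // [4.0, 10.0, 18.0]
    }
}
```

Each index writes to its own slot of `out`, so no locking is needed, which is exactly the property that makes a loop like this a candidate for offload.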
Java 8 is expected around the middle of next year, according to Frost, so this GPU offload functionality will probably not make it into that release. But it could come out with Java 9, or in an update to Java 8 at some point in between. That's really up to the OpenJDK community, which is working on a completely open source, GPL-licensed implementation of Java. ®