AMD, Oracle tag-team on GPU acceleration for Java apps

OpenJDK meets OpenCL with Project Sumatra

Choosing a cloud hosting partner with confidence

OpenWorld 2012 The offloading loading of calculations from CPUs to external accelerators such as GPU coprocessors is not just something that is restricted to supercomputer applications. Anything with lots of calculation that can exploit parallelism is a candidate for acceleration, and that means Java applications, not just Fortran or C++ code.

There are a number of different ways that Java applications and the Java virtual machine can be tweaked to exploit the parallelism inherent in GPU coprocessors, such as those based on FirePro GPUs from Advanced Micro Devices or Tesla GPUs from Nvidia, and potentially parallel x86 Xeon Phi coprocessors from Intel.

And as part of new Project Sumatra, announced today at the JavaOne community event hosted by Oracle in San Francisco, Larry & Company is teaming up with AMD to put the software functionality to offload inside of the Java Virtual Machine itself rather than using a two-step conversion and dispatch process that AMD has worked on until now with its own Project Aparapi.

Gary Frost, the technical lead at AMD for Project Aparapi, explained to El Reg in early 2010 that the company wanted to make it easier for Java applications to take advantage of the enormous calculation capabilities of GPUs without having to become OpenCL programmers themselves.

Coincidentally, the Aparapi project was founded just after Oracle had bought Sun Microsystems and had taken control of the stewardship of the Java programming language. The source code for Aparapi was open sourced in September 2011, and as Frost explained it at the time, offloading code to a GPU using OpenCL was not a natural act at all.

"At the time we were beginning to see Java bindings for OpenCL and CUDA (JOCL, JOpenCL and JCUDA), but most of these provided JNI wrappers around the original OpenCL or CUDA C based APIs and tended to force Java developers to do very un-Java-like things to their code," Frost wrote.

"Furthermore, coding a simple data parallel code fragment using these bindings involved creating a Kernel (in a somewhat alien C99 based syntax; exposing pointers, vector types and scary memory models) and then writing a slew of Java code to initialize the device, create data buffers, compile the OpenCL code, bind arguments to the compiled code, explicitly send buffers to the device, execute the code, and explicitly transfer buffers back again."

Performance boost

You program in Java to get away from all that hardware, so it kind of defeats the purpose. Project Aparapi put hints to where data parallelism exist in the applications, and then took Java bytecodes and converted them at runtime to OpenCL routines so they could automagically be dispatched to an AMD or Nvidia GPU that was speaking OpenCL.

Pacific Northwest National Labs, one of the big US Department of Energy supercomputer facilities, was able to get on the order of a 60X performance boost on certain Java codes when a GPU was present, so the benefits were pretty substantial.

With Project Sumatra, Oracle and AMD want to do away with having an external library and conversion process between Java and OpenCL, Frost tells El Reg. Instead, the idea is to take advantage of the data structures within the OpenJDK implementation of the Java tools and let the Java virtual machine generate and compile the OpenCL code itself based on hints in the code.

This is precisely how CUDA tells compilers when they might be able to exploit parallelism for Tesla GPU coprocessors as do Intel compilers for its Xeon Phi coprocessors when they are compiling Fortran or C++ applications on CPUs.

If not now, when?

Project Sumatra begins the process of having Oracle, AMD, and other interested Java contributors to figure out how this might be accomplished, and at what point in the OpenJDK release schedule. This is not something that is determined by AMD, which is committing programmers and any smarts and code it got from Project Aparapi to the cause.

The company most wants to ensure that its on-chip and discrete GPUs are able to accelerate Java applications, and it is particularly interesting to contemplate using "Llano" and "Trinity" Fusion APU chips being plunked into low-powered Java servers. But ultimately, the whole point is to make this transparent to users.

"The HotSpot compiler will now have the capability to compile code for the GPU," explains Frost. "We don't have to target a particular device because the JVM is making the decision at runtime."

If there isn't a coprocessor present that can accelerate the code, then the JVM knows to throw it at the CPU. The beauty is that you don't have to keep two different sets of code or do bytecode conversions. Well, that's the theory of Project Sumatra. The code is not even started yet, much less done.

The data parallelism hints that will be added to Java, which are being developed under Project Lambda for multicore central processors, are expected to be used to extend parallelism out to GPUs (and maybe Xeon Phis) through OpenJDK.

Java 8 is expected around the middle of next year, according to Frost, and so this GPU offload functionality will probably not make it there. But it could come out with Java 9, or be an update to Java 8 at some point in between. That's really up to the OpenJDK community, which is working on a completely open source and GPL-licensed implementation of Java. ®

Remote control for virtualized desktops

More from The Register

next story
Fat fingered geo-block kept Aussies in the dark
NASA launches new climate model at SC14
75 days of supercomputing later ...
Yahoo! blames! MONSTER! email! OUTAGE! on! CUT! CABLE! bungle!
Weekend woe for BT as telco struggles to restore service
Cloud unicorns are extinct so DiData cloud mess was YOUR fault
Applications need to be built to handle TITSUP incidents
NSA SOURCE CODE LEAK: Information slurp tools to appear online
Now you can run your own intelligence agency
BOFH: WHERE did this 'fax-enabled' printer UPGRADE come from?
Don't worry about that cable, it's part of the config
Stop the IoT revolution! We need to figure out packet sizes first
Researchers test 802.15.4 and find we know nuh-think! about large scale sensor network ops
Trio of XSS turns attackers into admins
SanDisk vows: We'll have a 16TB SSD WHOPPER by 2016
Flash WORM has a serious use for archived photos and videos
prev story


Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
Designing and building an open ITOA architecture
Learn about a new IT data taxonomy defined by the four data sources of IT visibility: wire, machine, agent, and synthetic data sets.
How to determine if cloud backup is right for your servers
Two key factors, technical feasibility and TCO economics, that backup and IT operations managers should consider when assessing cloud backup.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.