AMD, Oracle tag-team on GPU acceleration for Java apps

OpenJDK meets OpenCL with Project Sumatra

OpenWorld 2012 The offloading of calculations from CPUs to external accelerators such as GPU coprocessors is not restricted to supercomputer applications. Anything with lots of calculation that can exploit parallelism is a candidate for acceleration, and that means Java applications, not just Fortran or C++ code.

There are a number of different ways that Java applications and the Java virtual machine can be tweaked to exploit the parallelism inherent in GPU coprocessors, such as those based on FirePro GPUs from Advanced Micro Devices or Tesla GPUs from Nvidia, and potentially parallel x86 Xeon Phi coprocessors from Intel.

And as part of the new Project Sumatra, announced today at the JavaOne community event hosted by Oracle in San Francisco, Larry & Company is teaming up with AMD to put the offload functionality inside the Java Virtual Machine itself, rather than using the two-step conversion and dispatch process that AMD has worked on until now with its own Project Aparapi.

Gary Frost, the technical lead at AMD for Project Aparapi, explained to El Reg in early 2010 that the company wanted to make it easier for Java developers to take advantage of the enormous calculation capabilities of GPUs without having to become OpenCL programmers themselves.

Coincidentally, the Aparapi project was founded just after Oracle had bought Sun Microsystems and had taken control of the stewardship of the Java programming language. The source code for Aparapi was open sourced in September 2011, and as Frost explained it at the time, offloading code to a GPU using OpenCL was not a natural act at all.

"At the time we were beginning to see Java bindings for OpenCL and CUDA (JOCL, JOpenCL and JCUDA), but most of these provided JNI wrappers around the original OpenCL or CUDA C based APIs and tended to force Java developers to do very un-Java-like things to their code," Frost wrote.

"Furthermore, coding a simple data parallel code fragment using these bindings involved creating a Kernel (in a somewhat alien C99 based syntax; exposing pointers, vector types and scary memory models) and then writing a slew of Java code to initialize the device, create data buffers, compile the OpenCL code, bind arguments to the compiled code, explicitly send buffers to the device, execute the code, and explicitly transfer buffers back again."

Performance boost

You program in Java to get away from all that hardware, so it kind of defeats the purpose. Project Aparapi used hints to mark where data parallelism exists in applications, then took Java bytecode and converted it at runtime into OpenCL routines so they could automagically be dispatched to an AMD or Nvidia GPU that spoke OpenCL.
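To make that programming model concrete, here is a minimal sketch of a per-element kernel written in plain Java. The `SketchKernel` class is a hypothetical stand-in invented for this illustration: it runs the body sequentially on the CPU and performs no bytecode-to-OpenCL conversion, so it shows only the shape of the code a developer would write, not how Aparapi is implemented.

```java
public class KernelSketch {
    /** Hypothetical stand-in for a data-parallel kernel base class; CPU only. */
    abstract static class SketchKernel {
        private int globalId;

        /** Index of the element the current invocation of run() should handle. */
        protected int getGlobalId() {
            return globalId;
        }

        /** Body executed once per element; it must not depend on execution order. */
        public abstract void run();

        /** Runs the body for ids 0..range-1, one after another, on the CPU. */
        public void execute(int range) {
            for (int id = 0; id < range; id++) {
                globalId = id;
                run();
            }
        }
    }

    /** Element-wise vector addition expressed as a kernel body. */
    static float[] add(final float[] a, final float[] b) {
        final float[] sum = new float[a.length];
        new SketchKernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                sum[i] = a[i] + b[i];
            }
        }.execute(a.length);
        return sum;
    }

    public static void main(String[] args) {
        float[] result = add(new float[] {1f, 2f}, new float[] {10f, 20f});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Because each invocation of the body touches only its own index, a runtime is free to run the invocations in any order or all at once, which is what makes this style a candidate for GPU dispatch.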

Pacific Northwest National Labs, one of the big US Department of Energy supercomputer facilities, was able to get on the order of a 60X performance boost on certain Java codes when a GPU was present, so the benefits were pretty substantial.

With Project Sumatra, Oracle and AMD want to do away with having an external library and conversion process between Java and OpenCL, Frost tells El Reg. Instead, the idea is to take advantage of the data structures within the OpenJDK implementation of the Java tools and let the Java virtual machine generate and compile the OpenCL code itself based on hints in the code.

This is precisely how CUDA tells compilers when they might be able to exploit parallelism for Tesla GPU coprocessors, and how Intel's compilers do the same for its Xeon Phi coprocessors when they are compiling Fortran or C++ applications on CPUs.

If not now, when?

Project Sumatra begins the process of having Oracle, AMD, and other interested Java contributors figure out how this might be accomplished, and at what point in the OpenJDK release schedule. This is not something that is determined by AMD, which is committing programmers and any smarts and code it got from Project Aparapi to the cause.

The company most wants to ensure that its on-chip and discrete GPUs are able to accelerate Java applications, and it is particularly interesting to contemplate "Llano" and "Trinity" Fusion APU chips being plunked into low-powered Java servers. But ultimately, the whole point is to make this transparent to users.

"The HotSpot compiler will now have the capability to compile code for the GPU," explains Frost. "We don't have to target a particular device because the JVM is making the decision at runtime."

If there isn't a coprocessor present that can accelerate the code, then the JVM knows to throw it at the CPU. The beauty is that you don't have to keep two different sets of code or do bytecode conversions. Well, that's the theory of Project Sumatra. The code is not even started yet, much less done.
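The fallback idea can be sketched in a few lines of ordinary Java. Everything here is an assumption made for illustration: the `Accelerator` interface and the `dispatch` method are hypothetical names, not part of HotSpot, OpenJDK, or Project Sumatra, and in the real design the decision would be made inside the JVM rather than in application code.

```java
import java.util.Optional;
import java.util.function.IntConsumer;

public class DispatchSketch {
    /** Hypothetical stand-in for an accelerator back end; not a real JVM API. */
    interface Accelerator {
        void run(int n, IntConsumer body);
    }

    /**
     * Runs the same loop body on the accelerator when one is available,
     * otherwise on the CPU. Returns the name of the path that was taken.
     */
    static String dispatch(Optional<Accelerator> accelerator, int n, IntConsumer body) {
        if (accelerator.isPresent()) {
            accelerator.get().run(n, body);
            return "accelerator";
        }
        // No coprocessor found: execute the identical body in a plain CPU loop.
        for (int i = 0; i < n; i++) {
            body.accept(i);
        }
        return "cpu";
    }

    public static void main(String[] args) {
        int[] doubled = new int[4];
        String path = dispatch(Optional.empty(), doubled.length, i -> doubled[i] = i * 2);
        System.out.println(path + " " + java.util.Arrays.toString(doubled));
    }
}
```

The point of the sketch is that the caller supplies one body and never names a device, which is the property Frost describes.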

The data parallelism hints that will be added to Java, which are being developed under Project Lambda for multicore central processors, are expected to be used to extend parallelism out to GPUs (and maybe Xeon Phis) through OpenJDK.
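As a rough illustration of what such a hint looks like, the sketch below uses the lambda and stream syntax that Project Lambda defined for Java 8. The class and method names are made up for this example, and the code runs on CPU threads only; whether and how a JVM would route a loop like this to a GPU is exactly the open question Sumatra is meant to answer.

```java
import java.util.stream.IntStream;

public class SquaresSketch {
    /**
     * Squares each element. The parallel() call tells the runtime that the
     * iterations are independent and may be executed in any order.
     */
    static float[] squares(float[] in) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(i -> out[i] = in[i] * in[i]);
        return out;
    }

    public static void main(String[] args) {
        float[] result = squares(new float[] {1f, 2f, 3f, 4f});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Each iteration writes only to its own slot in `out`, so no synchronization is needed, and removing `parallel()` gives the same result sequentially.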

Java 8 is expected around the middle of next year, according to Frost, and so this GPU offload functionality will probably not make it there. But it could come out with Java 9, or be an update to Java 8 at some point in between. That's really up to the OpenJDK community, which is working on a completely open source and GPL-licensed implementation of Java. ®
