The 'third era' of app development will be fast, simple, and compact
Will Intel and Nvidia join the HSA party, or insist on going it alone?
HSAILing into the future
The Foundation's first spec, the HSA Programmers Reference Manual version 0.95 – which the HSA Foundation "affectionately" refers to as the HSAIL spec, or HSA Intermediate Language – was published this May, and is available for download on the Foundation's website. Additional specifications are under development for the HSA system architecture, the HSA runtime software, and tools such as a debugger, a profiler, and so forth.
HSAIL is a virtual, explicitly parallel ISA for parallel programs that's finalized by a JIT compiler that the Foundation, understandably, calls a Finalizer. HSAIL is "ISA independent by design," Rogers said, for both CPU and GPU. "There's nothing about HSAIL or HSA that constrains [independent hardware vendors] from their innovation in terms of how they implement the specification, and yet it guarantees compatibility for software."
The Foundation, Rogers said, has also defined what he characterized as a "very comprehensive" relaxed-consistency memory model for HSA. "We made sure during the design that it was compatible with all of the high-level language memory models, some of which were under development at the same time as HSA, so we tracked them in real time," he said, using as examples the C++11, Java and .NET memory models, and saying that HSA is compatible with all of them.
The HSA software model simplifies sending data to the GPU. But although applications can drive work to hardware directly, Rogers said, few application developers will choose to do that. "Many will go through optimized domain libraries and task-queuing libraries that will be optimized directly to the hardware queues," he said.
HSA and OpenCL, the open standard for parallel programming of heterogeneous systems, are intended to coexist. "HSA is an optimized platform architecture for OpenCL, it's not an alternative to OpenCL," Rogers said. "It runs OpenCL applications extremely well," and doing so results in "immediate" performance improvements and efficiencies, with wasteful copies eliminated and dispatch latencies reduced.
OpenCL 2.0, by the way, was announced by Khronos at SIGGRAPH last month, and Rogers said that its published specifications and features are in "considerable alignment" with the planned direction of the HSA platform.
When introducing a new platform such as HSA, Rogers said, it's "extremely important" to provide programmers with good libraries to take advantage of that platform. In the case of HSA, those libraries are now available in Bolt, a template library for OpenCL and C++ AMP.
"It has the scan, sort, reduce, and transform routines that you'd expect," he said, along with more-advanced routines such as heterogeneous pipelines to make it simple for programmers to run pipelines back and forth from the CPU to the GPU.
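For readers unfamiliar with the four routines Rogers names, here is a rough sketch of what each one does. It is not Bolt code and says nothing about Bolt's actual API: it uses only Java 8 standard-library calls that run on CPU cores, and the class and method names are invented for illustration.

```java
import java.util.Arrays;

// Illustrative only: a CPU-side analogue of scan, sort, reduce, and transform.
// The class and method names are hypothetical and are NOT part of Bolt.
public class DataParallelPrimitives {

    // Scan: inclusive prefix sum, so out[i] = in[0] + ... + in[i].
    public static int[] scan(int[] in) {
        int[] out = in.clone();
        Arrays.parallelPrefix(out, Integer::sum);
        return out;
    }

    // Sort: ascending order, using the fork/join-based parallel sort.
    public static int[] sort(int[] in) {
        int[] out = in.clone();
        Arrays.parallelSort(out);
        return out;
    }

    // Reduce: collapse the whole array to a single value (here, the sum).
    public static int reduce(int[] in) {
        return Arrays.stream(in).parallel().sum();
    }

    // Transform: apply the same function to every element (here, squaring).
    public static int[] transform(int[] in) {
        return Arrays.stream(in).parallel().map(x -> x * x).toArray();
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        System.out.println(Arrays.toString(scan(data)));      // [3, 4, 8, 9, 14]
        System.out.println(Arrays.toString(sort(data)));      // [1, 1, 3, 4, 5]
        System.out.println(reduce(data));                     // 14
        System.out.println(Arrays.toString(transform(data))); // [9, 1, 16, 1, 25]
    }
}
```

Each routine does independent work per element or per chunk, which is what makes the same four operations natural candidates for a GPU.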
HSA for Java is particularly interesting, Rogers said, due to what he called the "predominance of Java in server installations, data centers, and cloud servers." He pointed to the open source Aparapi library, a Java-bytecode-to-OpenCL runtime converter that supports parallel processing on GPUs or thread management on multi-core CPUs.
"We looked at what the roadmap should be for Java," Rogers said when displaying the slide below. "The left column shows the Aparapi stack on OpenCL, and then you can see a progression where we take Aparapi directly to the HSA Finalizer, and then through the [low level virtual machine] optimizer to that Finalizer."
As Java evolves, a new JVM will enable data-parallel code without third-party help
While each of those three steps will boost performance, Rogers said that the fourth step is where HSA will really shine. "The ultimate goal," he said, "is to put heterogeneous acceleration directly into the Java virtual machine, because Java virtual machine features and core features of the Java language naturally get more adoption than third-party libraries."
That's where Project Sumatra comes in. An open source OpenJDK project cosponsored by AMD and Oracle and targeted for release in Java 9 in 2015, it is designed to enable developers to write and execute data-parallel algorithms in Java with GPU acceleration, somewhat similar to what Java 8's Lambda feature does for multi-core CPUs.
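To make the multi-core comparison concrete, here is a minimal sketch of a data-parallel loop written with a Java 8 lambda. It uses only the standard streams API and runs on CPU cores; it does not use Sumatra, and nothing here should be read as Sumatra's interface. The class name and the choice of a "saxpy" computation (a times x plus y, element by element) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative only: a Java 8 lambda run as a parallel stream on CPU cores.
// The class name is hypothetical; no Sumatra or GPU API is involved.
public class LambdaSaxpy {

    // Computes out[i] = a * x[i] + y[i] for every index i.
    // Each index is written by exactly one task, so the lambda body is
    // free of data races and the iterations can run in any order.
    public static float[] saxpy(float a, float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        float[] out = new float[x.length];
        IntStream.range(0, x.length)
                 .parallel()
                 .forEach(i -> out[i] = a * x[i] + y[i]);
        return out;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        System.out.println(Arrays.toString(saxpy(2f, x, y))); // [12.0, 24.0, 36.0]
    }
}
```

The lambda passed to `forEach` is the kind of per-element body that a GPU-aware JVM could, in principle, dispatch to a different device without the programmer rewriting it.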