ARM64 gets better GPU support in CUDA release
CUDA 6.5 eyes HPC market
NVIDIA has released CUDA 6.5, the latest upgrade to its parallel computing platform and programming model, as a production release.
The free download puts 64-bit ARM platforms on a par with x86 by letting them take advantage of GPU acceleration. As NVIDIA claims in a blog post, the combination of low-power ARM64 architectures with ultra-fast GPU compute is “a compelling solution for HPC”.
NVIDIA also says Fast Fourier Transform performance is improved by new cuFFT device callbacks, which let developers run their own code as data is loaded into or stored out of an FFT, so pre- and post-processing happen in a single memory round trip: “cuFFT can transform the input and output data without extra bandwidth usage above what the FFT itself uses.”
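To make the callback idea concrete, here is a minimal sketch, assuming the cuFFT Xt callback interface (cufftXtSetCallback, cufftCallbackLoadC, CUFFT_CB_LD_COMPLEX from cufftXt.h). A load callback scales each input element as cuFFT reads it, so no separate scaling kernel has to read and write the whole buffer first. The helper dcBinWithScaledLoad and the all-ones test input are hypothetical, chosen so the result is easy to check: the forward FFT of n constant values c has a DC bin of n times c. As we understand the documentation, callbacks need relocatable device code (nvcc -rdc=true) and the static cuFFT library (cufft_static plus culibos), and in 6.5 they were limited to 64-bit Linux; check the cuFFT docs for your version.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

// Load callback: cuFFT calls this for every input element it reads, so the
// scaling happens inside the FFT's own load instead of a separate kernel pass.
__device__ cufftComplex scaleOnLoad(void *dataIn, size_t offset,
                                    void *callerInfo, void *sharedPtr) {
    cufftComplex v = static_cast<cufftComplex *>(dataIn)[offset];
    float s = *static_cast<float *>(callerInfo);  // scale factor in device memory
    v.x *= s;
    v.y *= s;
    return v;
}

// Device-side function pointer; its value is copied to the host below.
__device__ cufftCallbackLoadC d_scaleOnLoad = scaleOnLoad;

// Hypothetical helper: forward C2C FFT of n ones, scaled by `scale` via the
// load callback. Returns |DC bin| (mathematically n * scale), or -1 on error.
float dcBinWithScaledLoad(int n, float scale) {
    std::vector<cufftComplex> h(n);
    for (auto &c : h) { c.x = 1.0f; c.y = 0.0f; }

    cufftComplex *d_data = nullptr;
    float *d_scale = nullptr;
    cudaMalloc(&d_data, n * sizeof(cufftComplex));
    cudaMalloc(&d_scale, sizeof(float));
    cudaMemcpy(d_data, h.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_scale, &scale, sizeof(float), cudaMemcpyHostToDevice);

    float result = -1.0f;
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) == CUFFT_SUCCESS) {
        cufftCallbackLoadC h_cb;
        cudaMemcpyFromSymbol(&h_cb, d_scaleOnLoad, sizeof(h_cb));
        // callerInfo is an array of device pointers, one per GPU (one here).
        if (cufftXtSetCallback(plan, (void **)&h_cb, CUFFT_CB_LD_COMPLEX,
                               (void **)&d_scale) == CUFFT_SUCCESS &&
            cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS) {
            cudaMemcpy(h.data(), d_data, n * sizeof(cufftComplex),
                       cudaMemcpyDeviceToHost);
            result = std::hypot(h[0].x, h[0].y);
        }
        cufftDestroy(plan);
    }
    cudaFree(d_data);
    cudaFree(d_scale);
    return result;
}

int main() {
    // Mathematically the DC bin should be 1024 * 0.5 = 512.
    printf("DC bin: %f\n", dcBinWithScaledLoad(1024, 0.5f));
    return 0;
}
```

The same pattern applies on the output side with a store callback, which is where the "single memory round trip" claim comes from: the extra work rides along with reads and writes the FFT performs anyway.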
Fortran support is also improved in the cuda-gdb debugger, the nvprof command-line profiler, cuda-memcheck and the NVIDIA Visual Profiler.
Host compiler support now includes Microsoft Visual Studio 2013 on Windows; several math libraries deliver better double-precision performance; and new static CUDA libraries reduce dependencies on dynamic libraries.
Other features include a new occupancy calculator API, so programmers no longer have to hand-tune kernel launch configurations for each GPU architecture, and a utility called nvprune that strips out device code not needed for the target architecture. ®
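For readers wondering what the occupancy calculator API looks like in practice, here is a minimal sketch, assuming the runtime's cudaOccupancyMaxPotentialBlockSize template from cuda_runtime.h. Instead of hard-coding a block size per architecture, the program asks the runtime for one that maximises occupancy for this kernel on the current device, then sizes the grid to cover the data. The saxpy kernel and the runSaxpy helper are hypothetical examples, not part of NVIDIA's release.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Example kernel: y = a*x + y
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: let the occupancy calculator pick the block size
// (no dynamic shared memory, no block-size limit), launch over n elements,
// and return the number of wrong results (0 on success) or -1 on error.
int runSaxpy(int n, int *chosenBlockSize) {
    int minGridSize = 0, blockSize = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                           saxpy, 0, 0) != cudaSuccess)
        return -1;
    *chosenBlockSize = blockSize;

    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1;
    }
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int gridSize = (n + blockSize - 1) / blockSize;  // round up to cover n
    saxpy<<<gridSize, blockSize>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (y[i] != 5.0f) ++errors;  // 3 * 1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return errors;
}

int main() {
    int blockSize = 0;
    int errors = runSaxpy(1 << 20, &blockSize);
    printf("block size %d, errors %d\n", blockSize, errors);
    return 0;
}
```

The block size this prints depends on the GPU and the kernel's register use, which is exactly the per-architecture detail the API is meant to hide.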