Feeds

Revolution Analytics paints R stats Azure blue

Gooses performance, spans HPC clusters with 6.0 update

Boost IT visibility and business value

Revolution Analytics, aka "Red Hat for stats" – which commercialized the open source R programming language and statistical analysis tool – has now tweaked its R Enterprise stack and pushed out a 6.0 release.

The new R Enterprise 6.0 is based on the R 2.14.2 engine, which is the latest stable release of the open source code, according to David Smith, head of marketing at the company. This code was released on February 29, with the 2.15.0 update just coming out on May 30 and not quite ready for inclusion in R Enterprise. You can see the full release notes for R 2.14.2 here.

The big new feature this edition of the R engine gives to users is a byte compiler that has been added to the engine. Similar to Java byte codes for a Java virtual machine, the byte compiler compiles the interpreted R code down to an intermediate stage before it executes in the R engine, which can speed up the operations by the R interpreter by around 30 per cent, according to Smith. This byte compiler for the R interpreter has no effect whatsoever on any number-crunching that the R engine needs to do, since this is not done by the interpreter but by another part of the engine. So obviously the performance improvements you will see from the new R engine will depend on the nature of the statistical algorithms you are running.

The update also has support for Generalized Linear Models, or GLMs, in stat speak. These include Logistic (Binomial) Poisson, Gamma, and Tweedie models, which are all supported with a high-performance C++ implementation, according to Smith.

A new feature of the R Enterprise 6.0 is integration with IBM's new Platform LSF V8.3 scheduler for HPC grids, which allows for R routines to be parallelized and run on a cluster of x86 iron. Put the two together – GLMs and HPC clusters – and you can get a significant speedup in many cases, according to Smith.

In the case of an insurance company that was doing "tweedie distribution" analysis against 30 million claims using the SAS stats package on a big SMP server, the job took eight hours to run. During beta tests, this customer fired up R Enterprise on an eight-node x86 cluster, used Platform LSF to dispatch work to the nodes from a workstation running R Enterprise and with the nodes running R Enterprise as well, and the job finished in 10 minutes. While the shortening of the time to complete the job is important, what is perhaps more important is the ability to iterate models quickly and improve them because the job runs so much faster. You need to be running Red Hat Enterprise Linux and R Enterprise on server nodes if you want Platform LSF to dispatch work in parallel to them.

The 6.0 release also includes the ability to read SAS and SPSS native file formats directly as well as sucking in raw ASCII text data and information sucked out of relational databases using ODBC to have it analyzed. In the past, R Enterprise had to convert this data to its own XDF NoSQL-like data store, and on data that is constantly changing, this reformatting is a pain in the neck. Now, you can just use the native data sets and, perhaps more importantly, not have to worry about having a license for SAS or SPSS if you have moved off those platforms to open-core R Enterprise tools.

Revolution Analytics already supported the running of its R stack inside of Amazon's EC2 compute clouds, and Smith tells El Reg that the company has "quite a few" customers that run R Enterprise in the cloud, and that all of the proof of concepts that Revolution Analytics does with customers run on EC2 as well.

But not everyone uses EC2. Some people use Microsoft's Azure cloud, and starting with R Enterprise 6.0, you can now fire up R instances on the Azure cloud. At the moment, Revolution Analytics is only supporting the bursting features of Azure, which allows you to dispatch work from inside your firewall to the Microsoft cloud. You cannot run R Enterprise in a standalone fashion on Azure, and you have to have at least one server node running Microsoft's Windows HPC Server to dispatch R work to Azure. You can have more nodes than that in the local cluster, of course, but you need at least one. And this bursting function, which has been in beta testing for the past four months, does not work with either releases of R Enterprise.

R Enterprise comes in two flavors: workstation and server. A workstation edition which is designed for a single user on a single workstation PC costs $1,000 per machine per year for a license. The server edition, which can be used by an unlimited number of end users firing work at the cluster or cloud, costs $30,000 per year for an eight-core x86 server. ®

The essential guide to IT transformation

More from The Register

next story
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Microsoft: Azure isn't ready for biz-critical apps … yet
Microsoft will move its own IT to the cloud to avoid $200m server bill
Oracle reveals 32-core, 10 BEEELLION-transistor SPARC M7
New chip scales to 1024 cores, 8192 threads 64 TB RAM, at speeds over 3.6GHz
Docker kicks KVM's butt in IBM tests
Big Blue finds containers are speedy, but may not have much room to improve
US regulators OK sale of IBM's x86 server biz to Lenovo
Now all that remains is for gov't offices to ban the boxes
Gartner's Special Report: Should you believe the hype?
Enough hot air to carry a balloon to the Moon
Flash could be CHEAPER than SAS DISK? Come off it, NetApp
Stats analysis reckons we'll hit that point in just three years
Dell The Man shrieks: 'We've got a Bitcoin order, we've got a Bitcoin order'
$50k of PowerEdge servers? That'll be 85 coins in digi-dosh
prev story

Whitepapers

5 things you didn’t know about cloud backup
IT departments are embracing cloud backup, but there’s a lot you need to know before choosing a service provider. Learn all the critical things you need to know.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.
Rethinking backup and recovery in the modern data center
Combining intelligence, operational analytics, and automation to enable efficient, data-driven IT organizations using the HP ABR approach.
Next gen security for virtualised datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.