Revolution Analytics paints R stats Azure blue

Gooses performance, spans HPC clusters with 6.0 update

Revolution Analytics, aka "Red Hat for stats" – which commercialized the open source R programming language and statistical analysis tool – has now tweaked its R Enterprise stack and pushed out a 6.0 release.

The new R Enterprise 6.0 is based on the R 2.14.2 engine, which is the latest stable release of the open source code, according to David Smith, head of marketing at the company. This code was released on February 29, with the 2.15.0 update just coming out on May 30 and not quite ready for inclusion in R Enterprise. You can see the full release notes for R 2.14.2 here.

The big new feature in this edition of the R engine is a byte compiler. Much as Java source is compiled to byte codes for a Java virtual machine, the byte compiler compiles interpreted R code down to an intermediate form before it executes in the R engine, which can speed up the R interpreter's operations by around 30 per cent, according to Smith. The byte compiler has no effect on the number-crunching that the R engine does, since that work is done not by the interpreter but by another part of the engine. So the performance improvement you see from the new R engine will depend on the nature of the statistical algorithms you are running.
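Why the gain depends on the workload can be sketched with Amdahl's law. The snippet below is a rough illustration only: it assumes the "30 per cent" figure means the interpreted portion of a job runs about 1.3 times faster, and the function name and the example fractions are made up for the sketch, not taken from Revolution Analytics.

```python
def overall_speedup(interp_fraction, interp_speedup=1.3):
    """Amdahl's-law estimate of whole-job speedup.

    interp_fraction: share of run time spent in the interpreter (0.0 to 1.0).
    interp_speedup:  assumed speedup of that share (1.3 is our reading of
                     the "around 30 per cent" figure, not a measured value).
    """
    if not 0.0 <= interp_fraction <= 1.0:
        raise ValueError("interp_fraction must be between 0 and 1")
    # Time outside the interpreter is unchanged; only the rest shrinks.
    return 1.0 / ((1.0 - interp_fraction) + interp_fraction / interp_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: mostly number-crunching, mixed, mostly interpreted.
    for fraction in (0.1, 0.5, 0.9):
        print(f"{fraction:.0%} interpreted: {overall_speedup(fraction):.2f}x overall")
```

A job that spends nearly all of its time in compiled numerical routines sees almost no benefit, while loop-heavy R code gets close to the full gain.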

The update also has support for Generalized Linear Models, or GLMs in stat speak. These include Logistic (Binomial), Poisson, Gamma, and Tweedie models, which are all supported with a high-performance C++ implementation, according to Smith.
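For readers who have not met a GLM, here is a minimal sketch of fitting one: a Poisson model with a log link and a single predictor, solved by iteratively reweighted least squares in plain Python. It is a textbook illustration and not Revolution's C++ code; the function name `fit_poisson_glm` and the synthetic data are invented for the example.

```python
import math


def fit_poisson_glm(xs, ys, iterations=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the intercept-only fit: log of the mean response.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x          # linear predictor
            mu = math.exp(eta)         # fitted mean under the log link
            z = eta + (y - mu) / mu    # working response
            w = mu                     # IRLS weight for Poisson with log link
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the 2x2 weighted normal equations directly.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    # Synthetic responses generated from known coefficients (0.5 and 0.3).
    xs = list(range(10))
    ys = [math.exp(0.5 + 0.3 * x) for x in xs]
    print(fit_poisson_glm(xs, ys))
```

The Logistic, Gamma, and Tweedie families follow the same loop with a different link and weight function, which is why one well-tuned implementation can cover all of them.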

Another new feature of R Enterprise 6.0 is integration with IBM's new Platform LSF V8.3 scheduler for HPC grids, which allows R routines to be parallelized and run on a cluster of x86 iron. Put the two together – GLMs and HPC clusters – and you can get a significant speedup in many cases, according to Smith.

In the case of an insurance company that was doing "Tweedie distribution" analysis against 30 million claims using the SAS stats package on a big SMP server, the job took eight hours to run. During beta tests, this customer fired up R Enterprise on an eight-node x86 cluster and used Platform LSF to dispatch work to the nodes from a workstation, with both the workstation and the nodes running R Enterprise, and the job finished in 10 minutes. While the shorter run time is important, what is perhaps more important is the ability to iterate on models quickly and improve them because the job runs so much faster. You need to be running Red Hat Enterprise Linux and R Enterprise on server nodes if you want Platform LSF to dispatch work in parallel to them.
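The arithmetic behind the insurance example is worth spelling out. The snippet below uses only the figures quoted above; the variable names are ours.

```python
# Figures quoted for the insurance customer's job.
baseline_minutes = 8 * 60   # eight hours with SAS on the SMP server
cluster_minutes = 10        # R Enterprise on the x86 cluster
nodes = 8

speedup = baseline_minutes / cluster_minutes
speedup_per_node = speedup / nodes

print(f"Overall speedup: {speedup:.0f}x")            # 48x
print(f"Speedup per node: {speedup_per_node:.0f}x")  # 6x
```

A 48-fold gain on eight nodes is well beyond what adding nodes alone would give, so much of it presumably comes from the change of software and hardware. The per-node figure is a comparison across two different systems, not a measure of parallel efficiency.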

The 6.0 release can also read SAS and SPSS native file formats directly for analysis, alongside raw ASCII text data and information pulled out of relational databases using ODBC. In the past, R Enterprise had to convert this data to its own XDF NoSQL-like data store, and on data that is constantly changing, this reformatting is a pain in the neck. Now, you can just use the native data sets and, perhaps more importantly, not have to worry about having a license for SAS or SPSS if you have moved off those platforms to open-core R Enterprise tools.

Revolution Analytics already supported running its R stack inside Amazon's EC2 compute cloud, and Smith tells El Reg that the company has "quite a few" customers that run R Enterprise in the cloud, and that all of the proofs of concept that Revolution Analytics does with customers run on EC2 as well.

But not everyone uses EC2. Some people use Microsoft's Azure cloud, and starting with R Enterprise 6.0, you can now fire up R instances on the Azure cloud. At the moment, Revolution Analytics is only supporting the bursting features of Azure, which allow you to dispatch work from inside your firewall to the Microsoft cloud. You cannot run R Enterprise in a standalone fashion on Azure, and you have to have at least one server node running Microsoft's Windows HPC Server to dispatch R work to Azure. You can have more nodes than that in the local cluster, of course, but you need at least one. And this bursting function, which has been in beta testing for the past four months, does not work with earlier releases of R Enterprise.

R Enterprise comes in two flavors: workstation and server. The workstation edition, which is designed for a single user on a single workstation PC, costs $1,000 per machine per year for a license. The server edition, which can be used by an unlimited number of end users firing work at the cluster or cloud, costs $30,000 per year for an eight-core x86 server. ®
