Revolution Analytics paints R stats Azure blue

Gooses performance, spans HPC clusters with 6.0 update

Revolution Analytics, aka "Red Hat for stats" – which commercialized the open source R programming language and statistical analysis tool – has now tweaked its R Enterprise stack and pushed out a 6.0 release.

The new R Enterprise 6.0 is based on the R 2.14.2 engine, which is the latest stable release of the open source code, according to David Smith, head of marketing at the company. This code was released on February 29, with the 2.15.0 update coming out on March 30 and not quite ready for inclusion in R Enterprise. The full release notes for R 2.14.2 are available from the R project.

The big new feature in this edition of the R engine is a byte compiler. Similar to Java bytecode for a Java virtual machine, the byte compiler compiles interpreted R code down to an intermediate form before it executes in the R engine, which can speed up the R interpreter's operations by around 30 per cent, according to Smith. The byte compiler has no effect on the number-crunching that the R engine does, since that is handled not by the interpreter but by another part of the engine. So the performance improvement you see from the new R engine will depend on the nature of the statistical algorithms you are running, and in particular on how much of their time is spent in interpreted code.
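To make the last point concrete, here is a rough back-of-the-envelope sketch in Python using Amdahl's law. It assumes that "around 30 per cent" means the interpreted portion of a job runs 1.3 times faster and that everything else runs at the same speed as before. The function name and the sample fractions are illustrative assumptions, not figures from Revolution Analytics.

```python
def overall_speedup(interpreted_fraction, interpreter_speedup=1.3):
    """Amdahl's law: only the interpreted share of the runtime gets faster.

    interpreted_fraction: share of the original runtime spent in the
        interpreter (0.0 to 1.0). Hypothetical input, not a measured value.
    interpreter_speedup: assumed speedup of that share (1.3 = 30% faster).
    """
    if not 0.0 <= interpreted_fraction <= 1.0:
        raise ValueError("interpreted_fraction must be between 0 and 1")
    # New runtime, as a fraction of the old one: the untouched part plus
    # the interpreted part divided by its speedup.
    remaining = (1.0 - interpreted_fraction) + interpreted_fraction / interpreter_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.5, 1.0):
        print(f"{fraction:.0%} interpreted -> {overall_speedup(fraction):.2f}x overall")
```

Under these assumptions, a job that spends all of its time in compiled numerical routines gains nothing, while a job that is pure interpreted R gains the full 1.3x.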

The update also has support for Generalized Linear Models, or GLMs in stat speak. These include Logistic (Binomial), Poisson, Gamma, and Tweedie models, which are all supported with a high-performance C++ implementation, according to Smith.
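For readers who have not met a GLM, the following is a minimal pure-Python sketch of one member of the family: a Poisson model with a log link and a single predictor, fitted by Newton-Raphson (which for this link is equivalent to the usual iteratively reweighted least squares). It is a teaching illustration only, not the C++ implementation in R Enterprise; the function name `fit_poisson_glm` and the toy data are made up for the example.

```python
import math


def fit_poisson_glm(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson.

    Returns the estimated (b0, b1). Illustrative only: no safeguards
    against degenerate data such as all-equal x values.
    """
    # Start from the intercept-only fit: log of the mean response.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Fisher information, a symmetric 2x2 matrix.
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


if __name__ == "__main__":
    # Toy data whose means follow exp(0.3 + 0.5 * x) exactly.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [math.exp(0.3 + 0.5 * x) for x in xs]
    print(fit_poisson_glm(xs, ys))
```

The other families mentioned (Binomial, Gamma, Tweedie) follow the same pattern with a different link and variance function.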

Another new feature of R Enterprise 6.0 is integration with IBM's new Platform LSF V8.3 scheduler for HPC grids, which allows R routines to be parallelized and run on a cluster of x86 iron. Put the two together – GLMs and HPC clusters – and you can get a significant speedup in many cases, according to Smith.
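The article does not say how R Enterprise splits a model fit across nodes. One common pattern, sketched below in Python as an assumption rather than a description of the product, is that each worker computes partial sums over its own chunk of rows and a master adds them up. Threads stand in for cluster nodes here, and nothing in the sketch uses Platform LSF or any R Enterprise API; `chunk_sums` and `combined_sums` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_sums(chunk, b0, b1):
    """Partial Poisson score and information sums for one chunk of (x, y) rows."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in chunk:
        mu = math.exp(b0 + b1 * x)
        g0 += y - mu
        g1 += (y - mu) * x
        h00 += mu
        h01 += mu * x
        h11 += mu * x * x
    return (g0, g1, h00, h01, h11)


def combined_sums(rows, b0, b1, workers=4):
    """Split rows into chunks, process them concurrently, add the partial sums."""
    size = max(1, math.ceil(len(rows) / workers))
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: chunk_sums(c, b0, b1), chunks))
    # The sums are additive, so the totals match a single pass over all rows
    # (up to floating-point rounding).
    return tuple(sum(values) for values in zip(*partials))


if __name__ == "__main__":
    rows = [(i / 10, float(i % 3)) for i in range(100)]
    print(combined_sums(rows, 0.1, 0.2))
```

Because only a handful of numbers travel back from each worker per iteration, this style of computation tends to scale well as rows are spread over more machines.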

In the case of an insurance company that was running a Tweedie distribution analysis against 30 million claims using the SAS stats package on a big SMP server, the job took eight hours. During beta tests, this customer fired up R Enterprise on an eight-node x86 cluster and used Platform LSF to dispatch work from a workstation to the nodes, with both the workstation and the nodes running R Enterprise, and the job finished in 10 minutes. The shorter run time matters, but what is perhaps more important is the ability to iterate on models quickly and improve them because the job runs so much faster. You need to be running Red Hat Enterprise Linux and R Enterprise on the server nodes if you want Platform LSF to dispatch work to them in parallel.
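The arithmetic behind that comparison is easy to check. The short Python sketch below uses only the figures quoted above; the eight-hour working day used for the iteration count is an assumption added for illustration.

```python
sas_minutes = 8 * 60   # eight-hour job on the SMP server
cluster_minutes = 10   # same job on the eight-node cluster
nodes = 8

# Overall speedup of the cluster run over the original run.
speedup = sas_minutes / cluster_minutes

# How much faster than a perfectly linear eight-way split that is.
beyond_linear = speedup / nodes

# Model iterations that fit in an assumed eight-hour working day.
workday_minutes = 8 * 60
runs_before = workday_minutes // sas_minutes
runs_after = workday_minutes // cluster_minutes

print(f"speedup: {speedup:.0f}x")
print(f"beyond a linear {nodes}-node split: {beyond_linear:.0f}x")
print(f"runs per working day: {runs_before} before, {runs_after} after")
```

A 48x improvement on eight nodes is more than node count alone would explain, so some of the gain presumably comes from elsewhere, such as the implementation or the hardware. The article does not give core counts for either system, so this is not a like-for-like comparison.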

The 6.0 release can also read SAS and SPSS native file formats directly, as well as raw ASCII text data and data pulled from relational databases over ODBC. In the past, R Enterprise had to convert this data to its own XDF NoSQL-like data store, and for data that is constantly changing, this reformatting is a pain in the neck. Now you can just use the native data sets and, perhaps more importantly, not have to worry about having a license for SAS or SPSS if you have moved off those platforms to the open-core R Enterprise tools.

Revolution Analytics already supported running its R stack inside Amazon's EC2 compute cloud, and Smith tells El Reg that the company has "quite a few" customers that run R Enterprise in the cloud, and that all of the proofs of concept that Revolution Analytics does with customers run on EC2 as well.

But not everyone uses EC2. Some people use Microsoft's Azure cloud, and starting with R Enterprise 6.0, you can fire up R instances on Azure. At the moment, Revolution Analytics supports only Azure's bursting feature, which allows you to dispatch work from inside your firewall to the Microsoft cloud. You cannot run R Enterprise in a standalone fashion on Azure, and you have to have at least one server node running Microsoft's Windows HPC Server to dispatch R work to Azure. You can have more nodes than that in the local cluster, of course, but you need at least one. And this bursting function, which has been in beta testing for the past four months, does not work with earlier releases of R Enterprise.

R Enterprise comes in two flavors: workstation and server. The workstation edition, which is designed for a single user on a single workstation PC, costs $1,000 per machine per year. The server edition, which can be used by an unlimited number of end users firing work at the cluster or cloud, costs $30,000 per year for an eight-core x86 server. ®
