Revolution lets R do stats on big data

Scalability boost, too

If you've got big data, then R will soon be able to chew on it and spit out some answers.

Revolution Analytics was formed in May to become the 'Red Hat for stats', funding development for the open source R statistical programming language and offering a commercially supported, open core variant for enterprise customers with some of the bells and whistles that are missing from the open source R package. At the time of the launch, the company said that it was working to allow R to scale better within a server and across servers and to give it extensions to analyze big data sets commonly stored in NoSQL, Hadoop and other formats.

Today, Revolution Analytics will preview Revolution R Enterprise V4, its next release, which is in beta now and should ship by the end of August, according to Jeff Erhardt, chief operating officer at the company. With the V4 update, R Enterprise gets two things.

First, the guts of the R code have been changed to understand threading better and scale across clusters if need be, not just try to work on a couple of threads and the main memory available on a single system. R Enterprise V4 has been tweaked to allow calculations normally undertaken on a single workstation in R (and usually not across very many threads) to be distributed across threads within a CPU core, multiple CPUs within a system, or multiple systems in a cluster.

David Champagne, who was the principal architect and engineer at SPSS (now owned by IBM) and is now chief technology officer at Revolution Analytics, says that on a single machine the scalability tweaks are based on the company's own threading code, not OpenMP or some other code. For distributed computing across a network of machines, the tweaked R relies on remote procedure calls (RPC) to communicate between the nodes as they chew on data. "We are looking at possibly changing this in the future to use something like MPI," says Champagne. MPI, of course, is the Message Passing Interface protocol that parallel supercomputers use to pass data and distribute HPC work across clusters.

The other big change coming with R Enterprise V4 is a binary big data format called XDF, which Erhardt says is loosely based on NoSQL. (Which is funny, because NoSQL is itself a pretty loose term for a whole bunch of non-relational data stores.) The important thing is that the XDF format for R Enterprise allows users to do data chunking and provides very high-speed access to arbitrary rows, columns, and blocks in the store. R Enterprise V4 has tools to pull data into the new XDF format and can then spread calculations across multiple threads, cores, CPUs, and machines to scale up the performance of analysis on big data sets.
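
To make the chunking idea concrete, here is a minimal sketch in Python of how a statistic can be computed block by block and the partial results combined at the end. It is an illustration only: it does not use XDF or any Revolution Analytics API, and the function names (chunk_rows, partial_stats, chunked_mean_variance) are hypothetical. It assumes the data is a simple in-memory list, where a real chunked store would read each block from disk.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_rows(values, chunk_size):
    """Yield consecutive blocks of at most chunk_size rows."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_stats(chunk):
    """Return (count, sum, sum of squares) for one chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def chunked_mean_variance(values, chunk_size=1000, workers=4):
    """Combine per-chunk partial results into a mean and population variance."""
    if not values:
        raise ValueError("values must not be empty")
    # Each chunk is summarised independently, so the work can be handed
    # to separate workers (threads here; processes or machines in principle).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunk_rows(values, chunk_size)))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


if __name__ == "__main__":
    data = [float(x) for x in range(1, 11)]
    print(chunked_mean_variance(data, chunk_size=3, workers=2))
```

Because each worker returns only three numbers per chunk, the combining step stays cheap however large the data set is. Threads are used here to keep the sketch short; in standard Python they will not speed up pure number crunching, so a process pool or a cluster scheduler would be the more realistic choice.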

The new XDF data store and scalability enhancements will be in a priced feature called Revo Scale R, which is an add-on module for R Enterprise V4. Customers who have bought R Enterprise V3.X releases and who are on current maintenance contracts will get the upgrade to V4 as well as the Revo Scale R module for free, says Erhardt. New customers will have to pay an incremental fee for the new big data and scalability enhancements.

Revolution Analytics is a bit vague about pricing, but says that for a single user working at a workstation, R Enterprise runs a few thousand dollars; for a server with a reasonable number of cores and sockets, it's on the order of $25,000 for a license. R Enterprise runs on Microsoft Windows Server 2003 and 2008 and on Red Hat Enterprise Linux 5. The Revo Scale R add-on is initially available only for Windows platforms, but will be available for RHEL soon, probably early in the fourth quarter, according to Erhardt.

Since the relaunch of the company in May - Revolution used to be a consultancy before it became an R distro - the company has boosted its customer base by 60 per cent, to 120. Quantitative finance and big pharma were always strong suits for the open source R language, but now Erhardt says companies in retail, telecoms, media and entertainment, and information services are all coming to Revolution Analytics to talk about R Enterprise and the extensions they are looking for.

One of the things that Revolution Analytics is cooking up is a web services platform, which will allow the part of R that analysts use to create algorithms for doing analysis to be physically separated from the machines where the calculations are run. The idea is to allow heavy calculations to be deployed to cloudy infrastructure. And because a lot of quants have built their models in Excel spreadsheets, the company has already demonstrated the ability to have R-based analytics executed from buttons in Excel, with the calculations on the data stored in spreadsheets done on a cloud of machines - and the math done a lot faster. ®