Revolution lets R do stats on big data

Scalability boost, too

If you've got big data, then R will soon be able to chew on it and spit out some answers.

Revolution Analytics was formed in May to become the 'Red Hat for stats', funding development for the open source R statistical programming language and offering a commercially supported, open core variant for enterprise customers with some of the bells and whistles that are missing from the open source R package. At the time of the launch, the company said that it was working to allow R to scale better within a server and across servers and to give it extensions to analyze big data sets commonly stored in NoSQL, Hadoop and other formats.

Today, Revolution Analytics will preview Revolution R Enterprise V4, its future release which is in beta now and which should ship by the end of August, according to Jeff Erhardt, chief operating officer at the company. With the V4 update, R Enterprise gets two things.

First, the guts of the R code have been changed to understand threading better and scale across clusters if need be, not just try to work on a couple of threads and the main memory available on a single system. R Enterprise V4 has been tweaked to allow calculations normally undertaken on a single workstation in R (and usually not across very many threads) to be distributed across threads within a CPU core, multiple CPUs within a system, or multiple systems in a cluster.

David Champagne, who was the principal architect and engineer at SPSS (now owned by IBM) and is now chief technology officer at Revolution Analytics, says that on a single machine the scalability tweaks are based on the company's own threading code, not OpenMP or some other code. For distributed computing across a network of machines, the tweaked R relies on remote procedure calls (RPC) to communicate between the nodes as they chew on data. "We are looking at possibly changing this in the future to use something like MPI," says Champagne. MPI, of course, is the Message Passing Interface protocol that parallel supercomputers use to pass data and distribute HPC work across clusters.
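To make the idea of spreading one calculation across threads concrete, here is a minimal Python sketch. It is not Revolution's threading code, which has not been published; the function names (`partial_stats`, `split`, `parallel_mean_variance`) are invented for illustration. Each worker reduces its own slice of the data to a count, a sum, and a sum of squares, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Return (count, sum, sum of squares) for one chunk of numbers."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def split(values, n_chunks):
    """Split a list into n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_mean_variance(values, workers=4):
    """Compute mean and population variance from per-chunk partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, split(values, workers)))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    return mean, total_sq / n - mean * mean


# Hypothetical sample data: the numbers 1 through 100.
data = [float(i) for i in range(1, 101)]
mean, variance = parallel_mean_variance(data)
```

In CPython, threads show the shape of the approach rather than a real speed-up for this kind of arithmetic; swapping in `ProcessPoolExecutor` gives true parallelism across cores. The same split-then-combine pattern carries over to a cluster, where the partial results would travel over RPC or MPI instead of shared memory.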

The other big change coming with R Enterprise V4 is a binary big data format called XDF, which Erhardt says is loosely based on NoSQL. (Which is funny, because NoSQL is itself a pretty loose term for a whole bunch of non-relational data stores.) The important thing is that the XDF format for R Enterprise allows users to do data chunking and provides very high-speed data access to arbitrary rows, columns, and blocks in the store. R Enterprise V4 has tools to pull data into the new XDF format and can also then spread calculations across multiple threads, cores, CPUs, and machines to scale up the performance of analysis on big data sets.
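The details of XDF are not public, but the general trick behind chunked binary stores is easy to sketch. The Python below assumes a made-up layout of three 64-bit floats per row (it is not the XDF format), and the helper names `write_store`, `read_block`, and `column_mean` are invented for illustration. Because every row has the same size, any block of rows can be reached with a single seek, and a statistic can be computed while holding only one chunk in memory at a time.

```python
import os
import struct
import tempfile

# Hypothetical layout: three little-endian float64 columns per row.
ROW = struct.Struct("<3d")


def write_store(path, rows):
    """Write rows to a fixed-width binary file."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(ROW.pack(*row))


def read_block(path, first_row, n_rows):
    """Read up to n_rows rows starting at first_row by seeking straight to them."""
    with open(path, "rb") as f:
        f.seek(first_row * ROW.size)
        data = f.read(n_rows * ROW.size)
    return [ROW.unpack_from(data, i) for i in range(0, len(data), ROW.size)]


def column_mean(path, column, chunk_rows=1000):
    """Mean of one column, reading the file one chunk at a time."""
    total, count, first = 0.0, 0, 0
    while True:
        block = read_block(path, first, chunk_rows)
        if not block:
            break
        total += sum(row[column] for row in block)
        count += len(block)
        first += len(block)
    return total / count


# Demo data: ten rows of (i, 2i, 3i).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_store(path, [(float(i), float(i * 2), float(i * 3)) for i in range(10)])
block = read_block(path, 4, 2)                 # rows 4 and 5 only
mean_col1 = column_mean(path, 1, chunk_rows=3)  # second column, 3 rows per chunk
```

A column-oriented layout would make single-column scans cheaper still; the row-oriented version is used here only because it is the shortest to show.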

The new XDF data store and scalability enhancements will be in a priced feature called Revo Scale R, which is an add-on module for R Enterprise V4. Customers who have bought R Enterprise V3.X releases and who are on current maintenance contracts will get the upgrade to V4 as well as the Revo Scale R module for free, says Erhardt. New customers will have to pay an incremental fee for the new big data and scalability enhancements.

Revolution Analytics is a bit vague about pricing, but says that for a single user working at a workstation, R Enterprise runs a few thousand dollars; for a server with a reasonable number of cores and sockets, it's on the order of $25,000 for a license. R Enterprise runs on Microsoft Windows Server 2003 and 2008 and on Red Hat Enterprise Linux 5. The Revo Scale R add-on is initially available only for Windows platforms, but will be available for RHEL soon, probably early in the fourth quarter, according to Erhardt.

Since the relaunch of the company in May - Revolution used to be a consultancy before it became an R distro - the company has boosted its customer base by 60 per cent, to 120. Quantitative finance and big pharma were always strong suits for the open source R language, but now Erhardt says companies in retail, telecoms, media and entertainment, and information services are all coming to Revolution Analytics to talk about R Enterprise and the extensions they are looking for.

One of the things that Revolution Analytics is cooking up is a web services platform, which will allow the part of R that analysts use to create algorithms and do analysis to be physically separated from the machines where the calculations are run. The idea is to allow for heavy calculations to be deployed to cloudy infrastructure. And because a lot of quants have built their models in Excel spreadsheets, the company has already demonstrated the ability to have R-based analytics executed from buttons in Excel but have the calculations on the data stored in spreadsheets done on a cloud of machines - and do the math a lot faster. ®
