Revolution lets R do stats on big data

Scalability boost, too

If you've got big data, then R will soon be able to chew on it and spit out some answers.

Revolution Analytics was formed in May to become the 'Red Hat for stats', funding development for the open source R statistical programming language and offering a commercially supported, open core variant for enterprise customers with some of the bells and whistles that are missing from the open source R package. At the time of the launch, the company said that it was working to allow R to scale better within a server and across servers and to give it extensions to analyze big data sets commonly stored in NoSQL, Hadoop and other formats.

Today, Revolution Analytics will preview Revolution R Enterprise V4, its future release which is in beta now and which should ship by the end of August, according to Jeff Erhardt, chief operating officer at the company. With the V4 update, R Enterprise gets two things.

First, the guts of the R code have been changed to understand threading better and scale across clusters if need be, not just try to work on a couple of threads and the main memory available on a single system. R Enterprise V4 has been tweaked to allow calculations normally undertaken on a single workstation in R (and usually not across very many threads) to be distributed across threads within a CPU core, multiple CPUs within a system, or multiple systems in a cluster.
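To make the idea concrete, here is a minimal Python sketch of the general pattern of splitting a calculation into chunks, farming the chunks out to workers, and combining the partial results. It is not Revolution's threading code, which the article does not show; the function names, the chunk size, and the use of Python's `ThreadPoolExecutor` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk):
    """Return (count, sum, sum of squares) for one chunk of numbers."""
    n = len(chunk)
    s = sum(chunk)
    ss = sum(x * x for x in chunk)
    return n, s, ss


def split(data, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def parallel_mean_variance(data, chunk_size=1000, workers=4):
    """Compute mean and population variance by combining per-chunk sums.

    Uses the simple sum-of-squares formula, which is easy to combine
    across workers but is not the most numerically stable choice.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_sums, split(data, chunk_size)))
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


if __name__ == "__main__":
    values = [float(i) for i in range(10_000)]
    print(parallel_mean_variance(values))
```

Because each worker returns only three numbers per chunk, the same combine step would work whether the chunks are handled by threads, by separate processes, or by other machines in a cluster.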

David Champagne, who was the principal architect and engineer at SPSS (now owned by IBM) and is now chief technology officer at Revolution Analytics, says that on a single machine the scalability tweaks are based on the company's own threading code, not OpenMP or some other code. For distributed computing across a network of machines, the tweaks rely on remote procedure calls (RPC) to communicate between the nodes as they chew on data. "We are looking at possibly changing this in the future to use something like MPI," says Champagne. MPI, of course, is the Message Passing Interface protocol that parallel supercomputers use to pass data and distribute HPC work across clusters.

The other big change coming with R Enterprise V4 is a binary big data format called XDF, which Erhardt says is loosely based on NoSQL. (Which is funny, because NoSQL is, by definition, a pretty loose term to describe a whole bunch of non-relational data stores.) The important thing is that the XDF format for R Enterprise allows users to do data chunking and provides very high-speed data access to arbitrary rows, columns, and blocks in the store. R Enterprise V4 has tools to pull data into the new XDF format and can also then spread calculations across multiple threads, cores, CPUs, and machines to scale up the performance of analysis on big data sets.
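The article gives no details of how XDF is laid out, so the following Python sketch is only a toy illustration of why a chunked, column-oriented binary store allows fast access to arbitrary rows, columns, and blocks. The layout (each column stored as one contiguous run of 64-bit floats) and every function name here are hypothetical and should not be read as the XDF format itself.

```python
import io
import struct

DOUBLE = struct.Struct("<d")  # one little-endian 64-bit float


def write_columns(buf, columns):
    """Write each column as one contiguous run of doubles.

    Returns a dict mapping column name to (byte offset, row count).
    This layout is a made-up example, not the real XDF format.
    """
    offsets = {}
    for name, values in columns.items():
        offsets[name] = (buf.tell(), len(values))
        buf.write(struct.pack(f"<{len(values)}d", *values))
    return offsets


def read_block(buf, offsets, name, start, stop):
    """Read rows [start, stop) of one column without touching other data."""
    base, n_rows = offsets[name]
    stop = min(stop, n_rows)
    count = max(stop - start, 0)
    buf.seek(base + start * DOUBLE.size)
    raw = buf.read(count * DOUBLE.size)
    return list(struct.unpack(f"<{count}d", raw))


def column_mean(buf, offsets, name, chunk_rows=2):
    """Stream a column in fixed-size chunks and compute its mean."""
    _, n_rows = offsets[name]
    total = 0.0
    for start in range(0, n_rows, chunk_rows):
        total += sum(read_block(buf, offsets, name, start, start + chunk_rows))
    return total / n_rows


if __name__ == "__main__":
    store = io.BytesIO()  # stands in for a file on disk
    index = write_columns(store, {
        "price": [1.0, 2.0, 3.0, 4.0, 5.0],
        "qty": [10.0, 20.0, 30.0, 40.0, 50.0],
    })
    print(read_block(store, index, "qty", 1, 3))
    print(column_mean(store, index, "price"))
```

Because every value has a fixed width, the position of any row in any column can be computed directly, and a calculation only ever needs to hold one chunk in memory at a time.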

The new XDF data store and scalability enhancements will be in a priced feature called Revo Scale R, which is an add-on module for R Enterprise V4. Customers who have bought R Enterprise V3.X releases and who are on current maintenance contracts will get the upgrade to V4 as well as the Revo Scale R module for free, says Erhardt. New customers will have to pay an incremental fee for the new big data and scalability enhancements.

Revolution Analytics is a bit vague about pricing, but says that for a single user working at a workstation, R Enterprise runs a few thousand dollars; for a server with a reasonable number of cores and sockets, it's on the order of $25,000 for a license. R Enterprise runs on Microsoft Windows Server 2003 and 2008 and on Red Hat Enterprise Linux 5. The Revo Scale R add-on is initially available only for Windows platforms, but will be available for RHEL soon, probably early in the fourth quarter, according to Erhardt.

Since the relaunch of the company in May - Revolution used to be a consultancy before it became an R distro - the company has boosted its customer base by 60 per cent, to 120. Quantitative finance and big pharma were always strong suits for the open source R language, but now Erhardt says companies in retail, telecoms, media and entertainment, and information services are all coming to Revolution Analytics to talk about R Enterprise and the extensions they are looking for.

One of the things that Revolution Analytics is cooking up is a web services platform, which will allow the part of R that analysts use to create algorithms and do analysis to be physically separated from the machines where the calculations are run. The idea is to allow heavy calculations to be deployed to cloudy infrastructure. And because a lot of quants have built their models in Excel spreadsheets, the company has already demonstrated the ability to have R-based analytics executed from buttons in Excel but have the calculations on the data stored in spreadsheets done on a cloud of machines - and do the math a lot faster. ®
