Revolution lets R do stats on big data

Scalability boost, too

If you've got big data, then R will soon be able to chew on it and spit out some answers.

Revolution Analytics was formed in May to become the 'Red Hat for stats', funding development for the open source R statistical programming language and offering a commercially supported, open core variant for enterprise customers with some of the bells and whistles that are missing from the open source R package. At the time of the launch, the company said that it was working to allow R to scale better within a server and across servers and to give it extensions to analyze big data sets commonly stored in NoSQL, Hadoop and other formats.

Today, Revolution Analytics will preview Revolution R Enterprise V4, its next release, which is in beta now and should ship by the end of August, according to Jeff Erhardt, chief operating officer at the company. With the V4 update, R Enterprise gets two things.

First, the guts of the R code have been changed to understand threading better and scale across clusters if need be, not just try to work on a couple of threads and the main memory available on a single system. R Enterprise V4 has been tweaked to allow calculations normally undertaken on a single workstation in R (and usually not across very many threads) to be distributed across threads within a CPU core, multiple CPUs within a system, or multiple systems in a cluster.
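The general pattern described here, splitting one calculation into pieces, running the pieces on separate workers, and combining the partial results, can be sketched roughly as follows. This is an illustration in Python rather than R, and it is not Revolution's code: the function names `partial_stats` and `parallel_mean_variance` are hypothetical, and the thread pool stands in for whatever threading layer R Enterprise V4 actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Return (count, sum, sum of squares) for one slice of the data."""
    n = len(chunk)
    s = sum(chunk)
    ss = sum(x * x for x in chunk)
    return n, s, ss


def parallel_mean_variance(data, workers=4):
    """Split data into slices, summarise each on a worker, then combine.

    Assumes at least two values; the variance is the sample variance.
    """
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    # Each worker only ever sees its own slice of the data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunks))

    # The partial results are small, so combining them is cheap.
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance
```

Because each worker returns only three numbers, the same split-and-combine step would work whether the workers are threads on one machine or nodes in a cluster; only the transport changes.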

David Champagne, who was the principal architect and engineer at SPSS (now owned by IBM) and is now chief technology officer at Revolution Analytics, says that on a single machine the scalability tweaks are based on the company's own threading code, not OpenMP or some other code. For distributed computing across a network of machines, the tweaks rely on remote procedure calls (RPC) to communicate between the nodes as they chew on data. "We are looking at possibly changing this in the future to use something like MPI," says Champagne. MPI, of course, is the Message Passing Interface protocol that parallel supercomputers use to pass data and distribute HPC work across clusters.

The other big change coming with R Enterprise V4 is a binary big data format called XDF, which Erhardt says is loosely based on NoSQL. (Which is funny, because NoSQL is, by definition, a pretty loose definition to describe a whole bunch of non-relational data stores.) The important thing is that the XDF format for R Enterprise allows users to do data chunking and provides very high-speed data access to arbitrary rows, columns, and blocks in the store. R Enterprise V4 has tools to pull data into the new XDF format and can also then spread calculations across multiple threads, cores, CPUs, and machines to scale up the performance of analysis on big data sets.
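The article gives no details of how XDF is laid out on disk, so the following is only a sketch of why a binary, fixed-width store makes chunking and random access cheap. The three-column layout, `ROW_FORMAT`, and all function names are assumptions made for illustration, written in Python rather than R; none of it describes the real XDF format.

```python
import io
import struct

# Hypothetical layout: each row is three little-endian float64 columns.
ROW_FORMAT = "<3d"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 24 bytes per row


def write_rows(stream, rows):
    """Append rows to the store in the fixed-width binary layout."""
    for row in rows:
        stream.write(struct.pack(ROW_FORMAT, *row))


def read_block(stream, start_row, n_rows):
    """Seek straight to start_row and read up to n_rows, skipping earlier rows."""
    stream.seek(start_row * ROW_SIZE)
    raw = stream.read(n_rows * ROW_SIZE)
    return [struct.unpack_from(ROW_FORMAT, raw, offset)
            for offset in range(0, len(raw), ROW_SIZE)]


def column_sum_chunked(stream, column, total_rows, chunk_rows=2):
    """Sum one column while holding only chunk_rows rows in memory at a time."""
    total = 0.0
    for start in range(0, total_rows, chunk_rows):
        for row in read_block(stream, start, chunk_rows):
            total += row[column]
    return total
```

For example, after `write_rows(store, rows)` on an `io.BytesIO()` or a file opened in binary mode, `read_block(store, 3, 2)` jumps directly to the fourth row. Because every row has the same width, a row's byte offset is a single multiplication, which is what makes block access fast and lets separate chunks be handed to separate workers.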

The new XDF data store and scalability enhancements will be in a priced feature called Revo Scale R, which is an add-on module for R Enterprise V4. Customers who have bought R Enterprise V3.X releases and who are on current maintenance contracts will get the upgrade to V4 as well as the Revo Scale R module for free, says Erhardt. New customers will have to pay an incremental fee for the new big data and scalability enhancements.

Revolution Analytics is a bit vague about pricing, but says that for a single user working at a workstation, R Enterprise runs a few thousand dollars; for a server with a reasonable number of cores and sockets, it's on the order of $25,000 for a license. R Enterprise runs on Microsoft Windows Server 2003 and 2008 and on Red Hat Enterprise Linux 5. The Revo Scale R add-on is initially available only for Windows platforms, but will be available for RHEL soon, probably early in the fourth quarter, according to Erhardt.

Since the relaunch of the company in May - Revolution used to be a consultancy before it became an R distro - the company has boosted its customer base by 60 per cent, to 120. Quantitative finance and big pharma were always strong suits for the open source R language, but now Erhardt says companies in retail, telecoms, media and entertainment, and information services are all coming to Revolution Analytics to talk about R Enterprise and the extensions they are looking for.

One of the things that Revolution Analytics is cooking up is a web services platform, which will allow the part of R that analysts use to create algorithms and do analysis to be physically separated from the machines where the calculations are run. The idea is to allow heavy calculations to be deployed to cloudy infrastructure. And because a lot of quants have built their models in Excel spreadsheets, the company has already demonstrated the ability to have R-based analytics executed from buttons in Excel, with the calculations on the data stored in spreadsheets done on a cloud of machines - and the math done a lot faster. ®
