Open source R in commercial Revolution

Red Hat for stats

Put on your eye patch and get out your parrot. The open source R programming language for statistical analysis and graphics is getting a commercial sponsor. What Red Hat did for Linux, Revolution Analytics wants to do for R, and it wants to use the open source subscription model to take on SAS Institute, SPSS (now part of IBM), and others who have been the market leaders (in terms of money) for statistical analysis for several decades.

While many IT shops don't know about R, plenty of people have been using it for more than a decade to run predictive statistical analysis against all kinds of data sets and to produce graphics for that analysis. Its users span a wide range of fields, and include quants in financial services companies and researchers in pharmaceutical companies trying to sift new drugs from countless possibilities.

The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stat professors from the University of Auckland in New Zealand who are still core members of the R development team. In January 2008, Intel Capital put an undisclosed amount of money into Revolution's kitty to kick-start the effort to commercialize R, which has over 2,500 plug-ins to cover all kinds of data sets and statistical analysis techniques peculiar to different industries. Last October, North Bridge Venture Partners and Intel Capital put another $9m in the war chest for Revolution, and Norman Nie, one of the co-founders of SPSS back in 1967 and a designer of its predictive analytics software, was hired to be the company's chief executive officer.

David Champagne, who was the principal architect and engineer at SPSS, is chief technology officer at Revolution, and David Smith, who is a statistician with a degree from the University of Adelaide, South Australia, is the head of marketing at the company. Smith worked on the closed-source S statistics programming language (now owned by Tibco Software) and literally wrote the book on how to use its open source offspring, R. ("Offspring" in the sense that Linux is a kind of open source Unix without the high price tag, but different enough not to be compatible.)

According to Smith, there are approximately 2 million people who use R. "Anybody who studies statistics uses R in their research," says Smith. That user base includes loads of students and academics as well as researchers across all manner of industries. The quants at financial services companies have taken a particular shine to R, and not just because they are cheap.

Jeff Erhardt, who was a heavy R user when he worked at chip makers Advanced Micro Devices and Spansion and who is chief operating officer at Revolution, says that universities are not teaching SAS and SPSS any more. They are teaching R instead, just as Linux has displaced proprietary and Unix operating systems in computer science programs.

Revolution Analytics got its start in 2007 as a spinout from a Yale University incubator. For its first two years, the company (which was called Revolution Computing back then) focused on creating a parallel implementation of R, called ParallelR, and selling services for that tweaked version. With the second round of funding, new management was brought in, R co-creator Gentleman was added to the board, and the plan became to offer a full R stack with commercial support, just as Red Hat offers a full Linux stack and makes its money on support subscriptions.

The marketing tactics for R Enterprise will be much the same as those that pitted Linux against Unix and proprietary operating systems. Smith says that commercial-grade support for the R Enterprise stack will be available in a workstation version that costs $2,000, and the parallel version to run on servers will cost $10,000 for each two-socket server in a cluster. That may seem like a lot of dough for a stat and graphics package, but Smith says this is well below half the cost of similar functionality for SPSS or SAS packages.

Revolution is going to do more than certify applications and set up a tech support line to justify that money. Smith says that there are a number of problems with R that need to be addressed to help it go more mainstream. For one thing, he says that while R has a number of different graphical interfaces available, it is still fundamentally driven through a command line interface.

The R engine also does not scale well because it is memory bound and therefore can only work on relatively small data sets. And it has not had a corporate focal point for development. So Revolution is positioning itself to be that focus, and it will be putting out a development roadmap that includes a thin client interface and a "big data" engine that promises speedups of many orders of magnitude as well as the ability to chew on terabyte-sized data sets.

Open source purists probably won't be too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold for a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics. ®
