Original URL: http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Open source R in commercial Revolution

Red Hat for stats

By Timothy Prickett Morgan

Posted in Developer, 6th May 2010 20:16 GMT

Put on your eye patch and get out your parrot. The open source R programming language for statistical analysis and graphics is getting a commercial sponsor. What Red Hat did for Linux, Revolution Analytics wants to do for R, and it wants to use the open source subscription model to take on SAS Institute, SPSS (now part of IBM), and others who have been the market leaders (in terms of money) for statistical analysis for several decades.

While many IT shops don't know about R, plenty of people have been using it for more than a decade to do statistical predictive analysis against all kinds of data sets and to produce graphics for that analysis. Those users work in a wide range of fields and include quants at financial services companies and researchers at pharmaceutical companies trying to sift new drugs from countless possibilities.

The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stat professors from the University of Auckland in New Zealand who are still core members of the R development team. In January 2008, Intel Capital put an undisclosed amount of money into Revolution's kitty to kick-start the effort to commercialize R, which has over 2,500 plug-ins to cover all kinds of data sets and statistical analysis techniques peculiar to different industries. Last October, North Bridge Venture Partners and Intel Capital put another $9m in the war chest for Revolution, and Norman Nie, one of the co-founders of SPSS back in 1967 and a designer of its predictive analytics software, was hired to be the company's chief executive officer.

David Champagne, who was the principal architect and engineer at SPSS, is chief technology officer at Revolution, and David Smith, who is a statistician with a degree from the University of Adelaide, South Australia, is the head of marketing at the company. Smith worked on the closed-source S statistics programming language (now owned by Tibco Software) and literally wrote the book on how to use its open source offspring, R. ("Offspring" in the sense that Linux is a kind of open source Unix without the high price tag, but different enough not to be compatible.)

According to Smith, there are approximately 2 million people who use R. "Anybody who studies statistics uses R in their research," says Smith. That user base includes loads of students and academics as well as researchers across all manner of industries. The quants at financial services companies have taken a particular shine to R, and not just because they are cheap.

Jeff Erhardt, who was a heavy R user when he worked at chip makers Advanced Micro Devices and Spansion and who is chief operating officer at Revolution, says that universities are not teaching SAS and SPSS any more. They are teaching R instead, much as Linux has displaced proprietary and Unix operating systems in computer science programs.

Revolution Analytics got its start in 2007 as a spinout from a Yale University incubator. For its first two years, the company (which was called Revolution Computing back then) focused on creating a parallel implementation of R, called ParallelR, and selling services for that tweaked version. With the second round of funding, new management was brought in, R co-creator Gentleman was added to the board, and the plan became to offer a full R stack with commercial support, just like Red Hat offers a full Linux stack and makes its money on support subscriptions.

The marketing tactics for the R Enterprise will be much the same as Red Hat's, which pitted Linux against Unix and proprietary operating systems. Smith says that the R Enterprise stack with commercial-grade support will be available in a workstation version that costs $2,000, and the parallel version to run on servers will cost $10,000 for each two-socket server in a cluster. That may seem like a lot of dough for a stat and graphics package, but Smith says this is well below half the cost of similar functionality for SPSS or SAS packages.

Revolution is going to do more than certify applications and set up a tech support line to justify that money. Smith says that there are a number of problems with R that need to be addressed to help it go more mainstream. For one thing, he says that while R has a number of different graphical interfaces available, it is still fundamentally driven through a command line interface.

The R engine also does not scale well because it is memory bound: it has to hold the data it is working on in main memory, and therefore can only work on relatively small data sets. And it has not had a corporate focal point for development. So Revolution is positioning itself to be that focus, and it will be putting out a development roadmap that includes a thin client interface and a "big data" engine that promises a speedup of many orders of magnitude as well as the ability to chew on terabyte-sized data sets.
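To make the memory-bound point concrete, here is a minimal sketch in Python of the general idea behind chewing through data that does not fit in memory: read it a chunk at a time and keep only running totals. It is an illustration under stated assumptions, not a description of how Revolution's planned engine works, which the company has not detailed here. The `chunks` function is a hypothetical stand-in for a large file read piece by piece, and the statistics are computed with Welford's online algorithm.

```python
from typing import Iterable, Iterator, List, Tuple


def chunks(n_values: int, chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical data source: yields the values 0..n_values-1 in chunks,
    standing in for a large file that is read piece by piece."""
    for start in range(0, n_values, chunk_size):
        stop = min(start + chunk_size, n_values)
        yield [float(v) for v in range(start, stop)]


def streaming_mean_variance(source: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) using Welford's online algorithm.

    Only the current chunk and three running numbers are kept in memory,
    so the size of the whole data set does not matter.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for chunk in source:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # 100,000 values processed 1,000 at a time; the full list is never built.
    n, mean, variance = streaming_mean_variance(chunks(100_000, 1_000))
    print(n, mean, variance)
```

The trade-off is that only statistics that can be updated incrementally work this way; anything that needs the whole data set at once, such as an exact median, takes more machinery.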

Open source purists probably won't be too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold for a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics. ®