Oracle tucks R stats language into database

R-acle 11g, Quant Edition

Relational database juggernaut Oracle has embedded the R programming language used by more than 2 million statisticians and quants the world over into its 11g relational database. Call it R-acle 11g, Quant Edition.

R, of course, is the open source statistical analysis programming language and is also the name of the runtime engine for that language. R is a bit like the Red Hat for stats, with its main competitors being the closed source analytic tools from SAS Institute and IBM's SPSS unit, among others. The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stats professors from the University of Auckland in New Zealand.

Nearly two years ago, Revolution Analytics burst on the scene with an effort to commercialize R and its runtime engine, as well as to do proprietary extensions that allowed it to scale across bigger iron than the open source implementation. Since that time, Revolution Analytics has upgraded its Enterprise R so it can read and write data natively in the SAS file format and has parallelized R so it can run on the nodes in a Hadoop cluster, doing statistical analysis on each node's data sets and then reducing them down to a final answer.

Oracle is not doing anything like this, and it certainly is not rolling up its own distribution of R and providing tech support and tweaks to it, as it has done with Red Hat's Enterprise Linux operating system and the Xen hypervisor. That's not to say that Oracle won't ever make its own R-acle distribution someday, or even acquire Revolution Analytics, if it thinks its innovations with R are important enough to want to control.

What Oracle is doing is a bit simpler, and will nonetheless be useful for many Oracle database shops. Advanced Analytics, as the R tools are called, is a new option for the Oracle 11g R2 database.

In the past, Oracle sold a data mining suite as an add-on to its eponymous database, called Oracle Data Mining, for $23,000 per processor core. It had about a dozen data mining routines. The Advanced Analytics add-on that Oracle is now shipping is a superset of this code, and now includes a version of the R programming language and runtime. This is the open source version with no proprietary extensions, George Lumpkin, vice president of product development for data warehousing at Oracle, tells El Reg.

As it turns out, Oracle had already embedded a broad set of statistical algorithms, coded in SQL, inside of the Oracle 11g database. With the Advanced Analytics add-on, quants working from the R client on their desktops can run their analyses as usual and, where possible, an R function will invoke one of these embedded SQL functions to do the same calculations on the data stored in the Oracle database.
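To make the idea of pushing a calculation down to the database concrete, here is a minimal sketch in Python, with SQLite standing in for the database. It is not Oracle's API and not R: the table name `trades`, the column `price`, and both helper functions are hypothetical, invented for illustration. It only contrasts pulling every row to the client with letting the database's own SQL aggregate do the work.

```python
import sqlite3
import statistics


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical 'trades' table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (price REAL)")
    conn.executemany(
        "INSERT INTO trades (price) VALUES (?)",
        [(p,) for p in (10.0, 12.5, 11.0, 14.5)],
    )
    return conn


def mean_client_side(conn):
    """Fetch every row to the client, then compute the mean locally."""
    rows = conn.execute("SELECT price FROM trades").fetchall()
    return statistics.fmean(r[0] for r in rows)


def mean_pushed_down(conn):
    """Ask the database to compute the mean; only one value comes back."""
    (avg,) = conn.execute("SELECT AVG(price) FROM trades").fetchone()
    return avg


conn = make_demo_db()
print(mean_client_side(conn), mean_pushed_down(conn))
```

Both calls return the same number, but the second moves one value over the wire rather than the whole table, which is the appeal of mapping a statistical function onto SQL that already lives next to the data.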

For those stat algorithms that can't be invoked with SQL, Oracle has put an "embedded R" engine in the database tier and they run inside of this engine. This engine understands the parallel nature of Oracle RAC and Exadata database clusters and can chew on data across multiple nodes then present summary data back to the quant sitting at an R client console.
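The general pattern of chewing on data node by node and handing back only a summary can be sketched as follows. This is a generic Python illustration and not how Oracle's embedded R engine is implemented. The `Partial` class, the `summarize` and `combine` helpers, and the sample data are all assumptions made up for the example. Each "node" reduces its own slice to a count, a sum and a sum of squares, and the client combines those small summaries into a mean and a population variance.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Per-node summary: enough to rebuild the mean and variance later."""
    n: int
    total: float
    total_sq: float


def summarize(chunk):
    """Runs on one node: reduce its local rows to a small summary."""
    return Partial(len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(parts):
    """Runs once: merge the per-node summaries into global statistics."""
    n = sum(p.n for p in parts)
    total = sum(p.total for p in parts)
    total_sq = sum(p.total_sq for p in parts)
    mean = total / n
    # Population variance via E[x^2] - mean^2; simple, though not the most
    # numerically robust formula for very large or badly scaled data.
    variance = total_sq / n - mean ** 2
    return n, mean, variance


# Hypothetical data, split across three "nodes".
nodes = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
n, mean, variance = combine([summarize(chunk) for chunk in nodes])
print(n, mean, variance)
```

Only three numbers per node travel back to the client, no matter how many rows each node holds.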

"What the statisticians want is to not know the database is there," says Lumpkin. "We are taking the scalability of the database and making it transparent."

Moreover, once you have statistical algorithms coded up in R, any program that runs against the Oracle database can invoke that code and run it as well. All you have to do is call it, and the R will come running.

R-iding an elephant

The Advanced Analytics add-on for Oracle 11g is not the only R product that Oracle is distributing and supporting. Its Big Data Appliance, launched back in October 2010 and more thoroughly fleshed out in January of this year, includes a little something called the R Connector for Hadoop, which has hooks to let R talk to the HDFS and NoSQL (BerkeleyDB) data stores that underpin the Cloudera CDH3 distribution Oracle is putting on its x86 server cluster (similar to but not the same as the Exadata database machine). The set of connectors, including the R connector, costs $2,000 per core used on the Hadoop platform.

Dave Rich, the new CEO at Revolution Analytics who just joined from the analytics unit of Accenture, didn't think the Oracle approach to R would have an adverse impact on his business. "There's plenty of room in the market, and if anything, it helps us," Rich tells El Reg. "It legitimizes R as enterprise-class, and raises all ships."

Rich added that many customers are leery of becoming a one-vendor shop and want alternatives. Oracle would argue just the opposite, as its engineered systems are designed to work best with an Oracle stack tuned to work better together than any alternatives that might plug into the stack.

Oracle, says Rich, had to add R functionality because IBM's Netezza and Teradata's eponymous appliances have it, and there is still a possibility that Oracle could partner with Revolution Analytics, much as it has with Cloudera for its Hadoop distro. ®
