Oracle tucks R stats language into database

R-acle 11g, Quant Edition

Relational database juggernaut Oracle has embedded the R programming language, used by more than 2 million statisticians and quants the world over, into its 11g relational database. Call it R-acle 11g, Quant Edition.

R, of course, is the open source statistical analysis programming language and is also the name of the runtime engine for that language. R is a bit like the Red Hat for stats, with its main competitors being the closed source analytic tools from SAS Institute and IBM's SPSS unit, among others. The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stats professors from the University of Auckland in New Zealand.

Nearly two years ago, Revolution Analytics burst on the scene with an effort to commercialize R and its runtime engine, as well as to do proprietary extensions that allowed it to scale across bigger iron than the open source implementation. Since that time, Revolution Analytics has upgraded its Enterprise R so it can read and write data natively in the SAS file format and has parallelized R so it can run on the nodes in a Hadoop cluster, doing statistical analysis on each node's data sets and then reducing them down to a final answer.
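For the curious, the "analyze on each node, then reduce" pattern can be sketched in a few lines. This is a minimal illustration in Python, not Revolution Analytics' code: the partitions, the function names (`summarize`, `combine`, `finalize`), and the choice of mean and variance as the statistic are all assumptions made for the example.

```python
from functools import reduce

def summarize(partition):
    """Map step: boil one node's data down to (count, sum, sum of squares)."""
    n = len(partition)
    s = sum(partition)
    ss = sum(x * x for x in partition)
    return (n, s, ss)

def combine(a, b):
    """Reduce step: merge two partial summaries by adding them element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(summary):
    """Turn the merged summary into a mean and a population variance."""
    n, s, ss = summary
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance

# Hypothetical data, split across three "nodes".
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

partials = [summarize(p) for p in partitions]  # would run on each node
total = reduce(combine, partials)              # small summaries travel, not raw data
mean, variance = finalize(total)
```

The point of the design is that only three numbers per node cross the network, however large each node's data set is.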

Oracle is not doing anything like this, and it certainly is not rolling up its own distribution of R and providing tech support and tweaks to it, as it has done with Red Hat's Enterprise Linux operating system and the Xen hypervisor. That's not saying that Oracle won't ever make its own R-acle distribution someday, or even acquire Revolution Analytics, if it thinks its innovations with R are important enough to want to control.

What Oracle is doing is a bit simpler, and will nonetheless be useful for many Oracle database shops. Advanced Analytics, as the R tools are called, is a new option for the Oracle 11g R2 database.

In the past, Oracle sold a data mining suite called Oracle Data Mining as an add-on to its eponymous database for $23,000 per processor core. It had about a dozen data mining routines. The Advanced Analytics add-on that Oracle is now shipping is a superset of this code, and now includes a version of the R programming language and runtime. This is the open source version with no proprietary extensions, George Lumpkin, vice president of product development for data warehousing at Oracle, tells El Reg.

As it turns out, Oracle had already embedded a broad set of statistical algorithms, coded in SQL, inside of the Oracle 11g database. With the Advanced Analytics add-on, quants working from the R client on their desktops can run their analyses, and where possible, an R function will invoke one of these embedded SQL functions to do the same calculations on the data stored in the Oracle database.
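The general idea of pushing a calculation down to the database, rather than hauling rows back to the client, can be shown with a small sketch. It uses Python and the standard library's SQLite module as a stand-in, not Oracle or R, and the `trades` table, its contents, and both function names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the real database tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("ORCL", 28.0), ("ORCL", 30.0), ("IBM", 190.0), ("IBM", 194.0)],
)

def mean_client_side(conn, symbol):
    """Pull every matching row to the client, then average locally."""
    rows = conn.execute(
        "SELECT price FROM trades WHERE symbol = ?", (symbol,)
    ).fetchall()
    prices = [row[0] for row in rows]
    return sum(prices) / len(prices)

def mean_in_database(conn, symbol):
    """Ask the database to run its built-in AVG and return one number."""
    (avg,) = conn.execute(
        "SELECT AVG(price) FROM trades WHERE symbol = ?", (symbol,)
    ).fetchone()
    return avg
```

Both functions give the same answer; the second moves a single value across the wire instead of the whole column, which is the appeal of mapping a client-side function onto an SQL one.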

For those stat algorithms that can't be invoked with SQL, Oracle has put an "embedded R" engine in the database tier, and they run inside of this engine. The engine understands the parallel nature of Oracle RAC and Exadata database clusters and can chew on data across multiple nodes, then present summary data back to the quant sitting at an R client console.

"What the statisticians want is to not know the database is there," says Lumpkin. "We are taking the scalability of the database and making it transparent."

Moreover, once you have statistical algorithms coded up in R, any program that runs against the Oracle database can invoke that code and run it as well. All you have to do is call it, and the R will come running.

R-iding an elephant

The Advanced Analytics add-on for Oracle 11g is not the only R product that Oracle is distributing and supporting. Its Big Data Appliance, launched back in October 2010 and more thoroughly fleshed out in January of this year, includes a little something called the R Connector for Hadoop, which has hooks to let R talk to the HDFS and NoSQL (BerkeleyDB) data stores that underpin the Cloudera CDH3 distribution Oracle is putting on its x86 server cluster (similar to but not the same as the Exadata database machine). The set of connectors, including the R connector, costs $2,000 per core used on the Hadoop platform.

Dave Rich, the new CEO at Revolution Analytics who just joined from the analytics unit of Accenture, didn't think the Oracle approach to R would have an adverse impact on his business. "There's plenty of room in the market, and if anything, it helps us," Rich tells El Reg. "It legitimizes R as enterprise-class, and raises all ships."

Rich added that many customers are leery of becoming a one-vendor shop and want alternatives. Oracle would argue just the opposite, as its engineered systems are designed to work best with an Oracle stack tuned to work better together than any alternatives that might plug into the stack.

Oracle, says Rich, had to add R functionality because IBM's Netezza and Teradata's eponymous appliances have it, and there is still a possibility that Oracle could partner with Revolution Analytics, much as it has with Cloudera for its Hadoop distro. ®
