Oracle tucks R stats language into database

R-acle 11g, Quant Edition

Relational database juggernaut Oracle has embedded the R programming language, used by more than 2 million statisticians and quants the world over, into its 11g relational database. Call it R-acle 11g, Quant Edition.

R, of course, is the open source statistical analysis programming language and is also the name of the runtime engine for that language. R is a bit like the Red Hat for stats, with its main competitors being the closed source analytic tools from SAS Institute and IBM's SPSS unit, among others. The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stats professors from the University of Auckland in New Zealand.

Nearly two years ago, Revolution Analytics burst on the scene with an effort to commercialize R and its runtime engine, as well as to do proprietary extensions that allowed it to scale across bigger iron than the open source implementation. Since that time, Revolution Analytics has upgraded its Enterprise R so it can read and write data natively in the SAS file format and has parallelized R so it can run on the nodes in a Hadoop cluster, doing statistical analysis on each node's data sets and then reducing them down to a final answer.

Oracle is not doing anything like this, and it certainly is not rolling up its own distribution of R and providing tech support and tweaks to it, as it has done with Red Hat's Enterprise Linux operating system and the Xen hypervisor. That's not saying that Oracle won't ever make its own R-acle distribution someday, or even acquire Revolution Analytics, if it thinks its innovations with R are important enough to want to control.

What Oracle is doing is a bit simpler, and will nonetheless be useful for many Oracle database shops. Advanced Analytics, as the R tools are called, is a new option for the Oracle 11g R2 database.

In the past, Oracle sold a data mining suite as an add-on to its eponymous database, called Oracle Data Mining, for $23,000 per processor core. It had about a dozen data mining routines. The Advanced Analytics add-on that Oracle is now shipping is a superset of this code, and now includes a version of the R programming language and runtime. This is the open source version with no proprietary extensions, George Lumpkin, vice president of product development for data warehousing at Oracle, tells El Reg.

As it turns out, Oracle had already embedded a broad set of statistical algorithms, coded in SQL, inside of the Oracle 11g database. With the Advanced Analytics add-on, quants working from the R client on their desktops can run their analyses as usual and, where possible, an R function will invoke one of these embedded SQL functions to do the same calculations on the data stored in the Oracle database.
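To make the pushdown idea concrete, here is a minimal sketch in Python. It is not Oracle's code or API: the standard-library `sqlite3` module stands in for the Oracle database, and the `trades` table, its `price` column, and both function names are hypothetical. The point is only that the same statistic can be computed by hauling rows to the client or by asking the database to run its own SQL aggregate.

```python
import sqlite3
import statistics


def build_demo_db():
    """Create an in-memory SQLite database (a stand-in for Oracle)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (price REAL)")  # hypothetical table
    conn.executemany(
        "INSERT INTO trades (price) VALUES (?)",
        [(p,) for p in (10.0, 12.0, 14.0, 16.0)],
    )
    return conn


def mean_client_side(conn):
    """Pull every row across the wire, then compute the mean locally."""
    rows = conn.execute("SELECT price FROM trades").fetchall()
    return statistics.fmean(row[0] for row in rows)


def mean_in_database(conn):
    """Push the calculation down: the database runs AVG, one row comes back."""
    (avg,) = conn.execute("SELECT AVG(price) FROM trades").fetchone()
    return avg


conn = build_demo_db()
client_result = mean_client_side(conn)
pushed_down_result = mean_in_database(conn)
```

Both calls give the same answer, but the second moves a single number rather than the whole table, which is the kind of saving the article describes when an R function maps onto an embedded SQL function.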

For those stat algorithms that can't be invoked with SQL, Oracle has put an "embedded R" engine in the database tier, and they run inside this engine. The engine understands the parallel nature of Oracle RAC and Exadata database clusters and can chew on data across multiple nodes, then present summary data back to the quant sitting at an R client console.
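One common way to get a single answer out of data spread across nodes is to have each node compute a small partial summary and then merge the summaries. The Python sketch below illustrates that general pattern under stated assumptions; it says nothing about how Oracle's embedded R engine is actually implemented, and the `Summary` class, the helper functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Partial statistics that one node can compute on its own slice."""
    count: int
    total: float
    total_sq: float


def summarize(values):
    """Run on each node: boil a local slice down to three numbers."""
    return Summary(len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Combine two partial summaries; the order of merging does not matter."""
    return Summary(a.count + b.count, a.total + b.total, a.total_sq + b.total_sq)


def finalize(s):
    """Turn the merged summary into a mean and a sample variance."""
    mean = s.total / s.count
    variance = (s.total_sq - s.count * mean * mean) / (s.count - 1)
    return mean, variance


# Hypothetical slices of one data set held on three nodes.
node_data = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
combined = reduce(merge, (summarize(values) for values in node_data))
mean, variance = finalize(combined)
```

Only the three-number summaries travel between nodes, so the client sees the final mean and variance without ever holding the raw rows. The sum-of-squares formula is the simplest to read but can lose precision on large or badly scaled data; a production system would likely use a more numerically stable merge.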

"What the statisticians want is to not know the database is there," says Lumpkin. "We are taking the scalability of the database and making it transparent."

Moreover, once you have statistical algorithms coded up in R, any program that runs against the Oracle database can invoke that code and run it as well. All you have to do is call it, and the R will come running.

R-iding an elephant

The Advanced Analytics add-on for Oracle 11g is not the only R product that Oracle is distributing and supporting. Its Big Data Appliance, launched back in October 2010 and more thoroughly fleshed out in January of this year, includes a little something called the R Connector for Hadoop, which has hooks to let R talk to the HDFS and NoSQL (BerkeleyDB) data stores that underpin the Cloudera CDH3 distribution Oracle is putting on its x86 server cluster (similar to but not the same as the Exadata database machine). The set of connectors, including the R connector, costs $2,000 per core used on the Hadoop platform.

Dave Rich, the new CEO at Revolution Analytics who just joined from the analytics unit of Accenture, didn't think the Oracle approach to R would have an adverse impact on his business. "There's plenty of room in the market, and if anything, it helps us," Rich tells El Reg. "It legitimizes R as enterprise-class, and raises all ships."

Rich added that many customers are leery of becoming a one-vendor shop and want alternatives. Oracle would argue just the opposite, as its engineered systems are designed to work best with an Oracle stack tuned to work better together than any alternatives that might plug into the stack.

Oracle, says Rich, had to add R functionality because IBM's Netezza and Teradata's eponymous appliances have it, and there is still a possibility that Oracle could partner with Revolution Analytics, much as it has with Cloudera for its Hadoop distro. ®