Oracle tucks R stats language into database

R-acle 11g, Quant Edition

Relational database juggernaut Oracle has embedded the R programming language used by more than 2 million statisticians and quants the world over into its 11g relational database. Call it R-acle 11g, Quant Edition.

R, of course, is the open source statistical analysis programming language and is also the name of the runtime engine for that language. R is a bit like the Red Hat for stats, with its main competitors being the closed source analytic tools from SAS Institute and IBM's SPSS unit, among others. The R language was created in 1996 by Ross Ihaka and Robert Gentleman, two stats professors from the University of Auckland in New Zealand.

Nearly two years ago, Revolution Analytics burst on the scene with an effort to commercialize R and its runtime engine, as well as to do proprietary extensions that allowed it to scale across bigger iron than the open source implementation. Since that time, Revolution Analytics has upgraded its Enterprise R so it can read and write data natively in the SAS file format and has parallelized R so it can run on the nodes in a Hadoop cluster, doing statistical analysis on each node's data sets and then reducing them down to a final answer.
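The "analyze each node's data, then reduce" pattern described above can be sketched in a few lines. This is only an illustration in Python, not Revolution Analytics' code: the `Partial` class, the function names, and the sample data are all hypothetical, and the statistic chosen (mean and population variance) is simply one that splits cleanly into per-node summaries.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Summary statistics for one node's slice of the data (hypothetical type)."""
    n: int
    total: float
    total_sq: float


def map_node(values):
    """'Map' step: each node summarizes only its own local data."""
    return Partial(len(values), sum(values), sum(v * v for v in values))


def combine(a, b):
    """'Reduce' step: two partial summaries merge by simple addition."""
    return Partial(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def finalize(p):
    """Turn the merged summary into a mean and a population variance."""
    mean = p.total / p.n
    variance = p.total_sq / p.n - mean * mean
    return mean, variance


# Made-up data, split across three imaginary cluster nodes.
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean, variance = finalize(reduce(combine, map(map_node, node_data)))
print(mean, variance)
```

Only the three-number summaries travel between nodes, never the raw rows, which is why this style of calculation scales across a cluster.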

Oracle is not doing anything like this, and it certainly is not rolling up its own distribution of R and providing tech support and tweaks to it, as it has done with Red Hat's Enterprise Linux operating system and the Xen hypervisor. That's not saying that Oracle won't ever make its own R-acle distribution someday, or even acquire Revolution Analytics, if it thinks that company's innovations with R are important enough to want to control.

What Oracle is doing is a bit simpler, and will nonetheless be useful for many Oracle database shops. Advanced Analytics, as the R tools are called, is a new option for the Oracle 11g R2 database.

In the past, Oracle sold a data mining suite as an add-on to its eponymous database, called Oracle Data Mining, for $23,000 per processor core. It had about a dozen data mining routines. The Advanced Analytics add-on that Oracle is now shipping is a superset of this code, and now includes a version of the R programming language and runtime. This is the open source version with no proprietary extensions, George Lumpkin, vice president of product development for data warehousing at Oracle, tells El Reg.

As it turns out, Oracle had already embedded a broad set of statistical algorithms, coded in SQL, inside of the Oracle 11g database. With the Advanced Analytics add-on, quants working from the R client on their desktops can run their analyses and, where possible, an R function will invoke one of these embedded SQL functions to do the same calculations on the data stored in the Oracle database.

Those stat algorithms that can't be invoked with SQL run inside an "embedded R" engine that Oracle has put in the database tier. This engine understands the parallel nature of Oracle RAC and Exadata database clusters and can chew on data across multiple nodes, then present summary data back to the quant sitting at an R client console.
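The dispatch idea in the last two paragraphs, push a calculation down to SQL where the database has an equivalent and fall back to a general-purpose engine otherwise, might look roughly like this. It is a sketch under loud assumptions: Python's built-in `sqlite3` stands in for the Oracle database, the `SQL_PUSHDOWN` and `FALLBACK` tables and the `analyze` function are hypothetical, and none of it reflects Oracle's actual interfaces. Note too that this fallback pulls rows back to the caller, whereas the embedded R engine described above runs inside the database tier.

```python
import sqlite3
import statistics

# Aggregates the database can compute by itself (hypothetical mapping).
SQL_PUSHDOWN = {"mean": "AVG", "min": "MIN", "max": "MAX", "count": "COUNT"}

# Functions with no SQL equivalent in this sketch; they stand in for
# whatever a general-purpose stats engine would handle.
FALLBACK = {"median": statistics.median, "stdev": statistics.stdev}


def analyze(conn, func, table, column):
    """Run func over table.column, preferring an in-database SQL aggregate."""
    # Identifiers are interpolated for brevity; validate them in real code.
    if func in SQL_PUSHDOWN:
        sql = f"SELECT {SQL_PUSHDOWN[func]}({column}) FROM {table}"
        return conn.execute(sql).fetchone()[0]
    # No SQL equivalent: fetch the column and compute outside the database.
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    return FALLBACK[func]([r[0] for r in rows])


# Made-up table and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?)", [(10.0,), (12.0,), (11.0,), (15.0,)]
)

print(analyze(conn, "mean", "trades", "price"))    # computed by the database
print(analyze(conn, "median", "trades", "price"))  # computed by the fallback
```

The caller uses the same `analyze` call either way, which is the sense in which the database can be made "transparent" to the person writing the analysis.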

"What the statisticians want is to not know the database is there," says Lumpkin. "We are taking the scalability of the database and making it transparent."

Moreover, once you have statistical algorithms coded up in R, any program that runs against the Oracle database can invoke that code and run it as well. All you have to do is call it, and the R will come running.

R-iding an elephant

The Advanced Analytics add-on for Oracle 11g is not the only R product that Oracle is distributing and supporting. Its Big Data Appliance, launched back in October 2010 and more thoroughly fleshed out in January of this year, includes a little something called the R Connector for Hadoop, which has hooks to let R talk to the HDFS and NoSQL (BerkeleyDB) data stores that underpin the Cloudera CDH3 distribution Oracle is putting on its x86 server cluster (similar to but not the same as the Exadata database machine). The set of connectors, including the R connector, costs $2,000 per core used on the Hadoop platform.

Dave Rich, the new CEO at Revolution Analytics who just joined from the analytics unit of Accenture, didn't think the Oracle approach to R would have an adverse impact on his business. "There's plenty of room in the market, and if anything, it helps us," Rich tells El Reg. "It legitimizes R as enterprise-class, and raises all ships."

Rich added that many customers are leery of becoming a one-vendor shop and want alternatives. Oracle would argue just the opposite, as its engineered systems are designed to work best with an Oracle stack tuned to work better together than any alternatives that might plug into the stack.

Oracle, says Rich, had to add R functionality because IBM's Netezza and Teradata's eponymous appliances have it, and there is still a possibility that Oracle could partner with Revolution Analytics, much as it has with Cloudera for its Hadoop distro. ®
