Oracle rolls its own NoSQL and Hadoop

A supremely confident Ellison mounts the Big Data elephant


OpenWorld There's no shortage of ego at Oracle, as evidenced by the effusion of confidence behind the company's OpenWorld announcement of the not-so-humbly named Big Data Appliance.

And then there were Monday's o'erweening keynote presentations by some of the top brass at the software giant and systems player. They included a montage of what Oracle has done so far this year and hints at things ahead, and ended with co-president and CFO Safra Catz intoning: "We are big data. And we're also the cloud."

Well, with Oracle owning big chunks of the database, middleware, application, and operating system markets, Catz's pronouncement settles any doubt about the future of information technology and Oracle's place in it. Just send your checks and surrender terms to Larry Ellison care of Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065.

But before Oracle swallows the entire IT market, it has to prove that it actually is big data. Or, more precisely, that its engineered systems can do the kind of MapReduce work that enterprises are increasingly using to cope with their unstructured data.

And so Oracle is creating yet another engineered system to put in its arsenal of things for its direct sales force to sell: the Big Data Appliance.

Although Thomas Kurian, executive vice president of product development, announced the Big Data Appliance as part of his keynote in the wake of some brief peppy talk by co-president Mark Hurd, details are a little sparse. Conceptually, here's what the Big Data Appliance looks like:

Block diagram of Oracle's Big Daddy, er Data, Appliance

The underlying hardware for the Big Data Appliance is Oracle's Exadata x86 clusters, which support a parallel implementation of the Oracle 11g R2 database running on top of Oracle's RHEL-ish clone of Linux. Oracle Enterprise Linux and Oracle's twist on the open source Xen hypervisor are the appliance's underlying layer.

Oracle is grabbing the open source Hadoop MapReduce tool from the Apache Software Foundation and doing its own distribution for this specific machine; Hadoop is a Java program and runs atop Oracle's own JVMs in the Exadata.
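For readers who have not met the MapReduce pattern that Hadoop implements, here is a minimal single-process sketch in plain Python. It is not Hadoop's API and says nothing about how Oracle's distribution is built; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are made up for illustration. It only shows the three steps the framework spreads across a cluster: map each raw record to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one line of raw text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical unstructured input: free-form log lines.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

In a real cluster the map and reduce calls run on many machines at once and the shuffle moves data over the network; the sketch keeps everything in one process so the data flow is easy to follow.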

Kurian did not say if Oracle was using the HDFS file system that is normally paired with Hadoop. He had very little to say about the Oracle NoSQL Database, which is a distributed key-value store, although Oracle's announcement predictably says that it is "easy to install, configure and manage, supports a broad set of workloads, and delivers enterprise-class reliability backed by enterprise-class Oracle support."

The announcement, however, like Kurian, neglects to explain exactly what Oracle's NoSQL Database is. The company's sparse page outlining the NoSQL Database says that it "scales horizontally to hundreds of nodes with high availability and transparent load balancing."
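Since Oracle has not said how its key-value store works, the following is only a generic sketch of what "scales horizontally" usually means for a distributed key-value store: keys are hashed, and the hash decides which node holds the value. The class `ShardedKVStore` and its methods are hypothetical and are not Oracle NoSQL Database's API; the nodes are plain in-memory dictionaries, and replication for high availability is deliberately left out.

```python
import hashlib


class ShardedKVStore:
    """Toy key-value store that spreads keys across nodes by hash."""

    def __init__(self, node_count):
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each "node" is just a dict here; a real store would use servers.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        """Pick a node index from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


if __name__ == "__main__":
    store = ShardedKVStore(node_count=4)
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})
    # Show how many keys landed on each node.
    print([len(node) for node in store.nodes])
```

Plain modulo hashing is the simplest choice but reshuffles most keys whenever a node is added; production systems typically use consistent hashing or fixed partitions to avoid that, and keep replicas of each partition on more than one machine.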

Only two weeks ago, in a conference call with Wall Street analysts going over Oracle's first quarter financial results for fiscal 2012, Ellison, the company's cofounder and CEO, didn't seem to think that Oracle needed any other kind of database for dealing with unstructured data.

"Oracle has always stored both structured and unstructured data. This is really nothing new. We are constantly adding features to our database to support the storage and searching of unstructured as well as structured data. Autonomy was a shock to us," he said at the time, referring to the company that HP is in the process of buying for $10.3bn for its expertise in coping with unstructured data.

"We looked at the price and thought it was absurdly high," Ellison continued. "We had no interest in making the Autonomy acquisition. We think we're much better off with a couple of smaller acquisitions and continuing to innovate in that area so that the unstructured data and the structured data both find their way into an Oracle database, where it's secure, it's scalable, it runs on Exadata. We think we really don't want to have two separate databases."

Listen, my children, and you shall hear...

Ellison then gave a brief history of the database business, explaining that first there were relational databases, then object relational databases, and now we need to cope with unstructured data.

He neglected to point out, however, that Oracle has a collection of databases that rivals IBM's (in number, if not in functionality), including the Essbase and TimesTen databases that are now at the heart of its new Exalytics BI appliance, announced yesterday, plus MySQL, Berkeley DB, and Rdb from the Alphas and VAXes.

Oracle is not averse to adding a NoSQL database to the collection, but Ellison sure gave the impression that what the company wanted to do was keep everything inside of the Oracle database – by which he meant 11g R2. As it turns out, Oracle's NoSQL is based on the Berkeley DB key/value database, Oracle confirmed separately today.

The Big Data Appliance stack also includes the Oracle Data Integrator Application Adapter for Hadoop, which links the NoSQL database and the Oracle database to applications, and Oracle Loader for Hadoop, which transforms datasets created by MapReduce-crunching into formats that are native to Oracle databases so they can be sucked into 11g R2.

Although the chart above doesn't show it, the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.

This approach to R-Hadoop integration is different from the one announced last week by Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as a competitor, which no doubt came as no surprise to either.

In any event, the way Revolution Analytics and Cloudera do it, an R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can each analyze individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; the statisticians have no idea that they are using MapReduce on unstructured data.
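The idea of analyzing data on each node and shipping only summaries back to the workstation can be sketched in a few lines. This is written in Python rather than R, and it is an illustration of the general pattern, not the Revolution Analytics or Cloudera code: the functions `summarize` and `combine` and the sample data are hypothetical. Each node reduces its local data to a count, a sum, and a sum of squares, and the workstation merges those small tuples into a global mean and variance without ever seeing the raw values.

```python
def summarize(values):
    """Per-node step: reduce local data to (count, sum, sum of squares)."""
    values = list(values)
    return len(values), sum(values), sum(v * v for v in values)


def combine(summaries):
    """Workstation step: merge per-node summaries into global statistics."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for count, node_sum, node_sum_sq in summaries:
        n += count
        total += node_sum
        total_sq += node_sum_sq
    if n == 0:
        raise ValueError("no data in any summary")
    mean = total / n
    # Population variance; clamp tiny negative values from rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return {"count": n, "mean": mean, "variance": variance}


if __name__ == "__main__":
    # Hypothetical data already spread across three cluster nodes.
    node_data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
    print(combine(summarize(chunk) for chunk in node_data))
```

The sum-of-squares formula is the shortest way to show the merge step, but it can lose precision on large or tightly clustered data; a numerically sturdier merge (for example, combining per-node means and squared deviations) would be the safer choice in practice.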

Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers. The company did say that it would sell the individual software components in the appliance separately, as it does for other elements of its engineered systems. ®
