Oracle upgrades Big Data Appliance with Xeon E5s
Berkeley DB NoSQL database gets 2.0 rev
Database giant Oracle is trying to keep the myriad NoSQL and alternative data stores and big data munchers like Hadoop at bay by commercializing and integrating a bunch of proprietary and open source software onto preconfigured x86-based servers that it sells in appliance fashion. Oracle has not talked about how well or poorly these machines are selling, but the company has upgraded the underlying iron in the machines and the NoSQL database that is at the heart of the software stack – and that's a good indication that Oracle thinks the Big Data Appliance is worth continued investment.
The Big Data Appliance was previewed back in October 2011 at the OpenWorld extravaganza, and began shipping in January of this year. It is a complement to the Exadata database cluster, the Exalogic application server cluster, and the Exalytics in-memory appliance. All of these are so-called "engineered systems," by which Oracle means systems that are tuned up to run very specific workloads.
All of Oracle's appliances come preconfigured with a single hardware price. These systems may have a specific software stack on them, but the base price does not generally cover that stack, and – perhaps more importantly and not at all unexpectedly – the software is often two or three times as expensive as the underlying servers, storage, and switching that make up the appliance. But in the case of the Big Data Appliance, the base software stack is bundled into the price.
The Big Data Appliance cluster is essentially a Hadoop big data muncher that uses Oracle's own Berkeley DB NoSQL data store underneath the CDH3 Hadoop distribution from Cloudera. The server nodes run Oracle's own riff on Red Hat Enterprise Linux, and have data loading and integration tools to move information into and out of Oracle's 11g R2 database. Oracle has also integrated the open source R statistical programming language and runtime in the software, so each node in the cluster can run R as well as MapReduce data-munching routines.
The first-generation Big Data Appliance had eighteen two-socket Xeon 5600 servers in a rack, with a total of 216 cores, 864GB of main memory, and 648TB of disk capacity across those nodes. Each node had two six-core Xeon X5675 processors running at 3.06GHz, 48GB of memory (that's 4GB per core in the node), and a dozen 3TB 7.2K RPM SAS disk drives in 3.5-inch form factors. One 36-port InfiniBand switch running at the QDR (40Gb/sec) speed linked the server nodes to each other in the cluster, and there were two other switches with eight 10 Gigabit Ethernet ports and 32 InfiniBand QDR ports for linking the rack-based system to other Exa systems from Oracle, more Big Data Appliance racks, and to the outside world.
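The rack totals follow directly from the per-node figures. A minimal sketch of that arithmetic, using only the numbers quoted above (the function and constant names are made up for illustration, and the six-core count is the X5675's core count):

```python
# Rack-level totals for the first-generation Big Data Appliance,
# derived from the per-node figures quoted in the article.
NODES_PER_RACK = 18
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6        # Xeon X5675 is a six-core part
MEMORY_GB_PER_NODE = 48
DISKS_PER_NODE = 12
DISK_TB = 3


def rack_totals(nodes=NODES_PER_RACK):
    """Return aggregate cores, memory, and disk for a rack of `nodes` servers."""
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    return {
        "cores": nodes * cores_per_node,
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "disk_tb": nodes * DISKS_PER_NODE * DISK_TB,
        "gb_per_core": MEMORY_GB_PER_NODE / cores_per_node,
    }


print(rack_totals())
```

The result matches the quoted 216 cores, 864GB, and 648TB per rack, with 4GB of memory per core.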
Oracle's next-gen Big Data Appliance
As Hadoop and NoSQL clusters grow, you link multiple racks together using that spare InfiniBand switch capacity, and the Oracle NoSQL data store and Cloudera Hadoop software scale across those additional nodes. The switches that Oracle has chosen (developed in conjunction with partner Mellanox Technologies) allow for up to eighteen racks, or a total of 324 nodes, to be linked together in a flat, non-blocking InfiniBand fabric. If you want to go bigger than that, you will need more and larger switches.
The Big Data Appliance comes preconfigured with the freebie NoSQL Community Edition, but if you want all the bells and whistles you can also use the NoSQL Enterprise Edition. You can also run the Hadoop Distributed File System on the cluster for storing certain kinds of unstructured data – HDFS and Oracle NoSQL are not mutually exclusive.
The first-gen Big Data Appliance cost $450,000 per rack, including a "lifetime OEM license" to Cloudera's CDH3 Hadoop distribution; a premier support contract for the stack costs $54,000 per year. That works out to $25,000 per node for the rack alone, or around $28,000 per node with the first year of support added in, which is not too shabby if you can get customers to pay it.
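The per-node figure depends on whether the support contract is counted. A quick sketch of both readings, assuming the roughly $28,000 number folds in one year of premier support (that reading is an inference from the arithmetic, not something Oracle has stated, and the names below are illustrative):

```python
# Per-node cost of a first-gen rack, with and without one year of support.
RACK_PRICE = 450_000        # list price per rack, software stack bundled
SUPPORT_PER_YEAR = 54_000   # premier support contract, per rack
NODES_PER_RACK = 18


def cost_per_node(total, nodes=NODES_PER_RACK):
    """Spread a rack-level cost evenly across the server nodes."""
    return total / nodes


hardware_only = cost_per_node(RACK_PRICE)
with_first_year_support = cost_per_node(RACK_PRICE + SUPPORT_PER_YEAR)

print(f"rack only:         ${hardware_only:,.0f} per node")
print(f"rack plus support: ${with_first_year_support:,.0f} per node")
```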
With the Big Data Appliance X3-2 announced on Monday, both the hardware and the software are being gussied up, but the price is holding steady at $450,000 per rack.
Oracle is moving to server nodes based on the most recent Xeon E5 processors from Intel. Specifically, Oracle's two-socket nodes now employ the Xeon E5-2660 processors, which spin at 2.2GHz. Oracle's announcement says this provides 33 per cent more processing power, but this may not be strictly true.
Yes, moving from six-core to eight-core processors gives you 33 per cent more cores, but the cores' clock speeds also run 28 per cent slower. If you look at SPECint2006 CPU tests for machines using the X5675 and E5-2660 processors, the latter delivers about 10 per cent more oomph. It's hard to say how this translates into more NoSQL or Hadoop work, but generally speaking, larger caches and main memory as well as more threads help for these kinds of workloads.
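To see why the 33 per cent claim deserves scepticism, here is a back-of-the-envelope comparison. It uses cores times clock speed as a crude proxy for throughput, which is a simplifying assumption: it ignores per-clock efficiency gains, Turbo Boost, and cache sizes, all of which favour the newer chip and help explain why measured benchmarks show a gain.

```python
# Naive cores-times-clock comparison of the two node generations.
# This is a rough proxy only; real throughput also depends on
# instructions per clock, cache, and memory bandwidth.
OLD_CORES_PER_NODE = 2 * 6      # two six-core Xeon X5675
OLD_CLOCK_GHZ = 3.06
NEW_CORES_PER_NODE = 2 * 8      # two eight-core Xeon E5-2660
NEW_CLOCK_GHZ = 2.2

core_ratio = NEW_CORES_PER_NODE / OLD_CORES_PER_NODE
clock_ratio = NEW_CLOCK_GHZ / OLD_CLOCK_GHZ
aggregate_ratio = core_ratio * clock_ratio

print(f"cores:             {core_ratio - 1:+.1%}")
print(f"clock speed:       {clock_ratio - 1:+.1%}")
print(f"cores x clock:     {aggregate_ratio - 1:+.1%}")
```

On this crude measure the new node comes out slightly behind the old one, so any real-world gain has to come from architectural improvements rather than raw core count.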
Oracle is fattening up the main memory in the rack by 33 per cent to 1.1TB, at 64GB per node, which should help boost performance. The new Xeon E5 iron takes 30 per cent less power and cooling than the Xeon 5600-based machines, as well, so you get other benefits from going with the newer iron. The number and capacity of the disk drives remain the same, at 648TB per rack.
On the software side, the Big Data Appliance X3-2 includes the latest Oracle Enterprise Linux 5.8 with Oracle's homegrown Unbreakable Linux kernel and its own upgraded Hotspot Java virtual machine for running Java. (Hadoop is written in Java, so this matters.) Oracle is also stepping up to the latest CDH4 Hadoop distro from Cloudera, which came out in June.
Oracle has also rolled up a 2.0 release of its NoSQL database, which now sports an API for C programs, support for JSON to pour documents into the data store, and another API for managing large objects stored in NoSQL format. There is also an external table link in the software so that SQL queries running in Oracle's 11g relational database can view and query records in the NoSQL database from inside of 11g. The open source R distro has also been revved, and Oracle Enterprise Manager now has a plug-in so it can control-freak the Big Data Appliance stack. ®