IBM takes on Oracle with PureData appliances
'Watch out, Larry, here we come'
Big Blue is getting sick of Larry Ellison taking up all of the oxygen in the data center when it comes to appliance servers tuned for specific workloads, and so it is expanding its line of PureSystems preconfigured machines with a family of boxes called PureData that take on some of the same work that Oracle is chasing with its Exadata parallel database engines.
You might need a decoder ring to keep track of the new product line that IBM started rolling out in April of this year, so let El Reg help you out.
The initial PureSystems machines, developed under the code-name "Project Troy", came in two flavors. The PureFlex system is raw infrastructure composed of modular servers jammed horizontally into a 10U rack chassis with integrated switching and storage and management software for keeping it all monitored, patched and humming along.
The PureApplication system takes this converged hardware as a foundation and adds cloudy software and application deployment and management smarts to it so third-party and homegrown applications can be deployed more quickly. In both cases, the PureFlex and PureApplication iron is sold in a number of different configurations, all racked up and ready to go so customers don't have to do integration.
With the "Project Sparta" machines announced today at a big data event in Singapore, IBM is introducing the PureData brand and it is not, as you might expect, limiting itself to using only the new converged PureFlex infrastructure. This seems a bit odd to El Reg, but there you have it.
Two of the three new PureData appliances are, in fact, based on existing BladeCenter blade or Power Systems rack servers. But it stands to reason that eventually these machines will be brought into the PureFlex fold, given all of the management costs that Big Blue is touting for the new PureSystems iron relative to traditional rack servers with external switching and software.
The important thing is that the PureData boxes are integrated, use the same system management software as other PureSystems machines, and are sold by the rack, all assembled, with a single price for the whole shebang – hardware and software.
That is something that Oracle is not doing with its Exadata and Exalogic machines, which have operating systems but are missing database or middleware licenses. To Oracle's credit, pricing for all of its wares is out there, including how it discounts at volume, and it is utterly transparent in a way that most IT vendors are not.
In fact, Oracle's co-founder and CEO Larry Ellison has said more than once that he wants Oracle to be like the IBM of the mainframe era, in terms of catering completely to IT departments, and Oracle's transparency about pricing has come about without Oracle having to be sued for antitrust violations.
IBM would do well to return to its past, which it abandoned after it wiggled out of its consent decree with the US government, and provide absolute transparency on configurations and pricing. People don't trust vendors who don't supply list prices, which are a ceiling even if they are not a floor.
There are three new PureData Systems coming out of Big Blue today: The PureData System for Transactions, the PureData System for Analytics, and the PureData System for Operational Analytics.
These names, like so many used in the IT racket, are absolutely forgettable and too long. Again, credit to Oracle. After two years, most people who follow systems know the difference between Exadata, Exalogic, and Exalytics appliances, and the names even tell you it is x86-based server clusters that Oracle is peddling.
It would have been sufficient for IBM to just focus on PureData as a brand, since it implies database, and then use a product number (as it has in the formal catalog) to signal if it is a transactional or analytical box.
The word "system" is absolutely redundant. Of course it is a system. Otherwise we would just be talking about a naked server without switching, storage, and other software goodies all built in. These may seem like small things, but if you can't keep track of what IBM or Citrix Systems or VMware or Microsoft are selling, do you think their sales force can?
It is always easy to pick on IT naming conventions. The real issues with the PureData appliances are simple. They have to be easier to install and use and provide better value for the dollar than the existing machinery from Big Blue and alternatives from competitors. This is not immediately obvious, but that is clearly what the goal is.
And Nancy Kopp, director of big data strategy at IBM, is not shy about taking the fight right to Ellison. "This is going to be our Exadata killer in the marketplace, and we are going after Oracle with this one," she tells El Reg with a laugh. "Watch out, Larry, here we come."
IBM's PureData parallel database appliance for OLTP
The PureData T1500, as El Reg will call it, is aimed at online transaction processing. It bears some resemblance to IBM's prior Exadata killers, which were initially based on Power6-based Power 550 servers running AIX back in the fall of 2009, when Oracle announced its first Exadata V2 systems based on Sun Microsystems iron.
IBM upgraded the PureScale appliances, which are based on a parallel implementation of IBM's DB2 database, to the PureScale Application System with Power7 nodes in early 2010. This box added the WebSphere application server and was kind of a rough pass on what has become the PureApplication System.
And in September last year, IBM cooked up yet another variant of the PureScale DB2 cluster, still based on Power iron and AIX, called WebSphere Transaction Cluster Facility, for very intensive transaction processing environments.
With the PureData T1500, IBM is using its new PureFlex iron and in particular is shifting the PureScale DB2 clustering from Power to Xeon E5 processors from Intel. IBM is also moving away from InfiniBand switching to 10 Gigabit Ethernet switches and adapters that are equipped with Remote Direct Memory Access over Converged Ethernet, or RoCE as it is called. RDMA, whether over InfiniBand or as RoCE on Ethernet, is important because it is how the PureScale database cluster keeps itself synchronized over the network linking server nodes.
IBM is loading up the Flex chassis with x240 server nodes, which are two-socket servers, equipped with eight-core E5-2670 processors running at 2.6GHz. Each node has a two-port 10GE adapter for cross-coupling database nodes and a two-port 8Gb/sec Fibre Channel adapter to link out to storage. A full rack PureData T1500 configuration has two Flex System enclosures and 24 x240 server nodes, plus two 48-port IBM RackSwitch G8264 switches.
The PureData T1500 setup has 384 x86 cores, 6.2TB of main memory, plus four Storwize V7000 disk arrays with four expansion units. These arrays have a total of 19.2TB of flash-based SSDs and 128TB of disk capacity, which yields 74.4TB of user capacity for the PureScale DB2 clustered database.
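For those who like to check the sums, here is a quick back-of-the-envelope sketch in Python using only the full-rack figures quoted above. The per-node memory figure assumes decimal terabytes and an even split across nodes, which IBM has not confirmed, and the usable fraction is simply user capacity divided by raw flash plus disk.

```python
# Full-rack PureData T1500 figures as quoted in the article.
NODES = 24
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
MEMORY_TB = 6.2
FLASH_TB = 19.2
DISK_TB = 128.0
USER_TB = 74.4

# Core count: nodes x sockets x cores per socket.
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Assumption: 1TB = 1000GB and memory is spread evenly across nodes.
memory_per_node_gb = MEMORY_TB * 1000 / NODES

# Raw capacity is flash plus disk; the rest goes to redundancy and overhead.
raw_tb = FLASH_TB + DISK_TB
usable_fraction = USER_TB / raw_tb

print(f"cores: {total_cores}")
print(f"memory per node: about {memory_per_node_gb:.0f}GB")
print(f"raw capacity: {raw_tb:.1f}TB")
print(f"usable fraction: {usable_fraction:.0%}")
```

The core count matches the 384 that IBM quotes, and roughly half of the raw capacity ends up as user capacity.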
As for software, the PureData T1500 runs Red Hat Enterprise Linux 6 and uses the Storwize Easy Tier automatic tiering to move data from SSDs to disk as it gets cold and back to SSDs when it gets hot. The cluster runs DB2 10.1 Enterprise Server Edition with the PureScale clustering, and InfoSphere Optim Query Workload Tuner 3, PureQuery Runtime 3, Optim Performance Manager 5, Optim Configuration Manager 2, and Tivoli Storage Manager Client are also loaded up on the boxes. InfoSphere Data Architect 8 and Data Studio 3 development tools are add-ons not included in the base system.
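The general idea behind automatic tiering can be sketched in a few lines of Python. This is a toy illustration only and not IBM's Easy Tier algorithm: the extent names, the access logs, and the `retier` function are all hypothetical, and the policy (rank extents by access count in a monitoring window, keep the hottest on SSD) is an assumption made for the sake of the example.

```python
from collections import Counter

def retier(extents, access_counts, ssd_slots):
    """Place the `ssd_slots` most-accessed extents on SSD, the rest on disk.

    Toy policy for illustration; ties are broken by extent name.
    """
    ranked = sorted(extents, key=lambda e: (-access_counts[e], e))
    return set(ranked[:ssd_slots]), set(ranked[ssd_slots:])

# Hypothetical extents and access logs for two monitoring windows.
extents = ["a", "b", "c", "d"]
window_1 = Counter(["a", "a", "a", "b", "c", "c", "d"])
window_2 = Counter(["b", "b", "b", "d", "d", "a"])

first_ssd, first_disk = retier(extents, window_1, ssd_slots=2)
second_ssd, second_disk = retier(extents, window_2, ssd_slots=2)

print(sorted(first_ssd), sorted(first_disk))
print(sorted(second_ssd), sorted(second_disk))
```

In the second window, extents that have gone cold drop back to disk while the newly hot ones are promoted to SSD, which is the behavior the article describes in outline.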
The PureData T1500 comes in three sizes: a full rack, a half rack, and a quarter rack. And because IBM has long since licensed the Oracle database skinning features from PostgreSQL vendor EnterpriseDB, IBM says (PDF) that the OLTP appliance "supports Oracle Database applications with minimal changes."
The quarter-rack configuration of the PureData T1500 costs less than $500,000, according to IBM, and that includes all of the database and systems software mentioned above. When Oracle quotes Exadata prices, that is literally for the iron and the Oracle Linux operating system, not including the Oracle database or the Exadata storage server software licenses. So while a rack of Exadata X2-2 iron (as El Reg detailed last year) costs $1.1m, add in the database and Exadata software costs, and you are up to $4.47m per rack. IBM's rack is weighing in at under $2m.
We have no idea how the new Exadata X3-2 system, announced last week, performs against the PureData T1500 on the same OLTP work, and that is the next set of questions that needs to be answered once the PureData T1500 is in the field. The PureData T1500 will ship on October 26.
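To put those rack prices side by side, here is a small Python sketch that uses only the figures quoted above. IBM's price is given as "under $2m", so $2m is treated as a ceiling and the resulting ratio is a lower bound, not an exact figure. These are list-style prices before any discounting, and nothing here says anything about relative performance.

```python
# Prices in millions of US dollars, as quoted in the article.
EXADATA_HW_M = 1.10          # Exadata X2-2 rack: iron plus Oracle Linux
EXADATA_TOTAL_M = 4.47       # Same rack with database and Exadata software
PUREDATA_CEILING_M = 2.00    # IBM says "under $2m", so this is an upper bound

# How much of the Exadata rack price is software licensing.
exadata_software_m = EXADATA_TOTAL_M - EXADATA_HW_M
software_share = exadata_software_m / EXADATA_TOTAL_M

# Lower bound on how many times pricier the Exadata rack is.
min_price_ratio = EXADATA_TOTAL_M / PUREDATA_CEILING_M

print(f"Exadata software: ${exadata_software_m:.2f}m")
print(f"software share of Exadata rack price: {software_share:.0%}")
print(f"Exadata rack is at least {min_price_ratio:.1f}x the PureData rack price")
```

By these numbers, software accounts for about three quarters of the fully loaded Exadata rack price.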
Rebadging Netezza warehouses and Smartie boxes
The PureData brand is also being slapped on some existing appliances that IBM has crafted for running analytics workloads, called Smart Analytics Systems or Smarties here at El Reg, as well as on the data warehouse appliances based on IBM blade servers that Big Blue got its hands on when it acquired Netezza back in September 2010.
The rebadged Smartie box is a cluster based on IBM's Power7-based Power 740 servers running AIX and DB2 10, InfoSphere Warehouse 10, and Cognos 8 analytics, while the rebadged Netezza box still uses the TwinFin architecture that marries x86 blade servers with blades carrying special FPGAs designed to compress and decompress data on storage arrays and pre-chew it for database servers running on those x86 blades. (This is similar to what Oracle has done with the Exadata machines.)
You will recognize the PureData System for Operational Analytics A1791 as the Smartie 7700, and the PureData System for Analytics N1001 as the Netezza 1000. PureData A1791 and PureData N1001 are reasonably short names we might remember. "The first is good at doing thousands of queries per second," explains Kopp, "while the latter is good at doing big and complex queries in seconds."
These two rebadged appliances will be available on October 26 as well, including integration with the Flex System Manager and the "expert integration" templating system that is part of the PureApplication systems. And with that funky new rack that the PureData T1500 comes in, too.
IBM did not launch a PureData machine expressly to run its BigInsights variant of the Hadoop data muncher, but obviously it could whip up such a product in pretty short order. ®