SGI munches MarkLogic database, hatches Dataraptor appliance

Having big data for lunch

Silicon Graphics knows a thing or two about handling huge amounts of data, but it doesn't have its own database or NoSQL data-store software. It needs to partner to be able to feast on the big-data carcass, and its latest partnership is with NoSQL database maker MarkLogic to create a big-data appliance called Dataraptor.

Not every big-data job out there can be handled by the batch-oriented Hadoop data muncher, which SGI has been peddling on top of its energy-efficient, dense-packed Rackable servers for the past year under a partnership with commercial Hadoop distie Cloudera.

Similarly, a big fat SQL database doesn't do all jobs either, although SGI has had some success pushing Microsoft's SQL Server relational database on top of its "big brain" UV 2 supercomputers, because this shared-memory system can push Windows Server and SQL Server to their scalability limits, with room to spare.

To make another big-data appliance focused on unstructured data and with more real-time capabilities than Hadoop offers, SGI had plenty of choices. There's MongoDB from 10gen, BerkeleyDB from Oracle, Cassandra from DataStax, Riak from Basho, Couchbase from the company of the same name, and the open source Redis. In this case, however, SGI has tapped MarkLogic for its eponymous NoSQL data store because of some of the unique attributes it brings to the big-data party.

MarkLogic was founded in 2001 by Paul Pedersen, a professor of computer science at Cornell University and the University of California Los Angeles, and Christopher Lindblad, chief architect at search engine Infoseek. The company has raised $45.5m in four rounds of funding from the now-defunct Lehman Brothers, plus Sequoia Capital and Tenaya Capital. The company has more than 400 customers using its database on various big-data jobs, and has over 250 employees.

A rack of Dataraptor

The MarkLogic Server, as it used to be called, is an XML-based database that has search functions built into its DNA, and also a shared-nothing architecture (like most NoSQL data stores) so it can be scaled far and fast. Interestingly, MarkLogic is architected to adhere to the atomicity, consistency, isolation, and durability properties (the so-called ACID tests) that relational databases and their online transaction processing systems require and that many NoSQL data stores do not entirely support.

The latest MarkLogic 6 data store has tight integration with Hadoop, and you can have Hadoop pre-chew data before it gets dumped into MarkLogic through a connector for further searches and queries, or you can slide MarkLogic underneath Hadoop, replacing the Hadoop Distributed File System.

The Dataraptor appliance that SGI and MarkLogic have hatched is, like SGI's Hadoop clusters, based on the full-depth Rackable server nodes, not the funky half-depth nodes that made Rackable Systems famous before it bought Silicon Graphics and took its name.

SGI is making two distinct kinds of rack configurations available, one aimed at performance, using two dozen 2.5-inch 15K RPM disks, and one aimed at capacity, using a dozen 3.5-inch SATA drives spinning at 7.2K RPM. The rack has 21 nodes in it, with each server having 16 Xeon E5 cores across its two sockets and 128GB of main memory, for a total of 336 Xeon E5 cores and 2.6TB of main memory to chew on data.
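The rack totals follow directly from the per-node figures. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted above. The names are illustrative and have nothing to do with SGI's or MarkLogic's tooling, and it assumes the 2.6TB figure counts 1TB as 1,024GB:

```python
# Per-node figures as quoted in the article; names are illustrative only.
NODES_PER_RACK = 21
CORES_PER_NODE = 16        # Xeon E5 cores across two sockets
MEMORY_GB_PER_NODE = 128


def rack_totals(nodes=NODES_PER_RACK):
    """Return (total cores, total main memory in TB) for a given node count."""
    cores = nodes * CORES_PER_NODE
    # Assumption: binary terabytes, so 1TB = 1,024GB.
    memory_tb = nodes * MEMORY_GB_PER_NODE / 1024
    return cores, memory_tb


cores, memory_tb = rack_totals()
print(f"{cores} cores, {memory_tb:.3f}TB of main memory")
```

For a full rack this works out to 336 cores and 2.625TB, which squares with the rounded 2.6TB quoted above.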

The performance-tuned configuration of the Dataraptor has 300TB of formatted user capacity and is based on 600GB SAS drives; it can handle 144,000 disk I/O operations per second and deliver 47GB/sec of bandwidth off the disks (on uncompressed data). It has four flash drives for storing a Linux operating system and speeding up data accesses on the node, and you can choose sizes from 80GB to 200GB.

Dataraptor server nodes

The capacity configuration using fatter 2TB SATA drives has a total of 504TB of usable formatted capacity per rack, and delivers 32,500 IOPS and bandwidth off the disks at 26GB/sec. The capacity configuration has two flash drives for the OS.
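Dividing the rack-level figures by the drive counts gives a rough feel for what each spindle contributes. This is a back-of-the-envelope Python sketch, not a vendor figure. It assumes all 21 nodes are fully populated with data drives (two dozen or a dozen apiece, as described above), that the quoted IOPS and bandwidth come entirely from those drives, and that 1GB is 1,000MB. The names are illustrative:

```python
NODES_PER_RACK = 21

# Rack-level figures quoted in the article:
# (data drives per node, IOPS per rack, GB/sec per rack).
CONFIGS = {
    "performance": (24, 144_000, 47),
    "capacity": (12, 32_500, 26),
}


def per_drive(drives_per_node, rack_iops, rack_gb_per_sec):
    """Return (drive count, IOPS per drive, MB/sec per drive) for one rack."""
    drives = NODES_PER_RACK * drives_per_node
    # Assumption: decimal units, so 1GB = 1,000MB.
    return drives, rack_iops / drives, rack_gb_per_sec * 1000 / drives


for name, figures in CONFIGS.items():
    drives, iops, mb_per_sec = per_drive(*figures)
    print(f"{name}: {drives} drives, {iops:.0f} IOPS and {mb_per_sec:.0f}MB/sec per drive")
```

On these assumptions, the 15K RPM drives work out to roughly 286 IOPS apiece against about 129 for the 7.2K RPM SATA drives.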

You link your servers into the Dataraptor appliance through 10GbE links, and the whole shebang is cross-coupled to run the MarkLogic database using two 48-port 10GbE switches.

SGI has inked an OEM agreement with MarkLogic to peddle its eponymous database, and will be providing the Dataraptor under a single SKU, with full support coming from SGI and with MarkLogic backstopping on issues relating to its software, much as SGI is doing in its relationship with Cloudera for Hadoop clusters.

The Dataraptor racks are assembled in SGI's factory in Chippewa Falls, Wisconsin, and you can order them in full-rack (21 nodes), half-rack (10 nodes), and quarter-rack (5 nodes) configurations.

SGI is tossing its Foundation extensions for Linux on each node, and its SGI Management Center cluster-management tools run on a separate 1U server in the rack to control and monitor the database nodes.

SGI is taking orders for the Dataraptor appliance now, and will start shipping on October 22. Pricing for the appliance was not divulged, since MarkLogic does not provide public pricing for its products. ®
