SGI munches MarkLogic database, hatches Dataraptor appliance

Having big data for lunch


Silicon Graphics knows a thing or two about handling huge amounts of data, but it doesn't have its own database or NoSQL data-store software. It needs to partner to be able to feast on the big-data carcass, and its latest partnership is with NoSQL database maker MarkLogic to create a big-data appliance called Dataraptor.

Not every big-data job out there can be handled by the batch-oriented Hadoop data muncher, which SGI has been peddling on top of its energy-efficient, dense-packed Rackable servers for the past year under a partnership with commercial Hadoop distie Cloudera.

Similarly, a big fat SQL database doesn't do all jobs either, although SGI has had some success pushing Microsoft's SQL Server relational database on top of its "big brain" UV 2 supercomputers, because this shared-memory system can push Windows Server and SQL Server to their scalability limits, with room to spare.

To make another big-data appliance focused on unstructured data and with more real-time capabilities than Hadoop offers, SGI had plenty of choices. There's MongoDB from 10gen, BerkeleyDB from Oracle, Cassandra from DataStax, Riak from Basho, CouchBase from the company of the same name, and the open source Redis. In this case, however, SGI has tapped MarkLogic for its eponymous NoSQL data store because of some of the unique attributes it brings to the big-data party.

MarkLogic was founded in 2001 by Paul Pedersen, a professor of computer science at Cornell University and the University of California Los Angeles, and Christopher Lindblad, chief architect at search engine Infoseek. The company has raised $45.5m in four rounds of funding from the now-defunct Lehman Brothers, plus Sequoia Capital and Tenaya Capital. The company has more than 400 customers using its database on various big-data jobs, and has over 250 employees.

A rack of Dataraptor


The MarkLogic Server, as it used to be called, is an XML-based database that has search functions built into its DNA, and also a shared-nothing architecture (like most NoSQL data stores) so it can be scaled far and fast. Interestingly, MarkLogic is architected to adhere to the atomicity, consistency, isolation, and durability properties – the so-called ACID tests – that relational databases and their online transaction processing systems require and that many NoSQL data stores do not entirely support.

The latest MarkLogic 6 data store has tight integration with Hadoop, and you can have Hadoop pre-chew data before it gets dumped into MarkLogic through a connector for further searches and queries, or you can slide MarkLogic underneath Hadoop, replacing the Hadoop Distributed File System.

The Dataraptor appliance that SGI and MarkLogic have hatched is, like SGI's Hadoop clusters, based on the full-depth Rackable server nodes, not the funky half-depth nodes that made Rackable Systems famous before it bought Silicon Graphics and took its name.

SGI is making two distinct kinds of rack configurations available, one aimed at performance, using two dozen 2.5-inch 15K RPM disks, and one aimed at capacity, using a dozen 3.5-inch SATA drives spinning at 7.2K RPM. The rack has 21 nodes in it, with each server having 16 Xeon E5 cores across its two sockets and 128GB of main memory, for a total of 336 Xeon E5 cores and 2.6TB of main memory to chew on data.
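The rack totals follow directly from the per-node figures, and a few lines of Python are enough to check them. This is only a back-of-the-envelope sketch: the function name and constants are made up for illustration, and it assumes the 2.6TB figure counts a terabyte as 1,024GB.

```python
# Per-node figures as quoted in the article; names here are illustrative only.
NODES_PER_RACK = 21
CORES_PER_NODE = 16        # 16 Xeon E5 cores across two sockets
MEMORY_GB_PER_NODE = 128


def rack_totals(nodes=NODES_PER_RACK):
    """Return (cores, memory in GB, memory in TB) for a rack of `nodes` servers."""
    cores = nodes * CORES_PER_NODE
    memory_gb = nodes * MEMORY_GB_PER_NODE
    memory_tb = memory_gb / 1024   # assumption: 1TB = 1,024GB
    return cores, memory_gb, memory_tb


cores, memory_gb, memory_tb = rack_totals()
print(f"{cores} cores, {memory_gb}GB ({memory_tb:.3f}TB) of main memory")
```

For a full rack that works out to 336 cores and 2,688GB, or 2.625TB, which squares with the roughly 2.6TB quoted above.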

The performance-tuned configuration of the Dataraptor has 300TB of formatted user capacity and is based on 600GB SAS drives; it can handle 144,000 disk I/O operations per second and deliver 47GB/sec of bandwidth off the disks (on uncompressed data). It has four flash drives for storing a Linux operating system and speeding up data accesses on the node, and you can choose sizes from 80GB to 200GB.

Dataraptor server nodes


The capacity configuration using fatter 1TB SATA drives has a total of 504TB of usable formatted capacity per rack, and delivers 32,500 IOPS and bandwidth off the disks at 26GB/sec. The capacity configuration has two flash drives for the OS.

You link your servers into the Dataraptor appliance through 10GbE links, and the whole shebang is cross-coupled to run the MarkLogic database using two 48-port 10GbE switches.

SGI has inked an OEM agreement with MarkLogic to peddle its eponymous database, and will be providing the Dataraptor under a single SKU with full support coming from SGI and MarkLogic backstopping on issues relating to its software, much as SGI is doing in its relationship with Cloudera for Hadoop clusters.

The Dataraptor racks are assembled in SGI's factory in Chippewa Falls, Wisconsin, and you can order them in full-rack (21 nodes), half-rack (10 nodes), and quarter-rack (5 nodes) configurations.

SGI is tossing its Foundation extensions for Linux on each node, and its SGI Management Center cluster-management tools run on a separate 1U server in the rack to control and monitor the database nodes.

SGI is taking orders for the Dataraptor appliance now, and will start shipping on October 22. Pricing for the appliance was not divulged, since MarkLogic does not provide public pricing for its products. ®
