SGI munches MarkLogic database, hatches Dataraptor appliance

Having big data for lunch

Silicon Graphics knows a thing or two about handling huge amounts of data, but it doesn't have its own database or NoSQL data-store software. It needs to partner to be able to feast on the big-data carcass, and its latest partnership is with NoSQL database maker MarkLogic to create a big-data appliance called Dataraptor.

Not every big-data job out there can be handled by the batch-oriented Hadoop data muncher, which SGI has been peddling on top of its energy-efficient, dense-packed Rackable servers for the past year under a partnership with commercial Hadoop distie Cloudera.

Similarly, a big fat SQL database doesn't do all jobs either, although SGI has had some success pushing Microsoft's SQL Server relational database on top of its "big brain" UV 2 supercomputers, because this shared-memory system can push Windows Server and SQL Server to their scalability limits, with room to spare.

To make another big-data appliance focused on unstructured data and with more real-time capabilities than Hadoop offers, SGI had plenty of choices. There's MongoDB from 10gen, BerkeleyDB from Oracle, Cassandra from DataStax, Riak from Basho, Couchbase from the company of the same name, and the open source Redis. In this case, however, SGI has tapped MarkLogic for its eponymous NoSQL data store because of some of the unique attributes it brings to the big-data party.

MarkLogic was founded in 2001 by Paul Pedersen, a professor of computer science at Cornell University and the University of California Los Angeles, and Christopher Lindblad, chief architect at search engine Infoseek. The company has raised $45.5m in four rounds of funding from the now-defunct Lehman Brothers, plus Sequoia Capital and Tenaya Capital. The company has more than 400 customers using its database on various big-data jobs, and has over 250 employees.

A rack of Dataraptor

The MarkLogic Server, as it used to be called, is an XML-based database that has search functions built into its DNA, and also a shared-nothing architecture (like most NoSQL data stores) so it can be scaled far and fast. Interestingly, MarkLogic is architected to adhere to the atomicity, consistency, isolation, and durability properties – the so-called ACID tests – that relational databases and their online transaction processing systems require and that many NoSQL data stores do not entirely support.

The latest MarkLogic 6 data store has tight integration with Hadoop, and you can have Hadoop pre-chew data before it gets dumped into MarkLogic through a connector for further searches and queries, or you can slide MarkLogic underneath Hadoop, replacing the Hadoop Distributed File System.

The Dataraptor appliance that SGI and MarkLogic have hatched is, like SGI's Hadoop clusters, based on the full-depth Rackable server nodes, not the funky half-depth nodes that made Rackable Systems famous before it bought Silicon Graphics and took its name.

SGI is making two distinct kinds of rack configurations available, one aimed at performance, using two dozen 2.5-inch 15K RPM disks, and one aimed at capacity, using a dozen 3.5-inch SATA drives spinning at 7.2K RPM. The rack has 21 nodes in it, with each server having 16 Xeon E5 cores across its two sockets and 128GB of main memory, for a total of 336 Xeon E5 cores and 2.6TB of main memory to chew on data.
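For those who like to check the sums, here is a minimal Python sketch of the rack totals, using only the per-node figures quoted above. The variable names are illustrative and not anything SGI publishes, and the 2.6TB figure is assumed to count 1,024GB to the terabyte.

```python
# Per-node figures as quoted in the article; names are illustrative only.
NODES_PER_RACK = 21
CORES_PER_NODE = 16        # 16 Xeon E5 cores across two sockets
MEMORY_GB_PER_NODE = 128

total_cores = NODES_PER_RACK * CORES_PER_NODE
total_memory_gb = NODES_PER_RACK * MEMORY_GB_PER_NODE

# Assumption: the article's "2.6TB" uses 1,024GB per TB.
total_memory_tb = total_memory_gb / 1024

print(f"{total_cores} cores, {total_memory_gb}GB ({total_memory_tb:.3f}TB) of memory")
```

That works out to 336 cores and 2,688GB of memory, or 2.625TB, which lines up with the 2.6TB quoted.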

The performance-tuned configuration of the Dataraptor has 300TB of formatted user capacity and is based on 600GB SAS drives; it can handle 144,000 disk I/O operations per second and deliver 47GB/sec of bandwidth off the disks (on uncompressed data). It has four flash drives for storing a Linux operating system and speeding up data accesses on the node, and you can choose sizes from 80GB to 200GB.

Dataraptor server nodes

The capacity configuration using fatter 1TB SATA drives has a total of 504TB of usable formatted capacity per rack, and delivers 32,500 IOPS and bandwidth off the disks at 26GB/sec. The capacity configuration has two flash drives for the OS.
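As a rough sanity check on the storage numbers, the Python sketch below multiplies out raw capacity from the drive counts and sizes quoted for the two configurations and spreads the rack-level IOPS evenly across the 21 nodes. The function and variable names are hypothetical, decimal terabytes are assumed, and raw capacity is not the same thing as formatted capacity, so treat the output as a ballpark rather than a spec.

```python
# Figures as quoted in the article; helper names are illustrative only.
NODES_PER_RACK = 21

def raw_capacity_tb(drives_per_node: int, drive_gb: int) -> float:
    """Raw (unformatted) rack capacity in decimal TB."""
    return NODES_PER_RACK * drives_per_node * drive_gb / 1000

def iops_per_node(rack_iops: int) -> float:
    """Rack-level IOPS divided evenly across the nodes."""
    return rack_iops / NODES_PER_RACK

# Performance config: 24 x 600GB SAS drives per node, 144,000 IOPS per rack.
perf_raw_tb = raw_capacity_tb(24, 600)
perf_node_iops = iops_per_node(144_000)

# Capacity config: 12 x 1TB SATA drives per node, 32,500 IOPS per rack.
cap_raw_tb = raw_capacity_tb(12, 1000)
cap_node_iops = iops_per_node(32_500)

print(f"performance: {perf_raw_tb:.1f}TB raw, {perf_node_iops:.0f} IOPS per node")
print(f"capacity:    {cap_raw_tb:.1f}TB raw, {cap_node_iops:.0f} IOPS per node")
```

On these inputs the performance rack comes to about 302TB raw, in line with the 300TB quoted. The capacity rack comes to 252TB raw, half of the 504TB quoted, so either the drive count or the drive size given for that configuration appears to be off by a factor of two.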

You link your servers into the Dataraptor appliance through 10GbE links, and the whole shebang is cross-coupled to run the MarkLogic database using two 48-port 10GbE switches.

SGI has inked an OEM agreement with MarkLogic to peddle its eponymous database, and will be providing the Dataraptor under a single SKU with full support coming from SGI and MarkLogic backstopping on issues relating to its software, much as SGI is doing in its relationship with Cloudera for Hadoop clusters.

The Dataraptor racks are assembled in SGI's factory in Chippewa Falls, Wisconsin, and you can order them in full-rack (21 nodes), half-rack (10 nodes), and quarter-rack (5 nodes) configurations.

SGI is tossing its Foundation extensions for Linux on each node, and its SGI Management Center cluster-management tools run on a separate 1U server in the rack to control and monitor the database nodes.

SGI is taking orders for the Dataraptor appliance now, and will start shipping on October 22. Pricing for the appliance was not divulged, since MarkLogic does not provide public pricing for its products. ®
