SGI munches MarkLogic database, hatches Dataraptor appliance

Having big data for lunch

Silicon Graphics knows a thing or two about handling huge amounts of data, but it doesn't have its own database or NoSQL data-store software. It needs to partner to be able to feast on the big-data carcass, and its latest partnership is with NoSQL database maker MarkLogic to create a big-data appliance called Dataraptor.

Not every big-data job out there can be handled by the batch-oriented Hadoop data muncher, which SGI has been peddling on top of its energy-efficient, dense-packed Rackable servers for the past year under a partnership with commercial Hadoop distie Cloudera.

Similarly, a big fat SQL database doesn't do all jobs either, although SGI has had some success pushing Microsoft's SQL Server relational database on top of its "big brain" UV 2 supercomputers, because this shared-memory system can push Windows Server and SQL Server to their scalability limits, with room to spare.

To make another big-data appliance focused on unstructured data and with more real-time capabilities than Hadoop offers, SGI had plenty of choices. There's MongoDB from 10gen, Berkeley DB from Oracle, Cassandra from DataStax, Riak from Basho, Couchbase from the company of the same name, and the open source Redis. In this case, however, SGI has tapped MarkLogic for its eponymous NoSQL data store because of some of the unique attributes it brings to the big-data party.

MarkLogic was founded in 2001 by Paul Pedersen, a professor of computer science at Cornell University and the University of California, Los Angeles, and Christopher Lindblad, chief architect at search engine Infoseek. The company has raised $45.5m in four rounds of funding from the now-defunct Lehman Brothers, plus Sequoia Capital and Tenaya Capital. It has more than 400 customers using its database on various big-data jobs, and over 250 employees.

A rack of Dataraptor

The MarkLogic Server, as it used to be called, is an XML-based database that has search functions built into its DNA, as well as a shared-nothing architecture (like most NoSQL data stores), so it can be scaled far and fast. Interestingly, MarkLogic is architected to provide atomicity, consistency, isolation, and durability, the so-called ACID properties that relational databases and their online transaction processing systems require and that many NoSQL data stores do not fully support.
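To make the shared-nothing idea concrete, here is a minimal Python sketch, assuming a simple hash-based placement policy: each document URI is hashed to exactly one node, each node holds and searches only its own documents, and a coordinator fans a query out to all nodes and merges the hits. The function names `node_for`, `partition`, and `search` are hypothetical, and this is not MarkLogic's actual placement or query machinery, which the article does not describe.

```python
import hashlib
from collections import defaultdict

# Hypothetical sketch of shared-nothing placement; not MarkLogic's real algorithm.

def node_for(doc_uri: str, num_nodes: int) -> int:
    """Pick the single owning node for a document using a stable hash of its URI."""
    digest = hashlib.sha256(doc_uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(doc_uris, num_nodes):
    """Group documents by owning node; no node sees another node's documents."""
    shards = defaultdict(list)
    for uri in doc_uris:
        shards[node_for(uri, num_nodes)].append(uri)
    return dict(shards)

def search(shards, term):
    """Scatter a query to every node, let each scan only its local documents, then gather."""
    hits = []
    for _node, local_docs in sorted(shards.items()):
        hits.extend(doc for doc in local_docs if term in doc)
    return sorted(hits)

if __name__ == "__main__":
    uris = [f"/articles/{i}.xml" for i in range(1000)]
    shards = partition(uris, 21)  # one shard per node in a full 21-node rack
    print(len(shards), "nodes hold data")
    print(search(shards, "/articles/99"))
```

Adding nodes spreads both the documents and the work of searching them across more machines without any shared disk or memory, which is why this style of architecture scales out easily. A real system would also need a rebalancing scheme, since plain modulo hashing reassigns most documents whenever the node count changes.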

The latest MarkLogic 6 data store has tight integration with Hadoop, and you can have Hadoop pre-chew data before it gets dumped into MarkLogic through a connector for further searches and queries, or you can slide MarkLogic underneath Hadoop, replacing the Hadoop Distributed File System.

The Dataraptor appliance that SGI and MarkLogic have hatched is, like SGI's Hadoop clusters, based on the full-depth Rackable server nodes, not the funky half-depth nodes that made Rackable Systems famous before it bought Silicon Graphics and took its name.

SGI is making two distinct rack configurations available: one aimed at performance, with each node using two dozen 2.5-inch 15K RPM disks, and one aimed at capacity, with each node using a dozen 3.5-inch SATA drives spinning at 7.2K RPM. The rack has 21 nodes, each with 16 Xeon E5 cores across its two sockets and 128GB of main memory, for a total of 336 Xeon E5 cores and 2.6TB of main memory to chew on data.
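The rack-level totals follow directly from the per-node figures. A quick Python check, assuming every node is identical (16 cores, 128GB) and that the half- and quarter-rack options use the same nodes; the constant and function names are illustrative:

```python
# Per-node figures quoted for the Dataraptor; partial racks are assumed to use identical nodes.
CORES_PER_NODE = 16        # across two Xeon E5 sockets
MEMORY_GB_PER_NODE = 128

RACK_SIZES = {"full": 21, "half": 10, "quarter": 5}  # nodes per configuration

def rack_totals(nodes: int) -> tuple[int, int]:
    """Return (total cores, total memory in GB) for a given node count."""
    return nodes * CORES_PER_NODE, nodes * MEMORY_GB_PER_NODE

if __name__ == "__main__":
    for name, nodes in RACK_SIZES.items():
        cores, memory_gb = rack_totals(nodes)
        # Dividing by 1024 converts GB to binary terabytes (TiB).
        print(f"{name:8} {nodes:2} nodes  {cores:3} cores  {memory_gb:5} GB ({memory_gb / 1024:.3f} TiB)")
```

For the full rack this gives 336 cores and 2,688GB of memory; the 2.6TB figure corresponds to the binary conversion (2.625TiB).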

The performance-tuned configuration of the Dataraptor has 300TB of formatted user capacity and is based on 600GB SAS drives; it can handle 144,000 disk I/O operations per second and deliver 47GB/sec of bandwidth off the disks (on uncompressed data). Each node also has four flash drives, in sizes from 80GB to 200GB, for storing the Linux operating system and speeding up data accesses.
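The headline numbers can be sanity-checked per drive. This sketch assumes the two dozen 600GB drives are per node across all 21 nodes (504 drives in total) and uses decimal units; the names are illustrative:

```python
NODES = 21
DRIVES_PER_NODE = 24     # "two dozen" 2.5-inch 15K RPM SAS drives, assumed per node
DRIVE_GB = 600
RACK_IOPS = 144_000
RACK_GB_PER_SEC = 47

def per_drive_figures():
    """Derive raw capacity and per-drive throughput from the rack-level figures."""
    drives = NODES * DRIVES_PER_NODE
    raw_tb = drives * DRIVE_GB / 1000            # decimal terabytes, before formatting
    iops_per_drive = RACK_IOPS / drives
    mb_per_sec_per_drive = RACK_GB_PER_SEC * 1000 / drives
    return drives, raw_tb, iops_per_drive, mb_per_sec_per_drive

if __name__ == "__main__":
    drives, raw_tb, iops, mbps = per_drive_figures()
    print(f"{drives} drives, {raw_tb:.1f}TB raw, {iops:.0f} IOPS and {mbps:.0f}MB/sec per drive")
```

That works out to 504 drives and 302.4TB raw, just above the quoted 300TB of formatted capacity, with each drive contributing roughly 286 IOPS and 93MB/sec.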

Dataraptor server nodes

The capacity configuration, using fatter 1TB SATA drives, has a total of 504TB of usable formatted capacity per rack and delivers 32,500 IOPS and 26GB/sec of bandwidth off the disks. The capacity configuration has two flash drives for the OS.
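Normalising each configuration by its usable capacity makes the trade-off explicit. A short Python comparison using only the rack-level figures quoted for the two configurations; the `RackConfig` class is illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RackConfig:
    """Rack-level figures for one Dataraptor configuration (usable TB, IOPS, GB/sec)."""
    name: str
    usable_tb: float
    iops: int
    gb_per_sec: float

    def iops_per_tb(self) -> float:
        return self.iops / self.usable_tb

    def mb_per_sec_per_tb(self) -> float:
        return self.gb_per_sec * 1000 / self.usable_tb

PERFORMANCE = RackConfig("performance", 300, 144_000, 47)
CAPACITY = RackConfig("capacity", 504, 32_500, 26)

if __name__ == "__main__":
    for cfg in (PERFORMANCE, CAPACITY):
        print(f"{cfg.name:12} {cfg.iops_per_tb():6.1f} IOPS/TB  {cfg.mb_per_sec_per_tb():6.1f} MB/sec per TB")
```

The performance rack delivers 480 IOPS per usable terabyte against roughly 64 for the capacity rack, about 7.4 times as many, and around three times the disk bandwidth per terabyte (about 157MB/sec versus 52MB/sec).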

You link your servers into the Dataraptor appliance through 10GbE links, and the whole shebang is cross-coupled to run the MarkLogic database using two 48-port 10GbE switches.

SGI has inked an OEM agreement to peddle MarkLogic's eponymous database, and will sell the Dataraptor under a single SKU, with SGI providing full support and MarkLogic backstopping it on issues relating to its software, much as SGI does in its relationship with Cloudera for Hadoop clusters.

The Dataraptor racks are assembled in SGI's factory in Chippewa Falls, Wisconsin, and you can order them in full-rack (21 nodes), half-rack (10 nodes), and quarter-rack (5 nodes) configurations.

SGI is tossing its Foundation extensions for Linux on each node, and its SGI Management Center cluster-management tools run on a separate 1U server in the rack to control and monitor the database nodes.

SGI is taking orders for the Dataraptor appliance now and will start shipping on October 22. Pricing for the appliance was not divulged, since MarkLogic does not provide public pricing for its products. ®
