SGI munches MarkLogic database, hatches Dataraptor appliance

Having big data for lunch

Silicon Graphics knows a thing or two about handling huge amounts of data, but it doesn't have its own database or NoSQL data-store software. It needs to partner to be able to feast on the big-data carcass, and its latest partnership is with NoSQL database maker MarkLogic to create a big-data appliance called Dataraptor.

Not every big-data job out there can be handled by the batch-oriented Hadoop data muncher, which SGI has been peddling on top of its energy-efficient, dense-packed Rackable servers for the past year under a partnership with commercial Hadoop distie Cloudera.

Similarly, a big fat SQL database doesn't do all jobs either, although SGI has had some success pushing Microsoft's SQL Server relational database on top of its "big brain" UV 2 supercomputers, because this shared-memory system can push Windows Server and SQL Server to their scalability limits, with room to spare.

To make another big-data appliance focused on unstructured data and with more real-time capabilities than Hadoop offers, SGI had plenty of choices. There's MongoDB from 10gen, BerkeleyDB from Oracle, Cassandra from DataStax, Riak from Basho, CouchBase from the company of the same name, and the open source Redis. In this case, however, SGI has tapped MarkLogic for its eponymous NoSQL data store because of some of the unique attributes it brings to the big-data party.

MarkLogic was founded in 2001 by Paul Pedersen, a professor of computer science at Cornell University and the University of California Los Angeles, and Christopher Lindblad, chief architect at search engine Infoseek. The company has raised $45.5m in four rounds of funding from the now-defunct Lehman Brothers, plus Sequoia Capital and Tenaya Capital. The company has more than 400 customers using its database on various big-data jobs, and has over 250 employees.

A rack of Dataraptor

The MarkLogic Server, as it used to be called, is an XML-based database that has search functions built into its DNA, and also a shared-nothing architecture (like most NoSQL data stores) so it can be scaled far and fast. Interestingly, MarkLogic is architected to adhere to the atomicity, consistency, isolation, and durability properties – the so-called ACID tests – that relational databases and their online transaction processing systems require and that many NoSQL data stores do not entirely support.

The latest MarkLogic 6 data store has tight integration with Hadoop, and you can have Hadoop pre-chew data before it gets dumped into MarkLogic through a connector for further searches and queries, or you can slide MarkLogic underneath Hadoop, replacing the Hadoop Distributed File System.

The Dataraptor appliance that SGI and MarkLogic have hatched is, like SGI's Hadoop clusters, based on the full-depth Rackable server nodes, not the funky half-depth nodes that made Rackable Systems famous before it bought Silicon Graphics and took its name.

SGI is making two distinct kinds of rack configurations available, one aimed at performance, using two dozen 2.5-inch 15K RPM disks, and one aimed at capacity, using a dozen 3.5-inch SATA drives spinning at 7.2K RPM. The rack has 21 nodes in it, with each server having 16 Xeon E5 cores across its two sockets and 128GB of main memory, for a total of 336 Xeon E5 cores and 2.6TB of main memory to chew on data.
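The rack totals quoted above follow directly from the per-node figures. As a minimal sketch of that arithmetic (the constant and function names are illustrative, not anything from SGI or MarkLogic, and the conversion assumes 1TB = 1,024GB):

```python
# Per-node figures as quoted in the article; the names here are illustrative.
NODES_PER_RACK = 21
CORES_PER_NODE = 16        # 16 Xeon E5 cores across two sockets
MEMORY_GB_PER_NODE = 128


def rack_totals(nodes=NODES_PER_RACK):
    """Return (total cores, total memory in GB) for a rack of `nodes` servers."""
    return nodes * CORES_PER_NODE, nodes * MEMORY_GB_PER_NODE


cores, memory_gb = rack_totals()
print(cores)             # 336
print(memory_gb)         # 2688
print(memory_gb / 1024)  # 2.625, which the article rounds to 2.6TB
```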

The performance-tuned configuration of the Dataraptor has 300TB of formatted user capacity and is based on 600GB SAS drives; it can handle 144,000 disk I/O operations per second and deliver 47GB/sec of bandwidth off the disks (on uncompressed data). It has four flash drives for storing a Linux operating system and speeding up data accesses on the node, and you can choose sizes from 80GB to 200GB.

Dataraptor server nodes

The capacity configuration using fatter 1TB SATA drives has a total of 504TB of usable formatted capacity per rack, and delivers 32,500 IOPS and bandwidth off the disks at 26GB/sec. The capacity configuration has two flash drives for the OS.

You link your servers into the Dataraptor appliance through 10GbE links, and the whole shebang is cross-coupled to run the MarkLogic database using two 48-port 10GbE switches.

SGI has inked an OEM agreement with MarkLogic to peddle its eponymous database, and will be providing the Dataraptor under a single SKU, with full support coming from SGI and with MarkLogic backstopping on issues relating to its software, much as SGI is doing in its relationship with Cloudera for Hadoop clusters.

The Dataraptor racks are assembled in SGI's factory in Chippewa Falls, Wisconsin, and you can order them in full rack (21 nodes), half rack (10 nodes), and quarter rack (5 nodes) configurations.

SGI is tossing its Foundation extensions for Linux on each node, and its SGI Management Center cluster-management tools run on a separate 1U server in the rack to control and monitor the database nodes.

SGI is taking orders for the Dataraptor appliance now, and will start shipping on October 22. Pricing for the appliance was not divulged, since MarkLogic does not provide public pricing for its products. ®
