DataStax slurps graph database biz Titan's brains
Hopes Facebook-tastic tech will be big data's Next Big Thing™
NoSQL database start-up DataStax is expected to announce today that it plans to develop and sell a graph database for enterprises.
DataStax will announce it has bought Aurelius, whose team is behind the Apache-licensed Titan distributed graph database for Cassandra. Financial terms were not revealed.
DataStax said the Aurelius team would help build something called the DataStax Enterprise Graph – a new graph database built on top of DataStax Enterprise.
This will integrate with Cassandra, the Apache open-source NoSQL distributed storage system used by DataStax, and with DSE Search and Analytics.
Cassandra, now an Apache Software Foundation project, began life as a NoSQL distributed storage system at Facebook, and its design draws on Google’s Bigtable and Amazon’s Dynamo.
NoSQL databases emerged from web giants such as Facebook and Twitter and were pitched at enterprises. Next to emerge were graph databases: systems that claim to understand the relationships between entities using unique identifiers.
Graph databases differ from relational databases in that they dispense with rows and columns: they store data as nodes and the relationships between them, and answer queries by following those relationships. Each node – such as a person – is assigned a unique identifier and a set of edges that link it to other nodes, such as love of a certain type of music. Links are expressed as key/value pairs.
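The node-and-edge model described above can be sketched in a few lines of Python. This is a toy in-memory illustration only, not Titan's or DataStax's API: the `TinyGraph` class, its method names and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: int                                     # unique identifier for the node
    properties: dict = field(default_factory=dict)   # key/value pairs describing it


@dataclass
class Edge:
    source: int                                      # node_id the edge starts from
    target: int                                      # node_id the edge points to
    label: str                                       # kind of relationship, e.g. "likes"
    properties: dict = field(default_factory=dict)   # key/value pairs on the link


class TinyGraph:
    """Hypothetical in-memory graph for illustration; not a real product's API."""

    def __init__(self):
        self._ids = count(1)   # hands out 1, 2, 3, ... as unique identifiers
        self.nodes = {}
        self.edges = []

    def add_node(self, **properties):
        node = Node(next(self._ids), properties)
        self.nodes[node.node_id] = node
        return node

    def add_edge(self, source, target, label, **properties):
        edge = Edge(source.node_id, target.node_id, label, properties)
        self.edges.append(edge)
        return edge

    def neighbours(self, node, label):
        """Follow outgoing edges carrying the given label."""
        return [self.nodes[e.target] for e in self.edges
                if e.source == node.node_id and e.label == label]


graph = TinyGraph()
alice = graph.add_node(kind="person", name="Alice")
bob = graph.add_node(kind="person", name="Bob")
jazz = graph.add_node(kind="genre", name="jazz")
graph.add_edge(alice, jazz, "likes", since=2012)
graph.add_edge(bob, jazz, "likes", since=2014)

liked_by_alice = [n.properties["name"] for n in graph.neighbours(alice, "likes")]
```

Here the relationship is itself a stored object, so asking what Alice likes means following her edges rather than matching rows across tables. A production system would index the edges instead of scanning a list.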
Like most things in this world, they are built to compute across massive server clusters – even different data centres – running on commodity hardware.
The idea is graph databases work faster and are more flexible than relational databases; they are used by social networks including LinkedIn to find connections between members, the CIA to identify links between members of terrorist networks, and financial services to detect fraud.
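Finding connections between members comes down to walking edges. Below is a minimal sketch of the idea using a breadth-first search over a plain Python dictionary. The member names and the `shortest_chain` helper are made up for illustration, and nothing here reflects how LinkedIn or any graph database actually implements it.

```python
from collections import deque

# Hypothetical membership data: who is directly connected to whom.
connections = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}


def shortest_chain(graph, start, goal):
    """Breadth-first search: shortest list of members linking start to goal, or None."""
    queue = deque([[start]])   # each queue entry is a path walked so far
    seen = {start}             # members already reached, to avoid loops
    while queue:
        path = queue.popleft()
        member = path[-1]
        if member == goal:
            return path
        for contact in graph.get(member, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(path + [contact])
    return None                # the two members are not connected


chain = shortest_chain(connections, "ann", "eve")
```

In this toy data the chain from `ann` to `eve` runs through `bob` and `dan`. The same query in a relational database would typically need one self-join per hop.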
Titan, the Herculean lifter in this deal, can query graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. It's claimed the system can handle an unlimited number of concurrent connections as new machines are added to a cluster.
For back-end data storage it works with Cassandra, HBase and BerkeleyDB, and it also works with ElasticSearch, Hadoop and Lucene.
It is also claimed that Titan supports Atomicity, Consistency, Isolation and Durability (ACID) – a big stumbling block for most of those throwing NoSQL at the enterprise. ACID is a feature of relational databases that many in NoSQL threw out but are slowly clawing their way back towards.
Titan features connection pooling with failover for deployment across servers, clusters and data centres with metrics for management. DataStax reckons its resulting Titan-based product will be a database that can be used in recommendation engines, as a component of identity and access management systems, in network impact analysis, logistics, network and device management, and – yes – financial fraud detection.
DataStax, which was founded in 2010, claims 400 customers, including Netflix, with Aurelius’s customers including Cisco and Los Alamos National Laboratory, historical home of the US nuclear bomb. ®