DataStax adds search to Cassandra NoSQL

Solr-powered distributed database

Structure Data 2012 At the Structure Data 2012 conference in New York this week, DataStax, which has commercialized the Apache Cassandra NoSQL database originally created by Facebook and open sourced as an Apache project, has bolted search onto the data store, along with a plug-in that lets it index and search application logs.

The new search functions come from welding the open source Solr search engine, which like Cassandra is written in Java, onto the data store. Solr is built on top of the Lucene search library and is an Apache project as well. It adds REST and JSON APIs to Lucene.
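
As a rough illustration of what an HTTP-plus-JSON search interface looks like in practice, here is a minimal Python sketch that builds a query URL for Solr's standard `select` handler and pulls the matching documents out of a JSON response. The host, port, core name (`logs`), field name (`level`), and the sample response are assumptions made up for this example; the article does not say how DSE 2.0 exposes its Solr endpoints.

```python
import json
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, query, rows=10):
    """Build a URL for Solr's standard /select handler, asking for JSON output."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url}/solr/{core}/select?{params}"


def extract_docs(response_text):
    """Return the list of matching documents from a Solr JSON response body."""
    payload = json.loads(response_text)
    return payload["response"]["docs"]


# Hypothetical host and core name, chosen only for illustration.
url = build_solr_query_url("http://localhost:8983", "logs", "level:ERROR", rows=5)

# Hand-written sample in the shape Solr uses for JSON responses (not real output).
sample_response = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "1", "level": "ERROR"}]}}'
)
docs = extract_docs(sample_response)
```

The sketch never touches the network: a real client would fetch `url` over HTTP and pass the response body to `extract_docs`.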

DataStax chose Solr as its search engine to run atop the Cassandra distributed database inside of its DataStax Enterprise 2.0 release in part because it is somewhere on the order of five to six times more popular than the Apache Hadoop batch-oriented MapReduce data muncher.

Sometimes, rather than crunching data to do filtering and correlations, you just want to search it, and Solr is fast and now can ride atop Cassandra. Companies that already like Lucene and want something that is faster and that now has no single point of failure, thanks to the replication and clustering inherent in Cassandra, can now go with DSE 2.0.

In addition to Solr, DSE 2.0 has rifled through the other Apache projects and snapped Log4j, a logging services layer, into Cassandra as well. The application logs, which help programmers debug their code, are now stored in Cassandra and are fully indexable and searchable.

The updated release also includes what DataStax calls elastic workload partitioning, which allows for x86 server nodes to be stood up running either real-time Cassandra apps or batch-oriented Hadoop MapReduce jobs. As the workloads go through their peaks and valleys, you dial up Cassandra nodes and dial back Hadoop nodes, usually during the day, and then reverse the process at night when you are in batch mode sifting through all of the data you collected during the day for correlations and associations.
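
The day/night rebalancing described above can be pictured with a toy Python sketch. This is not DataStax's mechanism, which the article does not detail; the function name, the 75 per cent daytime share, and the 08:00 to 20:00 window are all assumptions picked for illustration.

```python
def plan_nodes(hour, total_nodes, day_realtime_share=0.75, day_start=8, day_end=20):
    """Split a cluster between real-time and batch nodes by time of day.

    All parameters are hypothetical: during the daytime window most nodes
    serve real-time work, and at night the ratio is reversed for batch jobs.
    """
    is_day = day_start <= hour < day_end
    share = day_realtime_share if is_day else 1 - day_realtime_share
    realtime = round(total_nodes * share)
    return {"realtime": realtime, "batch": total_nodes - realtime}


afternoon = plan_nodes(hour=14, total_nodes=12)  # daytime: favor real-time nodes
overnight = plan_nodes(hour=2, total_nodes=12)   # night: favor batch nodes
```

With 12 nodes, the afternoon plan gives nine nodes to real-time work and three to batch, and the overnight plan flips that split.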

Finally, the DSE 2.0 release comes with Sqoop, another Apache project (this one in the incubation phase). Sqoop is used to import data from corporate relational databases into HBase, a quasi-relational data store that runs atop the Hadoop Distributed File System, or into Hive, a data-warehousing system that also runs atop HDFS and allows SQL-like ad hoc querying of information inside HDFS. Now Sqoop can also speak Cassandra and suck data from relational databases into this alternative, distributed NoSQL database.

"This is a bit of a developer's paradise," DataStax CEO Billy Bosworth tells El Reg. "You have real-time, batch analytics, search, and logs all in one place. Now developers can focus on building applications and not worry about the back end."

The original DSE 1.0, launched last fall, took the Cassandra datastore and added the Hadoop batch analytics on top of it – MapReduce, Hive, Pig, and so forth – eliminating the master-slave node bottleneck and single point of failure of the kosher Hadoop-HDFS combination without breaking API compatibility with Hadoop. For all Hadoop knows, it is running on HDFS when it is running atop Cassandra.

The DSE distribution also includes a number of closed source elements, such as features that guarantee workload isolation between Hadoop algorithms and Cassandra code, the OpsCenter visual management tool, and other analytics features – but you can only get these by buying a license and paying for support for the DSE variant. However, if you want to just use the Apache Cassandra database, DataStax is happy to sell you a support contract for that.

DSE 2.0 is available now, and while DataStax does not provide list pricing, Bosworth said that it was on the order of a couple of thousand dollars per server node. "It tends to be an order of magnitude less than a relational database," says Bosworth. ®
