DataStax adds search to Cassandra NoSQL

Solr powered distributed database

Structure Data 2012 At the Structure Data 2012 conference in New York this week, DataStax, which has commercialized the Apache Cassandra NoSQL database originally created by Facebook and open sourced as an Apache project, bolted search onto the data store and added a plug-in that lets it also index and search application logs.

The new search functions come from welding in the open source Solr search engine, which like Cassandra is written in Java. Solr is built on the Lucene search library and is an Apache project as well. It adds REST and JSON APIs on top of Lucene.
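To make the REST and JSON point concrete, here is a minimal Python sketch of how a client might build a Solr query URL. The `q`, `rows`, and `wt=json` parameters are standard Solr query parameters; the host, port, the `logs` index name, and the exact path layout are assumptions for illustration, not details of how DSE 2.0 exposes Solr.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build a Solr /select URL that asks for a JSON response.

    base_url and core are hypothetical placeholders; the path layout
    may differ between Solr versions and distributions.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical example: search an index named "logs" for error-level entries.
url = solr_select_url("http://localhost:8983", "logs", "level:ERROR")
print(url)
```

Fetching that URL with any HTTP client would return the matching documents as JSON, which is what lets applications query the index without linking against Lucene directly.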

DataStax chose Solr as its search engine to run atop the Cassandra distributed database inside of its DataStax Enterprise 2.0 release in part because it is somewhere on the order of five to six times more popular than the Apache Hadoop batch-oriented MapReduce data muncher.

Sometimes, rather than crunching data to do filtering and correlations, you just want to search it, and Solr is fast and now can ride atop Cassandra. Companies that already like Lucene and want something that is faster and that now has no single point of failure, thanks to the replication and clustering inherent in Cassandra, can now go with DSE 2.0.

In addition to adding Solr, DataStax has rifled around the other Apache projects and snapped Log4j, a logging services layer, into DSE 2.0 as well. Application logs, which help programmers debug their code, are now stored in Cassandra and are fully indexable and searchable.

The updated release also includes what DataStax calls elastic workload partitioning, which allows for x86 server nodes to be stood up running either real-time Cassandra apps or batch-oriented Hadoop MapReduce jobs. As the workloads go through their peaks and valleys, you dial up Cassandra nodes and dial back Hadoop nodes, usually during the day, and then reverse the process at night when you are in batch mode sifting through all of the data you collected during the day for correlations and associations.
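The day and night dial described above can be pictured with a small Python sketch. It is only an illustration of the idea: the function name, the 8am to 8pm "day" window, and the 75 per cent share are all invented here, and DataStax has not said how its elastic workload partitioning actually decides the split.

```python
def split_nodes(total_nodes, hour, day_start=8, day_end=20, day_realtime_share=0.75):
    """Divide a cluster between real-time (Cassandra) and batch (Hadoop) work.

    Hypothetical policy: during the day most nodes serve real-time traffic,
    and at night the proportions are reversed for batch jobs.
    """
    is_day = day_start <= hour < day_end
    share = day_realtime_share if is_day else 1 - day_realtime_share
    cassandra = round(total_nodes * share)
    return {"cassandra": cassandra, "hadoop": total_nodes - cassandra}


# A hypothetical 12-node cluster at 2pm and at 2am.
print(split_nodes(12, 14))
print(split_nodes(12, 2))
```

With these made-up settings, the 12-node cluster runs nine Cassandra nodes and three Hadoop nodes in the afternoon and flips to three and nine overnight.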

Finally, the DSE 2.0 release comes with Sqoop, another Apache project (this one is in the incubation phase), which is used to import data from corporate relational databases into Hadoop-based stores. Those include HBase, a quasi-relational data store that runs atop the Hadoop Distributed File System, and the Hive data-warehousing system, which also runs atop HDFS and allows SQL-like ad-hoc querying of the information stored there. Now, Sqoop can also speak Cassandra and suck data from relational databases into this alternative, distributed NoSQL database.

"This is a bit of a developer's paradise," DataStax CEO Billy Bosworth tells El Reg. "You have real-time, batch analytics, search, and logs all in one place. Now developers can focus on building applications and not worry about the back end."

The original DSE 1.0, launched last fall, took the Cassandra datastore and added the Hadoop batch analytics on top of it – MapReduce, Hive, Pig, and so forth – eliminating the master-slave node bottleneck and single point of failure of the kosher Hadoop-HDFS combination without breaking API compatibility with Hadoop. For all Hadoop knows, it is running on HDFS when it is running atop Cassandra.

The DSE distribution also includes a number of closed source elements, such as features that guarantee workload isolation between Hadoop algorithms and Cassandra code, the OpsCenter visual management tool, and other analytics features – but you can only get these by buying a license and paying for support for the DSE variant. However, if you want to just use the Apache Cassandra database, DataStax is happy to sell you a support contract for that.

DSE 2.0 is available now, and while DataStax does not provide list pricing, Bosworth said that it was on the order of a couple of thousand dollars per server node. "It tends to be an order of magnitude less than a relational database," says Bosworth. ®
