DataStax adds search to Cassandra NoSQL

Solr powered distributed database

At the Structure Data 2012 conference in New York this week, DataStax, which has commercialized the Apache Cassandra NoSQL database originally created by Facebook and open sourced as an Apache project, has bolted search onto the data store, along with a plug-in that lets it index and search application logs.

The new search functions come from welding on the open source Solr search engine, which like Cassandra is written in Java. Solr is a search server built on top of the Lucene search library and is an Apache project as well. It adds REST and JSON APIs to the Lucene search engine.
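To give a rough sense of what a REST-and-JSON search interface looks like from the client side, here is a minimal Python sketch that builds a query URL for Solr's standard `/select` handler and pulls the hits out of a JSON response. The host, port, collection name `logs`, and field names are hypothetical, and the sample response is hand-written for illustration rather than captured from a DSE cluster.

```python
import json
from urllib.parse import urlencode


def build_solr_query_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler that asks for JSON output."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url}/solr/{collection}/select?{params}"


def extract_hits(response_text):
    """Return (total_matches, docs) from a Solr JSON response body."""
    body = json.loads(response_text)
    response = body["response"]
    return response["numFound"], response["docs"]


# Hypothetical endpoint and collection name, for illustration only.
url = build_solr_query_url("http://localhost:8983", "logs", "level:ERROR", rows=5)

# Hand-written sample in the general shape of a Solr JSON response.
SAMPLE_RESPONSE = json.dumps({
    "responseHeader": {"status": 0, "QTime": 3},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "1", "level": "ERROR", "message": "connection refused"},
            {"id": "2", "level": "ERROR", "message": "timeout"},
        ],
    },
})

total, docs = extract_hits(SAMPLE_RESPONSE)
```

In a real deployment the URL would be fetched over HTTP and the response body passed to `extract_hits`; the sketch skips the network call so it can run anywhere.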

DataStax chose Solr as the search engine to run atop the Cassandra distributed database inside its DataStax Enterprise 2.0 release in part because Solr is somewhere on the order of five to six times more popular than the Apache Hadoop batch-oriented MapReduce data muncher.

Sometimes, rather than crunching data to do filtering and correlations, you just want to search it, and Solr is fast and now can ride atop Cassandra. Companies that already like Lucene and want something that is faster and that now has no single point of failure, thanks to the replication and clustering inherent in Cassandra, can now go with DSE 2.0.

In addition to Solr, DSE 2.0 has rifled around the other Apache projects and snapped Log4j, a logging services layer, into Cassandra as well. The application logs, which help programmers debug their code, are now stored in Cassandra and are fully indexable and searchable.

The updated release also includes what DataStax calls elastic workload partitioning, which allows for x86 server nodes to be stood up running either real-time Cassandra apps or batch-oriented Hadoop MapReduce jobs. As the workloads go through their peaks and valleys, you dial up Cassandra nodes and dial back Hadoop nodes, usually during the day, and then reverse the process at night when you are in batch mode sifting through all of the data you collected during the day for correlations and associations.
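The day-and-night rebalancing described above can be pictured with a toy Python sketch. This is not how DataStax implements elastic workload partitioning; the function name, the daytime window, and the 75 per cent share are all invented for illustration.

```python
def split_nodes(total_nodes, hour, day_start=8, day_end=20, day_realtime_share=0.75):
    """Return (cassandra_nodes, hadoop_nodes) for a given hour of the day.

    All thresholds are hypothetical: during the daytime window most nodes
    serve real-time Cassandra traffic, and at night the ratio is reversed
    so that most nodes run batch Hadoop jobs.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if not 0.0 <= day_realtime_share <= 1.0:
        raise ValueError("day_realtime_share must be between 0 and 1")

    is_daytime = day_start <= hour < day_end
    share = day_realtime_share if is_daytime else 1.0 - day_realtime_share
    cassandra_nodes = round(total_nodes * share)
    return cassandra_nodes, total_nodes - cassandra_nodes


# A 12-node cluster at 2pm and at 2am under the assumed 75/25 split.
afternoon = split_nodes(12, 14)
small_hours = split_nodes(12, 2)
```

The total node count stays fixed; only the role assigned to each node shifts as the workload moves between real-time and batch.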

Finally, the DSE 2.0 release comes with Sqoop, another Apache project (this one is in incubation phase). Sqoop is used to import data from corporate relational databases into HBase, a quasi-relational data store that runs atop the Hadoop Distributed File System, or into the Hive data-warehousing system, which also runs atop HDFS and allows SQL-like ad-hoc querying of information inside HDFS. Now Sqoop can also speak Cassandra and suck data from relational databases into this alternative, distributed NoSQL database.

"This is a bit of a developer's paradise," DataStax CEO Billy Bosworth tells El Reg. "You have real-time, batch analytics, search, and logs all in one place. Now developers can focus on building applications and not worry about the back end."

The original DSE 1.0, launched last fall, took the Cassandra datastore and added the Hadoop batch analytics on top of it – MapReduce, Hive, Pig, and so forth – eliminating the master-slave node bottleneck and single point of failure of the kosher Hadoop-HDFS combination without breaking API compatibility with Hadoop. For all Hadoop knows, it is running on HDFS when it is running atop Cassandra.

The DSE distribution also includes a number of closed source elements, such as features that guarantee workload isolation between Hadoop algorithms and Cassandra code, the OpsCenter visual management tool, and other analytics features – but you can only get these by buying a license and paying for support for the DSE variant. However, if you want to just use the Apache Cassandra database, DataStax is happy to sell you a support contract for that.

DSE 2.0 is available now, and while DataStax does not provide list pricing, Bosworth said that it was on the order of a couple of thousand dollars per server node. "It tends to be an order of magnitude less than a relational database," says Bosworth. ®
