DataStax adds search to Cassandra NoSQL

Solr-powered distributed database

At the Structure Data 2012 conference in New York this week, DataStax, which has commercialized the Apache Cassandra NoSQL database originally created by Facebook and open sourced as an Apache project, has bolted search onto the data store, along with a plug-in that lets it index and search application logs.

The new search functions come from welding in the open source Solr search engine, which, like Cassandra, is written in Java. Solr is built on the Lucene search library and is also an Apache project. It wraps Lucene in a search server with REST-style HTTP and JSON APIs.
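Solr's query interface is plain HTTP: a client sends a GET request to a core's `/select` handler and can ask for JSON output with the `wt=json` parameter. The Python sketch below builds such a request URL and unpacks the `numFound` and `docs` fields of a JSON reply. It assumes a Solr endpoint at `localhost:8983` (Solr's usual default port) and a hypothetical core called `demo.logs`; neither is taken from DSE 2.0's documentation, and a canned reply stands in for a live server so the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Assumption: a Solr endpoint on localhost at Solr's default port 8983.
# The core name used below ("demo.logs") is hypothetical, for illustration only.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL that asks for a JSON response (wt=json)."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def extract_docs(body: str) -> tuple[int, list[dict]]:
    """Pull the hit count and matching documents out of a Solr JSON reply."""
    payload = json.loads(body)
    response = payload["response"]
    return response["numFound"], response["docs"]


if __name__ == "__main__":
    # Search a hypothetical "level" field for ERROR entries, five hits at most.
    url = build_select_url("demo.logs", "level:ERROR", rows=5)
    print(url)

    # A canned reply in the shape Solr's JSON response writer produces.
    sample = (
        '{"responseHeader": {"status": 0},'
        ' "response": {"numFound": 1, "start": 0,'
        ' "docs": [{"id": "42", "level": "ERROR"}]}}'
    )
    count, docs = extract_docs(sample)
    print(count, docs)
```

In a real deployment the URL would be fetched with any HTTP client and the response body passed to `extract_docs`; keeping URL construction and response parsing separate makes both easy to test without a running server.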

DataStax chose Solr as the search engine to run atop the Cassandra distributed database in its DataStax Enterprise 2.0 release in part because Solr is roughly five to six times more popular than Apache Hadoop, the batch-oriented MapReduce data muncher.

Sometimes, rather than crunching data to do filtering and correlations, you just want to search it, and Solr is fast and can now ride atop Cassandra. Companies that already like Lucene and want something faster, with no single point of failure thanks to the replication and clustering built into Cassandra, can now go with DSE 2.0.

In addition to Solr, DataStax has rifled through the other Apache projects and snapped Log4j, a logging services layer, into Cassandra for DSE 2.0 as well. The application logs, which help programmers debug their code, are now stored in Cassandra and are fully indexable and searchable.

The updated release also includes what DataStax calls elastic workload partitioning, which lets x86 server nodes be stood up to run either real-time Cassandra apps or batch-oriented Hadoop MapReduce jobs. As the workloads go through their peaks and valleys, you dial up Cassandra nodes and dial back Hadoop nodes, usually during the day, and then reverse the process at night, when you are in batch mode sifting through all of the data collected during the day for correlations and associations.

Finally, the DSE 2.0 release comes with Sqoop, another Apache project (this one is in the incubation phase), which is used to import data from corporate relational databases into HBase, a column-oriented data store that runs atop the Hadoop Distributed File System, or into Hive, the data-warehousing system that also runs atop HDFS and allows SQL-like ad hoc querying of information stored there. Now Sqoop can also speak Cassandra and suck data out of relational databases into this alternative, distributed NoSQL database.

"This is a bit of a developer's paradise," DataStax CEO Billy Bosworth tells El Reg. "You have real-time, batch analytics, search, and logs all in one place. Now developers can focus on building applications and not worry about the back end."

The original DSE 1.0, launched last fall, took the Cassandra datastore and layered the Hadoop batch analytics stack (MapReduce, Hive, Pig, and so forth) on top of it, eliminating the master-slave node bottleneck and single point of failure of the kosher Hadoop-HDFS combination without breaking API compatibility with Hadoop. For all Hadoop knows, it is running on HDFS when it is actually running atop Cassandra.

The DSE distribution also includes a number of closed source elements, such as features that guarantee workload isolation between Hadoop algorithms and Cassandra code, the OpsCenter visual management tool, and other analytics features. You can only get these by buying a license and paying for support for DSE. If you just want to use the Apache Cassandra database, however, DataStax is happy to sell you a support contract for that.

DSE 2.0 is available now, and while DataStax does not provide list pricing, Bosworth said that it was on the order of a couple of thousand dollars per server node. "It tends to be an order of magnitude less than a relational database," says Bosworth. ®
