
Cassandra can FINALLY predict the future

Via Spark-based real-time analytics, that is


The company behind the Cassandra database has partnered with a big-brained computer science company to add real-time analytics to its technology.

Datastax announced on Thursday that it had partnered with Databricks, the steward of Apache Spark, to bring Spark's analytics to Cassandra, giving users of the database a way to rapidly generate insights from ingested data. This could let them spot the warning signs of bank fraud, among other hard-to-detect events.
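The article doesn't say how such detection would work. As a purely illustrative sketch, the Python below groups a stream of hypothetical card transactions into fixed-width time buckets (loosely in the spirit of Spark Streaming's micro-batches) and flags any card that appears more than a chosen number of times in a single bucket. The record layout, `micro_batches`, `flag_bursts`, and the threshold are all assumptions, not anything Datastax or Databricks described.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, card_id, amount) events into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, card_id, amount in events:
        buckets[int(ts // batch_seconds)].append((ts, card_id, amount))
    # Return batches in time order; intervals with no events are skipped.
    return [buckets[k] for k in sorted(buckets)]


def flag_bursts(batch, max_per_batch):
    """Return card IDs used more than max_per_batch times within one batch.

    This threshold rule is a hypothetical example, not a real fraud model.
    """
    counts = Counter(card_id for _, card_id, _ in batch)
    return sorted(card for card, n in counts.items() if n > max_per_batch)


# Hypothetical transactions: (seconds since start, card ID, amount)
events = [
    (0.5, "card-A", 20.0),
    (1.0, "card-A", 25.0),
    (1.5, "card-A", 30.0),
    (2.0, "card-B", 10.0),
    (6.0, "card-A", 5.0),
]
for batch in micro_batches(events, batch_seconds=5):
    print(flag_bursts(batch, max_per_batch=2))
# ['card-A']
# []
```

A production system would use far richer signals; the point here is only that a rule runs over each small batch shortly after the data arrives.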

Spark is a data-processing system developed at UC Berkeley's computer science hothouse AMPLab in 2009 and released as open source in 2010.

The technology "provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to write, and an optimized engine that supports general computation graphs. It also supports a rich set of higher-level tools including Shark (Hive on Spark), MLlib for machine learning, GraphX for graph processing, and Spark Streaming," according to an FAQ.
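To make the "parallel jobs easy to write" idea concrete, here is a toy, single-process Python imitation of the RDD programming model: data is split into partitions, transformations such as `map` and `filter` are only recorded, and work happens when an action such as `collect` or `reduce` is called. `ToyRDD` and `parallelize` are hypothetical names that merely echo Spark's vocabulary; this is a sketch of the model under those assumptions, not PySpark and not how Spark's engine is implemented.

```python
from functools import reduce


class ToyRDD:
    """Toy, single-process imitation of the RDD model (NOT PySpark).

    Data is split into partitions; map/filter only record work lazily,
    and nothing runs until an action such as collect() or reduce().
    """

    def __init__(self, partitions, ops=()):
        self.partitions = partitions  # list of lists
        self.ops = tuple(ops)         # pending ("map" | "filter", fn) steps

    def map(self, fn):
        return ToyRDD(self.partitions, self.ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self.partitions, self.ops + (("filter", pred),))

    def _run_partition(self, part):
        # Apply every pending step, in order, to each element of one partition.
        out = []
        for x in part:
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

    def collect(self):
        # Action: partitions are processed one after another here;
        # a real cluster would process them in parallel.
        return [x for part in self.partitions for x in self._run_partition(part)]

    def reduce(self, fn):
        # Action: reduce inside each partition first, then combine the partials.
        results = (self._run_partition(p) for p in self.partitions)
        partials = [reduce(fn, r) for r in results if r]
        return reduce(fn, partials)


def parallelize(data, num_partitions):
    # Split data into contiguous chunks, one per partition.
    size = max(1, -(-len(data) // num_partitions))
    return ToyRDD([data[i:i + size] for i in range(0, len(data), size)])


rdd = parallelize(list(range(1, 11)), 3)
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())                   # [4, 16, 36, 64, 100]
print(evens_squared.reduce(lambda a, b: a + b))  # 220
```

Reducing within each partition before combining is what lets this style of job scale: only small partial results, not whole partitions, need to move between machines.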

When it launched, Spark was designed to run on top of the Hadoop Distributed File System (HDFS), so it was chiefly used as an extremely fast batch-processing system.

Now, Databricks has worked with Datastax to layer Spark on top of Cassandra, putting a capable data-processing engine directly on top of a database management system and giving companies in ecommerce, fraud detection, and other fields a handy tool.
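The piece doesn't spell out the deployment details, but one commonly cited benefit of running compute next to storage is data locality: work on a slice of data can be scheduled on the machine that already holds it. The Python below is a minimal sketch of that scheduling idea, assuming hypothetical split IDs and hostnames. It is a greedy illustration that ignores load balancing, not the scheduler that Spark or the Datastax connector actually uses.

```python
def assign_splits(splits, executor_hosts):
    """Pick an executor host for each split, preferring one that holds a replica.

    splits: dict of split id -> list of hostnames holding that data (hypothetical)
    executor_hosts: hostnames where a Spark executor is running (hypothetical)
    Returns dict of split id -> (chosen host, whether the read is local).
    """
    if not executor_hosts:
        raise ValueError("need at least one executor host")
    available = set(executor_hosts)
    fallback = sorted(available)
    assignment = {}
    for i, split_id in enumerate(sorted(splits)):
        local = [h for h in splits[split_id] if h in available]
        if local:
            # An executor runs on a node that stores this data: read locally.
            assignment[split_id] = (local[0], True)
        else:
            # No executor on any replica: spread remote reads round-robin.
            assignment[split_id] = (fallback[i % len(fallback)], False)
    return assignment


splits = {"s0": ["cass1", "cass2"], "s1": ["cass3"], "s2": ["cass2", "cass1"]}
print(assign_splits(splits, ["cass1", "cass2"]))
# {'s0': ('cass1', True), 's1': ('cass2', False), 's2': ('cass2', True)}
```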

"Our customer base and community concentrates on real-time apps that need a real-time database," explained Datastax's executive vice president of engineering, Martin van Ryswyk, in a chat with El Reg. "There are a lot of use cases where you're processing data very quickly that is immediately fed back to users in a web application. A database like Cassandra under the covers is the most appropriate choice."

To get Spark to work with Cassandra, the two companies worked to make sure that Spark's core data abstraction, the Resilient Distributed Dataset (RDD), could read data stored in Cassandra.
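The article doesn't detail the mechanism, but one plausible way to expose a Cassandra table as an RDD is to cut the cluster's token ring into ranges, treat each range as one RDD partition, and read it with a CQL query bounded on `token()` of the partition key. The sketch below assumes Cassandra's Murmur3Partitioner, whose tokens span -2^63 to 2^63 - 1; the keyspace, table, and column names are hypothetical, and this is an illustration rather than Datastax's code.

```python
MIN_TOKEN = -2**63      # smallest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1   # largest Murmur3Partitioner token


def token_splits(num_splits, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Cut the ring (lo, hi] into num_splits contiguous (start, end] ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be positive")
    width = (hi - lo) // num_splits
    bounds = [lo + i * width for i in range(num_splits)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def split_query(keyspace, table, partition_key):
    """CQL for one split; the two ? markers are bound to a (start, end) pair."""
    return (f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) > ? AND token({partition_key}) <= ?")


# Hypothetical table shop.orders, partitioned by order_id, read as 4 splits.
query = split_query("shop", "orders", "order_id")
for start, end in token_splits(4):
    print(query, (start, end))
```

Because the ranges are contiguous and non-overlapping, each row is read by exactly one split, and the splits can be fetched independently and in parallel.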

"You have to worry about connections like connecting to the database and being efficient and doing threading. We're using [Cassandra Query Language] – making sure it's up to date, [and there's been] a lot of work in making sure that things like datatypes map between the two, and how to do that," van Ryswyk said.
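The datatype point means every CQL column type has to arrive on the Spark side as a sensible native value. The connector's real mapping targets JVM types and isn't shown in the article. As a hedged illustration of the same idea in Python, the sketch below converts hypothetical, already-decoded raw values into Python types, recursing into `list<...>` and `set<...>` columns. The `CONVERTERS` table, `convert`, and `convert_row` are assumptions; the one detail taken from Cassandra itself is that CQL `timestamp` values count milliseconds since the Unix epoch.

```python
import uuid
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical CQL-type -> Python converters; raw values are assumed to be
# already-decoded primitives (decimals and UUIDs arrive as strings).
CONVERTERS = {
    "text": str,
    "int": int,
    "bigint": int,
    "boolean": bool,
    "double": float,
    "decimal": Decimal,
    "uuid": uuid.UUID,
    # CQL timestamps are milliseconds since the Unix epoch.
    "timestamp": lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc),
}


def convert(cql_type, raw):
    """Convert one raw value, recursing into list<...> and set<...> types."""
    if raw is None:
        return None
    for prefix, container in (("list<", list), ("set<", frozenset)):
        if cql_type.startswith(prefix) and cql_type.endswith(">"):
            inner = cql_type[len(prefix):-1]
            return container(convert(inner, v) for v in raw)
    conv = CONVERTERS.get(cql_type)
    if conv is None:
        raise TypeError(f"no mapping for CQL type {cql_type!r}")
    return conv(raw)


def convert_row(schema, raw_row):
    """Convert every column of a row according to a {column: cql_type} schema."""
    return {col: convert(t, raw_row[col]) for col, t in schema.items()}


schema = {"id": "uuid", "amount": "decimal", "tags": "set<text>", "created": "timestamp"}
raw = {"id": "12345678-1234-5678-1234-567812345678", "amount": "19.99",
       "tags": ["vip", "eu", "vip"], "created": 0}
print(convert_row(schema, raw))
```

Failing loudly on an unknown type, rather than passing the raw value through, is a deliberate choice in this sketch: silent mismatches between the two type systems are exactly the kind of bug the mapping work is meant to prevent.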

Datastax plans to publish the technology as open source soon, and may develop a paid option as well.

"We have a candidate we are working on that is well into development. It will be released into open source and you do not have to pay anybody for it," he said.

The tie-up between Spark and Cassandra mirrors a similar partnership that was announced last week between MongoDB and Cloudera.

That deal saw the companies agree to work together to more tightly integrate Hadoop and associated analytics technologies with MongoDB, and to build on existing work like MongoDB's "Hadoop Connector".

By partnering with Databricks, Datastax hopes to skip HDFS altogether and run analytics directly on top of its DBMS. It's an interesting move, and one that highlights how the worlds of data analysis and data storage are, as they have many times before, dancing closer together. ®
