Talena wants to be distributed database data management rock
Second app-aware deduper for distributed databases
Analysis: Talena was founded in 2013 to develop data management software for Big Data and non-relational database applications. Its data management software provides backup, recovery, test data management and archiving capabilities for Cassandra, Couchbase, Hadoop, and Vertica. The company claims its software integrates machine learning with unique storage optimisation technology to dramatically reduce the costs associated with backup, recovery, and other data management functions across NoSQL, Hadoop, and modern data warehouse products.
The software features ActiveRx predictive analytics, which includes a machine-learning component; a FastFind metadata catalogue and single-pane-of-glass management; data-aware deduplication and compression; a scale-out architecture; and user-defined policies for backup, recovery, mirroring, and archiving workflows. Masking algorithms can prevent sensitive data exposure.
The global, data-aware, variable-length deduplication engine identifies the data to be deduplicated, which may be stored in many formats, such as compressed files (e.g. GZ, Snappy, LZO, and so on) or application-specific structures (e.g. RCFile, Parquet and ORC for Hive and Impala, or SSTable for Cassandra). Once deduplicated and compressed, the output is saved to the Talena file system and erasure coded to increase its durability.
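To make the idea of variable-length deduplication concrete, here is a minimal Python sketch. It is not Talena's engine, whose internals are not public; it assumes a generic Gear-style rolling hash to choose chunk boundaries and SHA-256 digests to spot repeated chunks. The names `chunks`, `DedupStore`, `GEAR` and all parameter values are hypothetical, and the data-aware step (looking inside GZ, Parquet, SSTable and similar formats before chunking), compression and erasure coding are left out.

```python
import hashlib
import random

# Gear-style table: one pseudo-random 32-bit value per possible byte value.
# Seeded so that chunk boundaries are repeatable between runs (an assumption
# of this sketch, not something the article describes).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunks(data, mask=0x3FF, min_size=256, max_size=8192):
    """Split data into variable-length chunks.

    A cut is made where the low bits of a rolling hash are all zero, so
    boundaries follow the content rather than fixed offsets.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]  # trailing partial chunk


class DedupStore:
    """Keeps each unique chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> chunk bytes

    def put(self, data):
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(digest, chunk)  # no-op if chunk already stored
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.blobs[d] for d in recipe)
```

Storing the same bytes a second time adds only a new recipe and no new chunks, which is the effect a deduplicating backup store relies on when several replicas hold the same data.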
Talena's* software architecture involves a loosely coupled core, deep insight into each supported (Big Data application) product, and platform-specific data movers that utilise each of the vendor-supplied APIs. This design makes it straightforward, Talena says, to integrate new Big Data products.
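One way to picture a loosely coupled core with platform-specific data movers is a small plug-in registry. The Python sketch below is purely illustrative: `DataMover`, `MOVERS`, `register`, `CassandraMover` and `backup` are hypothetical names, and the stub does not call any real vendor API.

```python
from abc import ABC, abstractmethod

MOVERS = {}  # platform name -> mover class (hypothetical registry)


def register(cls):
    """Class decorator that makes a mover available to the core by platform name."""
    MOVERS[cls.platform] = cls
    return cls


class DataMover(ABC):
    """Interface the core relies on; one subclass per supported platform."""

    platform = ""

    @abstractmethod
    def read(self, source):
        """Return an iterable of byte blocks for the named source."""


@register
class CassandraMover(DataMover):
    platform = "cassandra"

    def read(self, source):
        # A real mover would use the vendor-supplied API here.
        # This stub just echoes its input so the sketch stays runnable.
        return [f"cassandra:{source}".encode()]


def backup(platform, source):
    """Core workflow: look up the mover by name and pull the data through it."""
    mover = MOVERS[platform]()
    return b"".join(mover.read(source))
```

Under this assumption, supporting a new product means writing and registering one more `DataMover` subclass, while `backup` itself stays unchanged.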
Its software is priced as an annual subscription, based on the amount of unique data managed. Chief marketeer Sanjay Sarathy says: "For example, if you have 90TB across three production replicas we charge on the basis of the 30 unique TB that we are backing up, archiving, etc."
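Sarathy's example works out as follows in a small Python sketch. It assumes every replica is a full copy of the same data, which is the simplest reading of the quote; the function name `billable_unique_tb` is hypothetical and no actual price per TB is implied.

```python
def billable_unique_tb(total_tb: float, replicas: int) -> float:
    """Unique data under management, assuming each replica is a full copy."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return total_tb / replicas


print(billable_unique_tb(90, 3))  # 30.0
```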
The four founders were:
- Nitin Donde – CEO and ex-Aster, EMC and Kazeon
- Srinivas Vadlamani – Chief Architect and ex-Couchbase and Aster
- Hari Mankude – CTO and ex-Hortonworks
- Shalesh Parulekar – Director India Operations (head of engineering) and ex-Marvell and Atempo
There was a $12 million A-round of funding in 2015 led by ONSET Ventures with participation from Canaan Partners, Intel Capital and Wipro. The four-man board comprises Nitin Donde; Mark Davis, CEO of now-closed ClusterHQ; Deepak Kamra, a general partner at Canaan; and Shomit Ghose, a partner at ONSET Ventures.
With regard to Datos IO
Talena has 25 customers so far, 15 more than its nearest competitor, Datos IO. How do Talena's product and technology differ from those of Datos IO? Sarathy gave us seven points of difference:
- Data source support – The only overlap at this point is Cassandra/DataStax Enterprise. Beyond that, we support Hadoop, Couchbase and Vertica, and Datos IO supports MongoDB.
- We focus on a broader range of use cases: backup/recovery, test data management, and archiving.
- We've incorporated data masking**, sampling and filtering into our workflow engine around test data management.
- We've incorporated machine learning and the ability to turn your passive backup data asset into an active compute cluster (ActiveRx).
- Our architecture allows us to scale to as many nodes as your production needs require – our understanding is that Datos only provides a one or three-node configuration.
- Our understanding is that Datos requires a separate NFS server for the storage component while Talena integrates compute and storage and is storage-agnostic – supporting direct-attached, SAN or NAS environments. We both support cloud-based storage.
- Both companies have content-aware/semantic deduplication, but we've added the additional layers of compression and erasure coding on top of it.
[Image: Talena marketing coverage slide]
Sarathy says there are three main directions on Talena's roadmap. First, it will expand the range of data platforms it supports. Second, it will continue investing in its machine learning algorithms to support more use cases. Third, it will deepen its compliance story by, for example, extending its data-masking capabilities.
Like Datos IO, Talena relies on a detailed awareness of the data workflows and treatment inside the distributed applications it supports. It is inherently not a generalised distributed database/application backup company.
Could incumbent backup suppliers such as Commvault, Veeam, and Veritas extend their coverage to include distributed, NoSQL-type databases? Yes, obviously, assuming they can understand the data structures and use the app's facilities, such as snapshots. Whether they can scale out like Talena is another matter.
Commvault already backs up MongoDB databases and Hadoop. Veeam supports MongoDB, as does Veritas Backup Exec. But Talena would say that script-level support is not enough and, possibly, that its application-aware deduplication is better than incumbent backup products' technology.
This comparison could rapidly head towards feature tick-list hell, your-mileage-may-vary considerations and other complexities. Best to have comparison bake-offs using your data on your site.
Check out a PDF white paper describing Talena's software technology here. ®
* Talena is apparently an ancient Celtic expression meaning "the rock".
** Datos IO provides data masking as well.