We grill high-end backup kid on its cloudy data protection stake
Taking the PB out of PBBA? Datos IO hopes you're ready for this jelly
Interview Datos IO has a unique approach to data protection that is at odds with legacy media-server-based data protection and the use of purpose-built backup appliances.
It says its emphasis on how applications store and manage their data makes it different from newer data protection vendors such as Cohesity, Rubrik and Veeam.
We asked its CEO and co-founder Tarun Thakur a series of questions looking to bring out the details of these differences. And we have edited his replies to gain a little brevity, not that you might notice – he gave some fairly thorough answers and clearly wants to get his points across.
El Reg: Why is the purpose-built backup appliance (PBBA) market declining?
Tarun Thakur: Our founding premise has been that “applications” define the choices of the IT stack: from the choice of databases, to the choice of storage, to the choice of data management. And data management includes everything from backup and recovery software, to purpose-built backup filesystems/appliances, to archival software, to storage management software, to archival storage, etc. [Several forces are now at work:]
- Traditional applications are moving to the cloud (for all non-recovery use cases such as test/dev and primary application instancing),
- Third platform applications (analytics, IoT, etc.) are being born in the cloud,
- Application data is now highly pre-compressed in nature, thus rendering traditional deduplication useless,
- [We have the] shared-nothing architectures of non-relational and cloud-native databases,
- Cloud infrastructure is rooted in cloud-native layers of compute and storage rather than LUNs or VMs,
- Rich analytical services (such as Amazon Athena) [are] now available natively, and in place, for cloud storage.
All these tectonic forces render classical variable-length or fixed-block deduplication, the secret sauce of purpose-built backup appliances (PBBAs), a non-starter for the next-generation era of applications and multi-cloud.
Combined, these forces are the reason the PBBA market is declining.
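To make the deduplication argument concrete, here is a minimal Python sketch of naive fixed-block deduplication, the simpler of the two techniques Thakur names. The 4 KB block size, the SHA-256 fingerprint and the `fixed_block_dedup_ratio` helper are hypothetical choices for illustration, not how Data Domain or any other PBBA actually works, and seeded random bytes merely stand in for “bit-wise unique” data such as compressed files or transaction records.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


def fixed_block_dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return logical size / stored size under naive fixed-block dedup.

    Each block is fingerprinted with SHA-256 and only the first copy of
    each fingerprint is counted as stored.
    """
    stored = {}  # fingerprint -> length of the single stored copy
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        stored.setdefault(digest, len(block))
    stored_bytes = sum(stored.values())
    return len(data) / stored_bytes if stored_bytes else 1.0


rng = random.Random(42)  # fixed seed so the demo is repeatable (Python 3.9+ for randbytes)
base_image = rng.randbytes(64 * 1024)  # stand-in for an unchanged VM image
vm_backups = base_image * 3  # three "nightly full backups" of the same image
unique_data = rng.randbytes(3 * 64 * 1024)  # high-entropy data: no block repeats

print(fixed_block_dedup_ratio(vm_backups))   # 3.0
print(fixed_block_dedup_ratio(unique_data))  # 1.0
```

Repeated full backups of an unchanged image collapse to a single stored copy, while high-entropy data yields no savings at all. Real appliances typically use variable-length (content-defined) chunking so that an insertion does not shift every later block boundary, but that does not help when no two chunks are alike in the first place.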
El Reg: How is the public cloud affecting the data protection market?
Tarun Thakur: Multiple studies have confirmed that [Amazon/Azure] cloud deployments deliver greater investment returns with a shorter payback period when compared to the traditional on-premises delivery model.
Because of this, and because the primary consumers of IT are now application owners and DevOps teams, traditional applications are moving to the cloud and next-generation applications are being born natively in the cloud.
Because of the architectural differences and new constraints (no hypervisor access, no SAN/NAS storage device primitives, etc.), successful data protection solutions in the cloud require a completely rethought data protection architecture and design.
... Traditional data protection solutions simply lifted to the cloud become heavy users of locally attached storage, which is an order of magnitude more expensive than the scalable public cloud object storage offerings (e.g., Amazon S3) that cloud-born solutions like Datos IO are optimised for.
Most importantly, the traditional architecture of legacy backup software products ruins the native formats of the data, thus holding customers hostage to their “own” data, with little ability to monetise it or move it across cloud boundaries.
El Reg: Do you think customers will go all-in to the public cloud or head towards a combination of on-premises and public cloud IT with some form of data management umbrella covering the two environments?
Tarun Thakur: We strongly believe that any enterprise that has a multitude of applications and databases is living in a “multi-cloud” world: some of their applications will run on-premises while other applications run in public cloud environments.
So, no, we don’t think that it will be an “all-in” world.
Enterprises have had and will continue to have on-premises infrastructure that they use for some of their business-critical applications. That said, for all the secondary use cases of these applications, such as DR, test/dev, performance staging and CI/CD, these enterprises want the ability to have a version of their application available on public cloud platforms, enabling developer agility and productivity while also reducing their IT expense by moving from a “buy” to the “rent-based” model of public cloud.
El Reg: Are you saying that the data protection business is basically a data copy business, and customers should be able to do more with this data copy than keeping it just in case the source system fails or is compromised?
Tarun Thakur: ... In on-premises environments, data protection means data copy, as most applications are tied to a single monolithic database. Cloud is redefining this paradigm; data protection in the cloud cannot just be data copy, as that would be like old wine in a new bottle (the bottle being the cloud).
... Our replication is not LUN-level replication, VM replication or VM file backup; instead, Datos IO aims to understand your application’s data at the table, record (row) and column level. This allows faster recovery, lets databases be maintained as databases on secondary storage (aka, native formats!), allows application state to be distributed across different stores and, finally, enables highly efficient migration from on-premises to cloud or multi-cloud environments.
El Reg: What's the problem if the PBBA suppliers make their software run in the cloud? Why doesn't that answer the need?
Tarun Thakur: The PBBA suppliers have adopted an architecture based on writing to their “appliance nodes”. Moving these “nodes” to expensive software-based compute nodes mounting expensive, fast EC2 storage to handle the write workload does not deliver an effective or economically viable “cloud” solution, even if they then eventually get the data to S3.
... Cloud-lifted PBBAs and media servers are based on attached storage, which is an order of magnitude more expensive than public cloud storage infrastructure such as Amazon S3.
All protected data must flow from the end application nodes through media servers. This does not scale in a geo-distributed multi-cloud world. PBBAs become a performance choke point and are not able to handle the large-scale data that is widely prevalent in the cloud.
... Many data types, e.g. scale-out database transactions, are inherently bit-wise unique and do not deduplicate effectively using bit-wise data segment/data block deduplication...
El Reg: Several other companies are in what's called the cloud data management space and provide data protection and other services, such as Veeam, Cohesity and Rubrik. How does Datos IO's technology differ from Veeam, Cohesity and Rubrik?
Tarun Thakur: Veeam is based on a media server architecture where data is piped from VMware (or equivalent servers) to target agents on the media server with attached storage. The deduplication is done partly on the source and partly on the target.
Here are the key differences with Veeam:
- All protected data must flow through Veeam media servers, and this approach will not scale in a geo-distributed multi-cloud world. The media servers will become a choke point, and the use of attached storage is 10 times more expensive in a cloud deployment model.
- Veeam primarily deals with opaque blocks and files and thereby cannot provide fine-grained data protection or advanced data management services such as search.
- While Veeam benefits from application transactional consistency through its use of Microsoft's Volume Shadow Copy Service (VSS), this cannot be termed application-centric, as Veeam does not have any insight into the structure of the data being protected.
- As data is increasingly compressed to increase the computing throughput of both traditional and next-generation applications, variable-length deduplication is ineffective for both space compression and WAN acceleration.
- Veeam treats cloud data stores as long-term retention targets while Datos IO uses semantic deduplication technology to use these data stores as secondary storage.
El Reg: And Rubrik and Cohesity?
Tarun Thakur: Both Rubrik and Cohesity (aka, “the PBBA Replacement Vendors”) are based on a media server architecture where data is piped from VMware (or equivalent servers) to target agents on a hyper-converged media server with integrated storage.
Here are the key differences with Rubrik and Cohesity:
- Due to the hyper-converged nature of the offering, all protected data must flow to the media servers. As with Veeam, this approach will not scale in a geo-distributed multi-cloud world, and the media servers will become a choke point.
- The integrated storage offering is an order of magnitude more expensive in a cloud deployment model.
- The approach of protecting opaque blocks and files cannot provide fine-grained data protection or advanced data management services such as search. While VSS provides application consistency, both Rubrik and Cohesity lack any insight into the structure of the data being protected.
- As data is increasingly compressed to increase the computing throughput of both traditional and next-generation applications, variable-length deduplication is ineffective for both space compression and WAN acceleration.
- Both Rubrik and Cohesity treat cloud data stores as long-term retention targets, while Datos IO uses these data stores as secondary storage using our semantic deduplication technology.
Both Rubrik and Cohesity are fundamentally going after the same workloads of 2nd platform or Mode 1 applications (SQL Server-based or VM-based) rather than the cloud-native applications, non-relational databases and distributed applications and workloads of the 3rd platform era.
El Reg: Where does Datos IO want to be in five years' time in terms of its product’s capabilities?
Tarun Thakur: ... We intend to continue ... building out data protection and mobility use cases for a rich set of operational data sources - both traditional such as MySQL and cloud-oriented such as Amazon DynamoDB, among others.
A rich variety of data sources allows us to capture an increasing volume of data, and this in turn will allow us to deliver advanced data management services to application and business owners. This lets us realise the vision of a “cloud-scale data hub” that supports a wide spectrum of services:
... We want to provide a globally-distributed metadata catalog that spans silos across both public and private clouds.
... In the specific example of data protection and mobility, the underlying use case for the data administrator would be: back up anywhere, recover anywhere, migrate anywhere.
Datos IO tech is for apps and data that are in or make use of the public cloud, particularly if they are cloud-native. Its CEO claims it is a far better data protection product than traditional or legacy backup software, purpose-built backup appliances such as Data Domain, virtual server-focussed products such as Veeam, and newer products from Cohesity and Rubrik.
He talks a good talk, but does his product walk a good walk? That’s for you to find out with a proof-of-concept pilot. ®