Storage upstart Coho Data decloaks from stealth, slurps $25m
Xen(Source) and the art of starting a storage startup
Ex-stealthy startup Coho Data, developer of a “flash-tuned scale-out storage architecture designed for the private cloud that delivers unparalleled performance at public cloud capacity pricing,” has collared $25m in B-round funding.
The round was led by Ignition Partners, with original investor Andreessen Horowitz contributing too. They're getting excited about Coho Data, so let's take a look at the firm and suss out what makes it attractive to its backers.
Coho Data says it is developing software-based storage - but it also ships hardware, the DataStream 1000 array. It tells the world that its storage software will enable businesses “to build their own high performance Amazon-style storage for their data.”
No small claim, that.
The DataStream 1000 is a 2U dual-controller array built from OEM’d commodity server hardware; more boxes can be added, scale-out fashion, under its distributed system software. The idea is to deliver public cloud pay-as-you-grow economics in a private cloud built from a set of on-premises boxes.
4 DataStream 1000 MicroArrays with a switch on top
The chassis contains two so-called MicroArray modules, each fitted with two Xeons, two NICs, two 800GB Intel 910 PCIe flash cards, six 3TB disk drives and OpenFlow-enabled 10GbitE networking. MicroArrays are clustered with a 52-port OpenFlow-enabled DataStream 10GbitE switch. Each MicroArray pair provides active:active protection against failure.
Total capacity is 39TB and the controllers implement a Bare Metal Object Store. We’re told “Each MicroArray allows the creation of arbitrarily many sparse objects and ensures that these objects can only be accessed by their owners. Individual objects on each MicroArray are coarse-grained containers that can be used as primitives by the data personalities to build more complex storage abstractions.”
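A quick sanity check on the 39TB figure, as a short Python sketch. It assumes the six disks and two flash cards are fitted per MicroArray module (two modules per chassis), that the flash counts towards the total, and that capacities are decimal TB; the constant names are ours, not Coho's.

```python
# Raw capacity of one 2U DataStream 1000 chassis, from the quoted parts list.
# Assumptions: parts are per MicroArray module, flash counts towards the
# total, and capacities are decimal terabytes.
MODULES_PER_CHASSIS = 2
DISKS_PER_MODULE = 6
DISK_TB = 3.0
FLASH_CARDS_PER_MODULE = 2
FLASH_TB = 0.8  # one 800GB Intel 910 PCIe card


def chassis_raw_tb() -> float:
    """Disk plus flash capacity for one chassis, in TB."""
    per_module = DISKS_PER_MODULE * DISK_TB + FLASH_CARDS_PER_MODULE * FLASH_TB
    return MODULES_PER_CHASSIS * per_module


print(f"{chassis_raw_tb():.1f} TB")  # 39.2 TB
```

Five chassis at that rate come to about 196TB raw, in the same ballpark as the 190TB Coho quotes for its five-box configuration.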
The implication is that all data is stored as objects, with data access protocols (file or iSCSI block initially) layered on top and a translation software layer doing the conversion.
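To make that layering concrete, here is a toy Python model: a flat store of sparse, owner-only objects, plus a hypothetical block "personality" that maps logical block addresses onto a single object. Class and method names are invented for illustration; Coho has not published its object API in the material we have seen.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PAGE = 4096  # allocation granularity for sparse objects (assumed)


@dataclass
class SparseObject:
    owner: str
    pages: dict[int, bytearray] = field(default_factory=dict)  # page index -> data


class ObjectStore:
    """Toy flat object store: sparse objects, accessible only by their owner."""

    def __init__(self) -> None:
        self._objects: dict[int, SparseObject] = {}
        self._next_id = 0

    def create(self, owner: str) -> int:
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = SparseObject(owner)
        return oid

    def _get(self, oid: int, caller: str) -> SparseObject:
        obj = self._objects[oid]
        if obj.owner != caller:
            raise PermissionError(f"{caller} does not own object {oid}")
        return obj

    def write(self, oid: int, caller: str, offset: int, data: bytes) -> None:
        obj = self._get(oid, caller)
        for i, byte in enumerate(data):  # byte at a time: slow but simple
            page, pos = divmod(offset + i, PAGE)
            obj.pages.setdefault(page, bytearray(PAGE))[pos] = byte

    def read(self, oid: int, caller: str, offset: int, length: int) -> bytes:
        obj = self._get(oid, caller)
        out = bytearray(length)  # unwritten ranges read back as zeros
        for i in range(length):
            page, pos = divmod(offset + i, PAGE)
            if page in obj.pages:
                out[i] = obj.pages[page][pos]
        return bytes(out)


class BlockVolume:
    """Hypothetical block 'personality': a volume backed by one sparse object."""

    SECTOR = 512

    def __init__(self, store: ObjectStore, owner: str) -> None:
        self.store, self.owner = store, owner
        self.oid = store.create(owner)

    def write_lba(self, lba: int, data: bytes) -> None:
        self.store.write(self.oid, self.owner, lba * self.SECTOR, data)

    def read_lba(self, lba: int, count: int) -> bytes:
        return self.store.read(self.oid, self.owner, lba * self.SECTOR, count * self.SECTOR)


store = ObjectStore()
vol = BlockVolume(store, owner="iscsi-personality")
vol.write_lba(1_000_000, b"hello")  # ~512MB into the volume, one page allocated
print(vol.read_lba(1_000_000, 1)[:5])  # b'hello'
```

The object layer knows nothing about sectors or LBAs; that translation lives entirely in the personality, which is the division of labour the quote above describes.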
The MicroArray software, called a storage hypervisor (Coho wants to import the server virtualisation magic), supports application-specific profiles that describe data access protocol, endurance and performance needs.
Coho says of this software: “The data hypervisor is concerned only with isolation and performance. It doesn’t add the fat of additional layered file systems, volume managers, RAID parity calculation, or prescriptive protocol implementations. Its job is just to allow you to safely and efficiently share scalable storage hardware across multiple applications.”
Each MicroArray pair delivers 180,000 IOPS (random 80/20 read/write, 4K block size).
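For a sense of scale, the IOPS figure can be turned into a read/write split and a bandwidth number. This Python snippet assumes "4K" means 4,096 bytes and that every I/O moves exactly one block; real workloads rarely oblige.

```python
# Back-of-envelope view of the quoted per-pair benchmark:
# 180,000 IOPS, random, 80/20 read/write, 4K blocks.
# Assumption: "4K" is 4,096 bytes and each I/O moves exactly one block.
IOPS = 180_000
READ_FRACTION = 0.8
BLOCK_BYTES = 4096

reads_per_s = round(IOPS * READ_FRACTION)
writes_per_s = IOPS - reads_per_s
bytes_per_s = IOPS * BLOCK_BYTES

print(reads_per_s, writes_per_s)        # 144000 36000
print(f"{bytes_per_s / 1e6:.0f} MB/s")  # 737 MB/s
```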
Data access protocols include NFS, SMB, iSCSI and DataStream’s DirectConnect API. In the future Coho says you will be able to run Hadoop jobs “directly within the storage platform and run [them] in the background behind production workloads.”
HTTP-based key/value APIs will be supported. Other access protocols can be added.
A feature of the array-switch design, according to the marketing bumf, is that the “DataStream Switch [uses] SDN capabilities to intelligently route client connections to multiple backend servers over a single logical IP address, making it easy for data profiles to scale legacy protocols,” like NFS. In fact the design seems to parallelise NFS operations across MicroArrays.
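How might one IP address front many MicroArrays? One plausible approach, sketched here in Python purely as an illustration, is for the switch to install a per-connection flow rule the first time it sees a client connection, pinning it to the least-loaded backend. The `VipSwitch` class, the least-loaded policy and the backend names are all assumptions; Coho has not said which policy its switch uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VipSwitch:
    """Toy model of a switch fronting several backends with one virtual IP.

    The first time a client connection (IP, port) is seen, a flow rule pins it
    to the backend with the fewest pinned connections; later packets on that
    connection follow the rule. Illustrative only, not Coho's algorithm.
    """

    vip: str
    backends: list[str]
    flows: dict[tuple[str, int], str] = field(default_factory=dict)

    def route(self, client_ip: str, client_port: int) -> str:
        key = (client_ip, client_port)
        if key not in self.flows:
            load = {b: 0 for b in self.backends}
            for backend in self.flows.values():
                load[backend] += 1
            # min() keeps list order on ties, so placement is deterministic
            self.flows[key] = min(self.backends, key=lambda b: load[b])
        return self.flows[key]


switch = VipSwitch("10.0.0.100", ["microarray-1", "microarray-2", "microarray-3"])
for port in range(40000, 40006):
    print(port, "->", switch.route("192.168.1.10", port))
```

Because the rule is per connection, several NFS connections aimed at the single address can end up spread across every MicroArray, which fits the parallelised-NFS reading of the design.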
The scale-out growth can be started using a single MicroArray. A clustered group of MicroArrays works together “to store and replicate data with full hardware and software resilience.” The switch provides “a single namespace with automated tiering and load balancing based on workload profiles.”
The system alerts you when it's running out of performance or capacity for your workload, so that you can buy another MicroArray and scale out the cluster as needed. MicroArrays of different generations can work together in a cluster.
Coho object storage
Data (objects) are placed and routed across distributed nodes to provide load balancing. Data is also tiered inside a MicroArray, from flash to disk, using a combination of profiles (personalities) and data access frequency. The implication is that hot, primary data lives on flash, with cooler data moving down to the disk layer.
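A minimal Python sketch of that kind of flash/disk split, assuming a per-application profile with a hot-access threshold and an optional pin-to-flash flag. The `Profile` fields and the ranking rule are hypothetical; Coho doesn't document its tiering policy in the material we have seen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical per-application profile; field names are illustrative."""

    name: str
    pin_to_flash: bool = False  # e.g. a latency-sensitive database
    hot_threshold: int = 10     # accesses per window that count as "hot"


def place(access_counts: dict[str, int], profile: Profile,
          flash_slots: int) -> tuple[set[str], set[str]]:
    """Split objects between a flash tier and a disk tier.

    Objects are considered hottest first; an object goes to flash if the
    profile pins it there or it meets the hot threshold, until flash_slots
    is used up. Everything else stays on disk.
    """
    ranked = sorted(access_counts, key=lambda oid: access_counts[oid], reverse=True)
    flash: set[str] = set()
    for oid in ranked:
        if len(flash) >= flash_slots:
            break
        if profile.pin_to_flash or access_counts[oid] >= profile.hot_threshold:
            flash.add(oid)
    return flash, set(access_counts) - flash


counts = {"a": 50, "b": 3, "c": 12, "d": 0}
print(place(counts, Profile("vm-images"), flash_slots=2))
print(place(counts, Profile("oltp-db", pin_to_flash=True), flash_slots=3))
```

The profile decides which objects are eligible for flash, while access frequency decides the order in which they compete for it.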
Data protection uses object replicas on separate MicroArray nodes, and all MicroArrays may be involved in reconstructing lost objects. The Coho documentation we have seen does not talk about erasure codes or object hashes.
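The replica approach can be illustrated with a toy Python model: two copies of each object on different MicroArrays and, when one node dies, a rebuild plan that picks surviving sources and new targets so the copying work is spread over all remaining nodes. Round-robin placement, two-way replication and the least-busy choice are our assumptions, not Coho's published design.

```python
from __future__ import annotations

from collections import Counter


def place_replicas(object_ids: list[str], nodes: list[str],
                   copies: int = 2) -> dict[str, list[str]]:
    """Round-robin placement; every copy of an object lands on a different node."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        oid: [nodes[(idx + k) % len(nodes)] for k in range(copies)]
        for idx, oid in enumerate(object_ids)
    }


def rebuild_plan(placement: dict[str, list[str]], failed: str,
                 nodes: list[str]) -> list[tuple[str, str, str]]:
    """Plan re-replication after `failed` dies, as (object, source, target).

    Sources and targets are chosen by least work assigned so far, so the
    copying load spreads across all surviving nodes.
    """
    survivors = [n for n in nodes if n != failed]
    work = Counter()  # copy jobs assigned to each surviving node
    plan = []
    for oid, holders in placement.items():
        if failed not in holders:
            continue
        sources = [n for n in holders if n != failed]
        targets = [n for n in survivors if n not in holders]
        if not sources or not targets:
            raise RuntimeError(f"cannot restore redundancy for {oid}")
        source = min(sources, key=lambda n: work[n])
        target = min(targets, key=lambda n: work[n])
        work[source] += 1
        work[target] += 1
        plan.append((oid, source, target))
    return plan


nodes = ["ma1", "ma2", "ma3", "ma4"]
placement = place_replicas([f"obj{i}" for i in range(8)], nodes)
for step in rebuild_plan(placement, failed="ma2", nodes=nodes):
    print(step)
```

In this example every surviving MicroArray ends up either sending or receiving copies, rather than one replacement node absorbing the whole rebuild.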
In fact, Coho states: “Old approaches to storage - such as RAID and the erasure coding techniques that are common in object storage systems - involve an opaque statistical assignment that tries to evenly balance data across multiple devices. This approach is fine if you have large numbers of devices and data that is accessed very uniformly. It is less useful if, as in the case of PCIe flash, you are capable of building a very high-performance system with even a relatively small number of devices or if you have data that has severe hot spots on a subset of very popular data.”
We doubt object storage startups like Amplidata, Caringo and others will be best pleased to see erasure coding lumped in with RAID as an old approach to data protection.
Price and performance
An 11U Coho system consists of a switch plus five MicroArrays, providing 190TB of capacity and 900,000 IOPS at a list price of $530,000. It’s not cheap. Coho asserts that you’d need to spend $1.5m at list price to get a traditional, monolithic 190TB array, which would need 50U of rackspace and deliver only 250,000 IOPS.
Coho says its system provides:
- 18 times the IOPS/U
- 66 per cent lower $/GB
- A fifth of the rack space
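Coho's ratios can be recomputed from the list figures it quotes. The Python below does the arithmetic: the 18x IOPS/U figure only comes out exactly if the switch's 1U is left out (our inference, not something Coho states), the $/GB saving works out at nearer 65 per cent than 66, and 11U against 50U is roughly a fifth of the rack space.

```python
# Recompute Coho's headline comparison from the list figures it quotes.
coho = {"rack_u": 11, "iops": 900_000, "list_price": 530_000, "tb": 190}
legacy = {"rack_u": 50, "iops": 250_000, "list_price": 1_500_000, "tb": 190}

legacy_iops_per_u = legacy["iops"] / legacy["rack_u"]
iops_per_u_ratio = (coho["iops"] / coho["rack_u"]) / legacy_iops_per_u
# Same ratio with the 1U switch left out of Coho's rack count (our guess)
iops_per_u_ratio_no_switch = (coho["iops"] / (coho["rack_u"] - 1)) / legacy_iops_per_u

coho_per_gb = coho["list_price"] / (coho["tb"] * 1000)
legacy_per_gb = legacy["list_price"] / (legacy["tb"] * 1000)
per_gb_saving = 1 - coho_per_gb / legacy_per_gb
rack_fraction = coho["rack_u"] / legacy["rack_u"]

print(f"IOPS/U: {iops_per_u_ratio:.1f}x, or {iops_per_u_ratio_no_switch:.1f}x without the switch")
print(f"$/GB: ${coho_per_gb:.2f} vs ${legacy_per_gb:.2f}, {per_gb_saving:.1%} lower")
print(f"Rack space: {rack_fraction:.0%} of the legacy array")
```

Since both systems are quoted at 190TB, the $/GB comparison reduces to the ratio of the two list prices.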
It also claims you can scale a MicroArray cluster linearly, with no limits on performance or capacity, simply by adding more boxes. Each additional box takes fifteen minutes from power-on to be up and fully running in the cluster. The cluster auto-discovers the new box, configures itself and rebalances the workload across the expanded cluster in the background.
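Coho doesn't say how the background rebalance decides what to move. As a hedged illustration, this greedy Python sketch fills a new node up to the cluster average (rounded down) by always taking an object from whichever node is currently fullest; the function and node names are hypothetical.

```python
from __future__ import annotations


def rebalance(cluster: dict[str, list[str]], new_node: str) -> list[tuple[str, str, str]]:
    """Greedy rebalance after adding `new_node`.

    Fills the new node up to the cluster-wide average (rounded down), each
    time taking an object from whichever node currently holds the most.
    Returns the moves as (object, from_node, to_node). Illustrative only.
    """
    if new_node in cluster:
        raise ValueError(f"{new_node} is already in the cluster")
    cluster[new_node] = []
    total = sum(len(objs) for objs in cluster.values())
    quota = total // len(cluster)
    moves = []
    while len(cluster[new_node]) < quota:
        donor = max(cluster, key=lambda n: len(cluster[n]))
        obj = cluster[donor].pop()
        cluster[new_node].append(obj)
        moves.append((obj, donor, new_node))
    return moves


cluster = {f"ma{i}": [f"obj{i}-{j}" for j in range(6)] for i in (1, 2, 3)}
moves = rebalance(cluster, "ma4")
print(len(moves), {node: len(objs) for node, objs in cluster.items()})
# 4 {'ma1': 4, 'ma2': 5, 'ma3': 5, 'ma4': 4}
```

Only 4 of the 18 objects move, which is the point of rebalancing incrementally rather than reshuffling everything when a box is added.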
The networking is intelligent, with Coho saying “Software-defined networking (SDN) features on the DataStream Switch enable key storage system logic including data placement, routing, load balancing and a distributed protocol service to be delivered from the network itself.”
It states that the Bare Metal Object store “enables extensibility to support any application, regardless of protocol, durability and performance requirements.”
Coho says of storage profiles: “These personalities integrate directly with the SDN switch and may be hosted in isolated containers directly on the individual MicroArrays … The hosted NFS implementation in our initial product runs on every single MicroArray, but interacts with the switch to present a single external IP address … the switch provides a private, internal interconnect between personalities and the individual MicroArrays. A reusable library of dispatch logic allows new clients to integrate onto this data-path protocol with direct and configurable support for striping, replication, snapshots, and object range remapping.”
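Striping, one of the dispatch features Coho lists, boils down to mapping a file byte range onto several objects. This Python sketch shows a plain RAID-0-style layout; the stripe unit, width and `Extent` type are illustrative assumptions, and it leaves out replication, snapshots and range remapping.

```python
from __future__ import annotations

from typing import NamedTuple


class Extent(NamedTuple):
    obj: int     # index of the backing object within the stripe set
    offset: int  # byte offset inside that object
    length: int  # bytes covered by this piece


def stripe(file_offset: int, length: int, stripe_unit: int, width: int) -> list[Extent]:
    """Map a file byte range onto `width` objects, RAID-0 style.

    Stripe unit k of the file lives in object k % width, at byte offset
    (k // width) * stripe_unit inside that object.
    """
    if stripe_unit <= 0 or width <= 0:
        raise ValueError("stripe_unit and width must be positive")
    extents = []
    pos, end = file_offset, file_offset + length
    while pos < end:
        unit, within = divmod(pos, stripe_unit)
        chunk = min(stripe_unit - within, end - pos)
        extents.append(Extent(unit % width, (unit // width) * stripe_unit + within, chunk))
        pos += chunk
    return extents


for piece in stripe(6000, 10_000, stripe_unit=4096, width=3):
    print(piece)
```

A 10,000-byte request starting at offset 6,000 splits into three pieces on three different objects, so each piece could be served by a different MicroArray in parallel.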
This seems to be a unique attribute of the Coho system.
Of course, Coho is not exactly being realistic in comparing its MicroArrays against monolithic storage while ignoring dual-controller scale-out arrays, such as NetApp's Clustered ONTAP. There are also other hybrid array startups, such as Nimble Storage, Tegile and Tintri, to add to the equation.
Our thinking is that Coho actually does want to take on EMC’s VMAX, HDS’ VSP and IBM’s DS8000 and be a high-end monolithic array replacement. That’s unlikely to happen, as monolithic array customers want a full set of storage performance and management features appropriate to mission-critical storage needs. No-one is going to replace one of these storage powerhouses with a bunch of unproven boxes from a storage startup.
What is more realistic is that customers will be buying MicroArrays as alternatives to NetApp, EMC VNX and HP 3PAR storage, and Dell Compellent and HDS HUS and … well, you get the picture – it’s a crowded field, particularly with the other hybrid array startups in there.
Founding and founders
The startup was founded in 2011 by industry veterans of Veritas and XenSource. Initial A-round funding of $10m came from Andreessen Horowitz. Total funding is now $35m. The new cash will "accelerate Coho Data’s R&D and go-to-market efforts as the company prepares for general availability of Coho DataStream later this year."
There were three co-founders:
- CEO Ramana Jonnala, previously with XenSource after a stint at Veritas
- Andrew Warfield, CTO, one of the original authors of the Xen hypervisor and coming from Xen and Citrix
- Keir Fraser, chief architect and principal author of the Xen hypervisor
The three are on the board along with Peter Levine of Andreessen Horowitz and, now, Frank Artale, a Managing Director at Ignition Partners. Levine was previously CEO of XenSource and then an executive at Citrix following its acquisition of XenSource. Coho points out that “Amazon's EC2 service was based on XenSource's server and storage virtualisation.”
Okay, we get the message, a XenSource background is a good thing.
Download an ESG report on the DataStream 1000 here (pdf). Its conclusion is:
“As Coho Data is just coming out of stealth, it remains to be seen how its technology will prove itself in the field over the long haul, but with good execution, we believe the solution provides a promising and intriguing new approach for enterprises to deliver internal storage services in powerful, scalable, and efficient ways that bring the lessons learned from the public cloud into the private or hybrid cloud model.”
There is also a white paper (pdf). It is a moderately complex and worthwhile read; grab a coffee and read it in a quiet place. ®