Storage newbie: You need COTS to really rock that NVMe baby
NVMe drives need NVMe fabrics - two sides of the same NVMe coin
NVMe drives need NVMe fabrics to give shared arrays the latency-cutting data-access benefits of NVMe. Unlike Nimble architect Dimitris Krekoukias, storage startup E8 believes putting NVMe SSDs in today's all-flash arrays will be futile; it claims we need NVMe fabrics to get the NVMe performance boost.
And NVMe over fabrics-attached arrays can be built using today’s controllers.
E8's co-founder and CEO Zivan Ori explained the firm's thinking in this interview.
El Reg: Will simply moving from SAS/SATA SSDs to NVMe drives bottleneck existing array controllers?
Zivan Ori (ZO): It seems 100 per cent certain to us that merely replacing existing SAS/SATA SSDs with NVMe will do little to improve the performance of existing AFAs, since even today these AFAs don't saturate the SAS pipes between their controllers and their SSDs. Moving to NVMe is thus almost futile and will probably only be pursued due to competitive pressure and marketing.
El Reg: Must we wait for next-generation controllers with much faster processing?
ZO: Since NVMe drives are almost 10x faster than older-generation SSDs, and since Moore's law is improving controller horsepower at a far slower pace, it doesn't seem like waiting for next-generation controllers will solve the problem.
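A rough back-of-the-envelope comparison illustrates the gap Ori is describing. Assuming a 12Gb/s SAS-3 lane against a PCIe 3.0 x4 NVMe drive (typical at the time), per-drive bandwidth alone differs by more than 3x, before counting the queue-depth and latency advantages of NVMe:

```shell
# Back-of-the-envelope link speeds, encoding overhead included.
# SAS-3: 12 Gb/s per lane with 8b/10b encoding (80% efficiency)
awk 'BEGIN { printf "SAS-3 lane:  %.1f GB/s\n", 12 * 0.8 / 8 }'
# NVMe on PCIe 3.0 x4: 8 GT/s per lane, 4 lanes, 128b/130b encoding
awk 'BEGIN { printf "PCIe 3.0 x4: %.1f GB/s\n", 8 * 4 * 128 / 130 / 8 }'
```

This is only a sketch of raw link rates; real drives and controllers sit below these ceilings.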
One option is to go down the custom-hardware route, as DSSD did and as many other vendors apparently are doing.
But E8's approach is to focus on off-the-shelf hardware and a clean software architecture. The idea behind E8 is that not 100 per cent of what an AFA does inside the array must be done in the AFA itself, and since NVMf requires a high-bandwidth low-latency network anyway, there will be no performance hit if those things are done outside the array.
With this approach, E8 has been able to demonstrate 10 million (4KB, 100 per cent random read) IOPS out of a plain 2U server enclosure, including E8 features such as RAID-6, dynamic provisioning and LUN sharing.
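That 10 million IOPS claim can be sanity-checked with simple arithmetic; the implied front-end bandwidth explains why such an appliance needs several high-speed Ethernet links (the 100GbE port count below is our inference, not E8's specification):

```shell
# 10M random reads/s at 4KB (4096 bytes) each:
echo "$((10000000 * 4096)) bytes/s"                 # ~41 GB/s
# In network terms (x8 bits, /1e9), integer arithmetic:
echo "$((10000000 * 4096 * 8 / 1000000000)) Gb/s"   # ~327 Gb/s, i.e. at least four 100GbE ports
```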
The premise of building dedicated and unique hardware to solve storage needs always finds merit initially (e.g. IBM TMS, Skyera, Violin Memory, FusionIO) and always loses out to the software arrays, and very quickly if you look at the shift from 2013 to 2015 towards XtremIO, Pure Storage and Solidfire, [which are] all using commodity parts.
Our position at E8 has been that while hardware solutions have some initial benefit like [that of] DSSDs, they will quickly lose out to off-the-shelf approaches like ours. Can you really compete with Samsung in designing SSDs, or with Intel and Mellanox in designing network interface cards?
So the short answer: you can wait forever for stronger controllers, or think outside the box and break the boundaries of the traditional controller, which is what E8 is doing.
El Reg: Will we need affordable dual-port NVMe drives so array controllers can provide HA?
ZO: Yes, and by now almost all NVMe SSD vendors have such offerings in their near-term roadmaps.
El Reg: What does affordable mean?
ZO: Dual-ported drives should cost the same as single-ported drives. From a pure component standpoint they're no more expensive, so there is no reason why they should cost more.
El Reg: Are customers ready to fit NVMf array-accessing servers with new HBAs and, for RoCE, to deploy DCB switches and deal with end-to-end congestion management?
ZO: Customers are certainly exploring this and have a high interest in getting converged Ethernet to work for NVMf arrays. The approach DSSD and Apeiron have chosen requires customers to open their servers, install unique NICs/HBAs and wire them back-to-back with specialised cabling - this is absolutely untenable.
We would like to see a bigger ecosystem around NVMf-enabled Ethernet NICs for customer servers. All data-centre/enterprise grade Ethernet switches have supported the features required for RoCE for many years due to the earlier requirement of supporting FCoE.
We have never encountered an Ethernet switch at a customer's that doesn't support it. Deploying E8 is as easy as hooking up our 2U appliance with 4 or 8 Ethernet cables to the TOR Ethernet switch, and customers absolutely love that.
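For a sense of what that host-side simplicity looks like in practice, a standard Linux initiator attaches to an NVMf target over RDMA with the stock nvme-cli tool - no proprietary drivers or cabling. The address and NQN below are placeholders for illustration, not E8 specifics:

```shell
# Load the standard NVMe-over-RDMA initiator module
modprobe nvme-rdma

# Discover subsystems exported by a target at 192.168.1.100 (placeholder)
nvme discover -t rdma -a 192.168.1.100 -s 4420

# Connect to a discovered subsystem by its NQN (placeholder name)
nvme connect -t rdma -n nqn.2016-06.io.example:subsys1 -a 192.168.1.100 -s 4420

# The remote namespace now appears as an ordinary local block device
nvme list
```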
El Reg: Do they need routability with RoCE?
ZO: You're hinting at the difference between RoCEv2 (over UDP/IP) vs. RoCEv1 (encapsulated straight onto Ethernet). E8 has implemented RoCEv2 for IP manageability features like IP address authorisation (and other security aspects), subnets, common tools for management and logging, etc, and this seems the preferred network mode of operation at customers (rather than RoCEv1).
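The practical upshot of the v2-over-UDP design is that RoCEv2 traffic is ordinary IP traffic: it carries IP headers, can cross Layer 3 boundaries, and is visible to standard tooling. A spot-check with tcpdump (interface name is a placeholder; 4791 is the IANA-assigned RoCEv2 UDP destination port):

```shell
# RoCEv2 encapsulates the InfiniBand transport in UDP/IP on destination
# port 4791, so it routes and filters like any other IP traffic.
# eth0 is a placeholder interface name.
tcpdump -ni eth0 'udp dst port 4791'
```

RoCEv1, by contrast, rides directly in an Ethernet frame with its own EtherType and cannot leave the local L2 segment.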
El Reg: Could we cache inside the existing array controllers to augment existing RAM buffers and so drive up array performance? With flash DIMMs say? Or XPoint DIMMs in the future?
ZO: E8 already caches inside the E8 controller. There is no reason why we couldn't use 3D XPoint DIMMs in the future.
El Reg: Does having an NVMe over fabrics connection to an array which is not using NVMe drives make sense?
ZO: It is possible but unlikely. As NVMf is a very new technology, one would expect to have it deployed where it's needed the most - which is with low latency, high bandwidth drives. As such, it would make more sense to have NVMe drives in the backend.
Considering NVMe drives are closing the price gap with SATA and have become ubiquitous, it would seem strange to hook up a SAS array or JBOD with NVMf. (You would need an entity within that array to translate the NVMf queues to SAS/SATA anyway, which would impede performance.)
El Reg: When will NVMf arrays filled with NVMe drives and offering enterprise data services be ready? What is necessary for them to be ready?
ZO: If existing AFAs add support for NVMe drives at their backend, and support for the NVMf protocol at their front end, what would happen? Arguably, nothing. That is not their bottleneck.
I would ask this question instead: when will NVMf arrays filled with NVMe drives, offering enterprise data services operating at the speeds of NVMe, be ready?
E8 is the first such product to market, and it is able to overcome the bottlenecks of existing AFA architectures because its own architecture is radically different. Existing AFAs will offer NVMe and NVMf with little performance improvement - probably no improvement at all in IOPS and bandwidth, and only a slight improvement in latency.
Ori said: “It is actually these very questions that led to the inception of E8 two years ago.” It was his views on proprietary versus COTS hardware that shaped E8's strategy of offloading some array functionality so that today's controllers can be used. Will E8's comparatively plug-and-play design outweigh specialised hardware and the massive Dell-EMC marketing machine behind DSSD? Interesting questions. Let's see in 12 months' time. ®