No envy for NVMe: Hardened newbie talks to the Reg
Claims servers find its array data in an instant
Interview An array without limits sounds like a great idea, and the ADS1000 is built by a startup whose name, Apeiron, is Greek for “without limits”. The ADS1000 uses hardened Ethernet to cut array network access latency to less than 3 µs for a round trip. That’s fast.
Apeiron's technical approach is fast Ethernet with no protocol stack in the way.
We’ve been asking NVMe over fabrics vendors and startups about their fast fabric access, giving them the chance to show off their technology, and so we interviewed Apeiron to find out how it sees the fast array access world.
A crucial question for fast access array storage buyers is this: hardened Ethernet technology or NVMe over fabrics RDMA technology? This interview is Apeiron presenting its case for hardened Ethernet.
El Reg: Will simply moving from SAS/SATA SSDs to NVMe drives bottleneck existing array controllers?
Apeiron: Absolutely. Architectures based upon classic storage controllers will see little or no gain from NVMe. The storage controller is the bottleneck, so providing faster raw storage behind the bottleneck will not do much. Classic storage controller-based arrays will show no performance improvement with the next generation of storage class memory (Intel’s 3D XPoint technology) either. The Apeiron architecture provides a platform with less than 1.5 µs of protocol overhead (today’s NAND-based SSD solutions are typically 100 µs).
The Apeiron ADS1000 storage architecture eliminates the controller performance bottleneck by leveraging the data management capabilities inherent in today’s scale-out applications. Apeiron storage management (Virtual Volumes, mirrors, etc.) uses a small amount of CPU on each application server. Data storage transfers are hardware-accelerated to minimise CPU loading. This architecture exposes all of the performance available in NVMe SSDs directly to the applications, while scaling with no performance degradation.
El Reg: Must we wait for next-generation controllers with much faster processing?
Apeiron: That would help, although the classic storage controller architecture is the real problem. CPU performance is still only doubling every 18 months while storage technology performance has increased by orders of magnitude. HDDs achieved latencies in the 100 ms range, NVMe flash SSDs offer 100 µs and storage class memory-based SSDs will be in the 10 µs range. The only way the application can really take advantage of this performance in a networked storage solution is to get the CPU complex out of the data path. That’s the approach Apeiron has taken. We architected the solution to support extremely fast storage devices such as Intel’s 3D XPoint technology.
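To put rough numbers on that argument, here is a minimal Python sketch using only figures quoted in this article: the three device latencies above and Apeiron's claimed round trip of under 3 µs, taken here as 3 µs. It assumes, purely for illustration, that total access time is simply device latency plus data-path latency, with no queuing or software overhead. The names `overhead_share` and `report` are ours, not Apeiron's.

```python
# Device latencies quoted in the interview, in microseconds.
DEVICE_LATENCY_US = {
    "HDD": 100_000.0,              # "100 ms range"
    "NVMe flash SSD": 100.0,       # "100 µs"
    "storage class memory": 10.0,  # "10 µs range"
}

# Apeiron's claimed round-trip network latency is "<3 µs"; 3.0 is used
# here as an upper bound (assumption for illustration).
PATH_OVERHEAD_US = 3.0


def overhead_share(device_us: float, path_us: float) -> float:
    """Return data-path latency as a percentage of total access time.

    Assumes total access time = device latency + path latency.
    """
    return 100.0 * path_us / (device_us + path_us)


def report(path_us: float = PATH_OVERHEAD_US) -> dict:
    """Map each device class to the overhead share for a given path latency."""
    return {name: overhead_share(latency, path_us)
            for name, latency in DEVICE_LATENCY_US.items()}


if __name__ == "__main__":
    for name, share in report().items():
        print(f"{name:>22}: {share:.3f}% of access time is data path")
```

Under these assumptions the same 3 µs is about 0.003 per cent of an HDD access, about 2.9 per cent of a flash access and about 23 per cent of a storage class memory access. Calling `report()` with a larger `path_us` shows how quickly a slower data path would come to dominate the faster media.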
El Reg: Will we need affordable dual-port NVMe drives so array controllers can provide HA?
Apeiron: No. Today’s scale-out applications provide storage HA through replication. Dual-port drives will limit the performance, and therefore the value, of NVMe. Scale-out applications such as Splunk have the inherent capability to manage their own replication and protection. No storage controller is required.
El Reg: Why did Apeiron take this very different design approach?
Apeiron: We recognised several years ago that the world of storage was heading for a huge disruption. Several forces were converging that would demand a new storage architecture. The first was the industry pivot from developing scale-up to scale-out applications. These applications are architected for loosely coupled parallel processing. The server is the basic scale-out hardware element and many of these applications are designed for real-time performance. They manage their own storage.
The second was the dramatic increase in persistent storage performance through flash and storage class memory components running over NVMe.
Lastly, since storage management is handled by the application, a storage controller just adds cost and gets in the way of device performance. We realised that the value proposition for the classic storage array in these markets was dead and that a new, light-weight, very simple, very fast network architecture was required.
El Reg: How does your storage network compare to NVMeF?
Apeiron: These solutions are complementary, but address different market segments. Apeiron is working to address the need to provide externally attached, pooled NVMe storage features to what was traditionally Direct Attached Storage (DAS or scale-out).
NVMeF is addressing the need for data centres to disaggregate storage and access disparate storage silos/tiers. It evolved from the need to “include” the data centre storage solutions in the market today, and therefore must cover a larger swath of these legacy solutions. While the two approaches will need to have common IT management capabilities, they are very different in how they are implemented.
NVMeF is making use of RDMA over various transport protocols. It is designed to work over any transport layer, and defines a rich feature set. It is a more complex protocol which requires more storage processing. This architecture still leads to “storage box” centred solutions. NVMe SSD commands must be rebuilt on the storage side, but this provides additional storage capabilities. For Apeiron these capabilities are provided by the application, OS and application CPU complex (versus a storage controller).
Apeiron’s products are designed around server or application storage management. We simply tunnel native NVMe commands over a hardened Layer 2, 40Gbit Ethernet network. Apeiron is moving PCIe TLPs (Transaction Layer Packets), carrying the same NVMe commands an SSD would see if it were installed internally on the PCIe bus. Each server must track only a small number of connections, enabling the system to scale to thousands of drives without performance degradation.
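The interview does not describe Apeiron's wire format, so the following Python sketch is only a generic illustration of what tunnelling a native NVMe command in a Layer 2 frame can look like. It is not Apeiron's implementation. To keep it short, the payload is a bare 64-byte NVMe submission queue entry for a Read command rather than a full PCIe TLP, the EtherType is the IEEE 802 local experimental value 0x88B5 used as a stand-in, and the function names and MAC addresses are hypothetical.

```python
import struct

# IEEE 802 "local experimental" EtherType, used as a stand-in (assumption).
# Apeiron's real EtherType and framing are not given in the interview.
EXPERIMENTAL_ETHERTYPE = 0x88B5

NVME_OPC_READ = 0x02  # Read opcode in the NVM command set
# 64-byte submission queue entry, little-endian:
# opcode, flags, CID, NSID, 8 reserved bytes, MPTR, PRP1, PRP2, CDW10-15
SQE_FORMAT = "<BBHI8xQQQ6I"


def build_read_sqe(cid: int, nsid: int, slba: int, nblocks: int,
                   prp1: int) -> bytes:
    """Pack a simplified NVMe Read command as a 64-byte queue entry."""
    return struct.pack(
        SQE_FORMAT,
        NVME_OPC_READ,            # opcode
        0,                        # fused-operation and PSDT flags
        cid,                      # command identifier
        nsid,                     # namespace ID
        0,                        # metadata pointer (unused here)
        prp1,                     # data pointer, PRP entry 1
        0,                        # data pointer, PRP entry 2 (unused here)
        slba & 0xFFFFFFFF,        # CDW10: starting LBA, low 32 bits
        slba >> 32,               # CDW11: starting LBA, high 32 bits
        (nblocks - 1) & 0xFFFF,   # CDW12: block count, zero-based
        0, 0, 0,                  # CDW13-15 (unused here)
    )


def encapsulate(dst_mac: bytes, src_mac: bytes, payload: bytes) -> bytes:
    """Prefix the payload with a 14-byte Ethernet II header (no FCS)."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, EXPERIMENTAL_ETHERTYPE)
    return header + payload


def decapsulate(frame: bytes) -> bytes:
    """Check the EtherType and return the payload bytes unchanged."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != EXPERIMENTAL_ETHERTYPE:
        raise ValueError("not a tunnelled frame")
    return frame[14:]


if __name__ == "__main__":
    sqe = build_read_sqe(cid=7, nsid=1, slba=16, nblocks=8, prp1=0x1000)
    frame = encapsulate(b"\x02\x00\x00\x00\x00\x01",
                        b"\x02\x00\x00\x00\x00\x02", sqe)
    print(len(frame), decapsulate(frame) == sqe)
```

The point of the sketch is that the command bytes cross the wire untouched: the receiving side strips a fixed-size header and hands the drive the same bytes the host built, with no protocol translation step in between.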
To realise the potential of NVMe, and especially technologies such as Intel’s 3D XPoint technology, you must have an ultra-fast network and a lightweight protocol. Apeiron’s total induced latency is <3 µs round trip. Server-class NVMe drives on the horizon will be under 10 µs of latency. Apeiron passes this entire performance gain directly to the application.
These products will co-exist in the market and may overlap in some use cases in the future as capabilities expand. Of course, the most important point is that Apeiron is shipping product today.
El Reg: How does your storage network scale? How big can your storage network get?
Apeiron: The Apeiron storage network is designed for low-cost scale-out. Each ADS1000 storage enclosure integrates 32 x 40Gbit Ethernet switch ports for external connections. To add a server you simply connect HBA ports to the ADS1000 switch ports. Adding ADS1000s is just a matter of connecting cables between boxes. The system scales to hundreds of servers and petabytes of storage before any external switches would be required for rack-to-rack connectivity.
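A back-of-the-envelope sketch of that port arithmetic follows, in Python. Only the 32 ports and 40Gbit per port come from the interview. The daisy-chain topology, the number of HBA ports per server and the number of cables between neighbouring boxes are all assumptions made for illustration, as are the function names.

```python
PORTS_PER_ENCLOSURE = 32  # from the interview
PORT_SPEED_GBIT = 40      # from the interview


def enclosure_bandwidth_gbit(ports: int = PORTS_PER_ENCLOSURE,
                             speed_gbit: int = PORT_SPEED_GBIT) -> int:
    """Aggregate raw line rate of one enclosure's external ports, in Gbit/s."""
    return ports * speed_gbit


def max_servers(enclosures: int, ports_per_server: int = 2,
                links_per_hop: int = 1) -> int:
    """Servers attachable to a daisy chain of enclosures (assumed topology).

    Each hop between neighbouring enclosures uses `links_per_hop` ports on
    both boxes; every remaining port is treated as free for server HBAs.
    """
    if enclosures < 1:
        raise ValueError("need at least one enclosure")
    total_ports = enclosures * PORTS_PER_ENCLOSURE
    interlink_ports = 2 * links_per_hop * (enclosures - 1)
    return (total_ports - interlink_ports) // ports_per_server


if __name__ == "__main__":
    print(enclosure_bandwidth_gbit(), "Gbit/s raw per enclosure")
    for count in (1, 4, 16):
        print(count, "enclosure(s):", max_servers(count), "servers")
```

On these assumptions one enclosure offers 1,280 Gbit/s of raw port bandwidth and room for 16 dual-ported servers, and the server count grows roughly linearly as boxes are chained. Real deployments may cable servers to more than one enclosure, which would change the totals.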
We have a partner who is using these top of rack switches to network thousands of servers to our storage.
El Reg: How do you see the NVMe external storage solutions evolving over the next few years?
Apeiron: Increasingly, applications are moving towards real-time analysis of massive datasets using artificial intelligence. To avoid data centre sprawl, external NVMe solutions must be able to scale while continuing to increase IOPS and lower latencies.
Apeiron saw this need coming back in 2013 with the growth of NoSQL in-memory databases. While these solutions were very successful, once datasets grew to roughly 10 TB or ~100 servers, direct-attached storage led to excessive hardware and management costs. This is the challenge of scale-out architectures and the reason many are moving to external NVMe storage solutions.
These applications are now evolving to use multiple types of NVMe storage. In one application, you may need to use storage-class memory for metadata, MLC (2 bits/cell) flash for hot working data, and TLC (3 bits/cell) or QLC (4 bits/cell) for longer-term read-only data. The ADS1000 was designed to support all of these NVMe SSD classes in one environment.
In general, Apeiron’s array is easier to adopt than an NVMe over fabrics array, but the company will need to maintain its differentiation against the expected flood of NVMe over fabrics arrays so that customers can see the difference, and what that difference is worth.
If we come across a customer story that shows this, particularly one about the partner using top-of-rack switches to network thousands of servers to Apeiron’s storage, then we’ll run that. ®