Vexata's zippy storage array architecture poses vexatious questions
Startup lugs $54m war chest on mission to switch up the industry
Analysis Newcomer Vexata has dropped out of stealth with a $54m* funding war chest and aims to blow the fusty old storage array business to smithereens with its 7 million IOPS Active Data Fabric box. How so?
Its VX-100 box** is a 6U enclosure with two active:active controllers and either up to 64 NVMe SSDs (VX-100F) or Optane SSDs (VX-100M). It connects to host servers through up to 16 x 32Gbit/s Fibre Channel ports or, alternatively, 16 x 40GbitE NVMe over Fabrics ports.
The effective capacity is 90TB with 2TB SSDs (128TB raw) or 155TB with 3.2TB SSDs (204.8TB raw).
The SSDs or Optane drives are mounted, four per blade, on up to 16 intelligent, hot-swappable Enterprise Storage Module (ESM) blades, and the storage is protected by RAID 5 or 6. The system scales from 4 to 16 ESMs.
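RAID 5 protection works by storing, for each stripe of data blocks, an XOR parity block from which any single lost block can be rebuilt. This is a generic illustration of that idea, not Vexata's implementation, which the company says is hardware-assisted in its VX-OS Router; the four-drive stripe and block contents are hypothetical. Real RAID 5 also rotates the parity position across drives, and RAID 6 adds a second, differently computed parity so that two drives can fail.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (RAID 5 style)."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical stripe across four drives: three data blocks plus parity.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
stripe = make_stripe(data)
recovered = rebuild(stripe, 1)  # pretend the second drive has failed
```

Because XOR is its own inverse, the same routine rebuilds a lost data block or a lost parity block.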
Vexata VX-100 box (Source: ESG)
The Optane-based VX-100M has much lower capacities: 16TB with 375GB Optane drives (24TB raw) or 32TB with 750GB Optane drives (48TB raw).
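The raw figures follow directly from 64 drives (16 ESM blades with four drives each). Here is a quick check that also derives the effective-to-raw ratio; that ratio is computed here rather than quoted by Vexata, and the source does not say how the difference splits between RAID parity, spare space and metadata.

```python
DRIVES = 16 * 4  # up to 16 ESM blades, four drives per blade

# (model, drive size in TB, quoted effective capacity in TB)
CONFIGS = [
    ("VX-100F", 2.0, 90),
    ("VX-100F", 3.2, 155),
    ("VX-100M", 0.375, 16),
    ("VX-100M", 0.75, 32),
]


def capacity(drive_tb, effective_tb, drives=DRIVES):
    """Return (raw TB, effective/raw ratio) for a fully populated box."""
    raw_tb = drives * drive_tb
    return raw_tb, effective_tb / raw_tb


for model, drive_tb, effective_tb in CONFIGS:
    raw_tb, ratio = capacity(drive_tb, effective_tb)
    print(f"{model} with {drive_tb}TB drives: {raw_tb:g}TB raw, "
          f"{effective_tb}TB effective ({ratio:.0%} of raw)")
```

The flash models come out at roughly 70 to 76 per cent of raw capacity and the Optane models at about 67 per cent.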
The VX-OS operating system provides thin provisioning, data-at-rest encryption, and space-efficient snapshots and clones. It does not offer deduplication or replication.
Vexata says its system delivers massive throughput with ultra-low latency, targeted at transaction processing and analytics workloads. It delivers 7 million IOPS (8KB, 70 per cent read/30 per cent write) with 200μs latency (flash) or 40μs (Optane). Bandwidth is 70GB/sec (50GB/sec read; 20GB/sec write).
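A rough sanity check of how the IOPS and bandwidth figures relate. It assumes "8KB" means 8KiB and uses nominal per-direction host-port rates before protocol overhead: about 3,200MB/s for a 32Gbit/s Fibre Channel port and a 5GB/s line rate for 40GbitE. These port ceilings are our assumptions, not Vexata figures.

```python
KIB = 1024


def gb_per_s(iops, block_bytes):
    """Data moved per second, in decimal GB (10**9 bytes)."""
    return iops * block_bytes / 1e9


# Quoted small-block figure: 7 million IOPS at 8KB (assumed to mean 8KiB).
small_block = gb_per_s(7_000_000, 8 * KIB)

# IOPS needed to reach the quoted 70GB/s if every transfer were 8KiB.
iops_for_70 = 70e9 / (8 * KIB)

# Assumed nominal ceilings per direction, ignoring protocol overhead.
fc_ceiling = 16 * 3.2       # 16 x 32GFC at ~3.2GB/s each
eth_ceiling = 16 * 40 / 8   # 16 x 40Gbit/s Ethernet, converted to GB/s

print(f"7M IOPS x 8KiB : {small_block:.1f} GB/s")
print(f"70GB/s at 8KiB : {iops_for_70 / 1e6:.2f}M IOPS")
print(f"16 x 32GFC     : {fc_ceiling:.1f} GB/s each way")
print(f"16 x 40GbE     : {eth_ceiling:.1f} GB/s each way")
```

On these assumptions, 7 million 8KiB IOPS move roughly 57GB/s, so the 70GB/sec figure was presumably measured with larger transfers. The 50GB/sec read rate sits just under the nominal one-direction ceiling of 16 Fibre Channel ports.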
How did it arrive at the design of this system?
First of all, it says traditional dual-controller all-flash arrays have an architecture limited by their controllers and internal switches, so an inefficient system design wastes the responsiveness of the SSDs. This diagram shows what it means:
Vexata's view of trad AFA architecture
It says the array needs a much more scalable controller-to-SSD switch scheme and one with parallelised access:
Vexata view of re-architected AFA with lossless Ethernet switching
But this only solves part of the problem. The controllers also need separate control and data paths to release their choke hold on IOs, and the SSDs need more intelligent access:
Vexata AFA scheme with split-path controllers and intelligent storage blades
Here each controller is split into two parts: VX-OS Control and VX-OS Router. VX-OS Control runs the control path for data services, while VX-OS Router implements hardware assists for storage service tasks such as cut-through I/O interaction and data movement with the ESMs, RAID, encryption, metadata search and system-level garbage collection.
Control plane tasks include high availability failover, thin provisioning, and space-efficient snaps and clones.
Analyst firm ESG says the routers are FPGA-based and that the VX-OS Router code runs as firmware.
The ESMs have a front-end processing facility that runs VX-OS Data code for SSD I/O scheduling and metadata management. This helps the two I/O controllers access the SSDs in parallel.
Therefore the VX-OS software is distributed across three processing engines: Control, Router and Data.
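To make the split-path idea concrete, here is a toy model, not Vexata's code. A hypothetical `RouterEngine` forwards each I/O straight to a `DataEngine` (standing in for an ESM) using a cached map, and only falls back to a `ControlEngine` when a thin-provisioned region is written for the first time. All class names, the 1MiB extent size and the modulo placement across ESMs are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

EXTENT = 1 << 20       # hypothetical 1MiB thin-provisioning allocation unit
Key = Tuple[str, int]  # (volume name, extent index)


@dataclass
class ControlEngine:
    """Control path (hypothetical): owns metadata such as thin-provisioning allocations."""
    next_free: int = 0
    allocations: Dict[Key, int] = field(default_factory=dict)

    def allocate(self, key: Key) -> int:
        # Physical space is assigned only when a region is first written.
        if key not in self.allocations:
            self.allocations[key] = self.next_free
            self.next_free += 1
        return self.allocations[key]


@dataclass
class DataEngine:
    """Stand-in for the per-ESM engine that schedules I/O to its SSDs."""
    name: str
    queue: List[Tuple[str, int]] = field(default_factory=list)


@dataclass
class RouterEngine:
    """Data path (hypothetical): forwards I/O to an ESM using a cached copy of the map."""
    control: ControlEngine
    esms: List[DataEngine]
    cache: Dict[Key, int] = field(default_factory=dict)

    def submit(self, op: str, volume: str, offset: int) -> Optional[str]:
        key = (volume, offset // EXTENT)
        phys = self.cache.get(key)
        if phys is None:
            if op == "read":
                return None  # never-written thin region: nothing stored to fetch
            # Slow path: ask the control engine to allocate, then cache the result.
            phys = self.cache[key] = self.control.allocate(key)
        esm = self.esms[phys % len(self.esms)]  # spread extents across ESMs
        esm.queue.append((op, phys))
        return esm.name


# Usage: four ESMs, a couple of writes and reads against one volume.
control = ControlEngine()
router = RouterEngine(control, [DataEngine(f"esm{i}") for i in range(4)])
first = router.submit("write", "vol1", 0)         # allocates extent 0 -> esm0
second = router.submit("write", "vol1", EXTENT)   # allocates extent 1 -> esm1
again = router.submit("read", "vol1", 100)        # cache hit, no control call -> esm0
hole = router.submit("read", "vol1", 5 * EXTENT)  # never written -> None
```

The design point is that the common case, a read or rewrite of already-allocated space, never waits on the control engine, which is the kind of separation Vexata says its controllers need to stop choking I/O.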
VX-OS can be managed through a GUI, a CLI and a REST API.
If a flash-based storage array does 7 million IOPS with a 200μs response time and is then rebuilt with faster-responding Optane SSDs, it should perform faster, meaning more IOPS. Yet Vexata's Optane array does the same 7 million IOPS, with a 40μs response time. Every second, that means 7 million x (200-40)μs saved; that's 1.12bn microseconds, time enough for a good few more IOPS.
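Because 7 million IOPS is a per-second rate, the 1.12bn-microsecond figure is a saving per second of operation, as this arithmetic shows:

```python
IOPS = 7_000_000               # quoted rate for both VX-100F and VX-100M
FLASH_US, OPTANE_US = 200, 40  # quoted latencies in microseconds

saved_us = IOPS * (FLASH_US - OPTANE_US)  # summed latency saved per second
saved_s = saved_us / 1_000_000            # the same figure in seconds

print(f"{saved_us:,} microseconds = {saved_s:,.0f} seconds of summed waiting per second")
```

That is about 1,120 seconds of summed waiting removed every wall-clock second, which is only possible because many I/Os are in flight at once; the saving therefore does not translate directly into extra IOPS.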
Storage architect Chris Evans says: "At 200μs that's 5,000 IOPS if they were processed serially [and] 25,000 for Optane. Clearly the IO is not serial and being managed in parallel, so queue depth, number of parallel data streams etc, is being manipulated to give high IO throughput.
"Clearly there are other processes within the architecture that are limiting the IO throughput, so even though latency is low, the system can't get past that 7 million IOPS limitation.
"The question is, does it matter? For a raw throughput perspective, flash looks more cost-effective, but from a transactional perspective where an IO needs to be complete before the next piece of work is done, then Optane wins because the application latency is 1/5th of flash.
"It would be better if Vexata quoted some transactional workload benchmarks rather than raw numbers as this would help explain the benefit. I've only had a brief chat with them on the architecture and I think that will be their challenge – getting past raw storage performance numbers to performance figures that represent real-world applications."
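Evans's serial figures are simply 1/latency. Applying Little's law, a standard queueing result relating average concurrency to throughput and latency (our addition, not something Evans or Vexata cites), shows how much parallelism the 7 million IOPS figure implies for each medium:

```python
def serial_iops(latency_us):
    """IOPS for a single stream where each I/O waits for the previous one."""
    return 1_000_000 / latency_us


def in_flight(iops, latency_us):
    """Little's law: average outstanding I/Os = throughput x latency."""
    return iops * latency_us / 1_000_000


for medium, latency_us in (("flash", 200), ("Optane", 40)):
    print(f"{medium}: {serial_iops(latency_us):,.0f} serial IOPS, "
          f"{in_flight(7_000_000, latency_us):,.0f} I/Os in flight at 7M IOPS")
```

That gives 5,000 and 25,000 serial IOPS, and about 1,400 I/Os in flight for flash against 280 for Optane. Reaching the same ceiling with one fifth of the concurrency is consistent with Evans's point that something other than media latency caps the system.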
The ESG White Paper (PDF) describes VX-100 performance using a synthetic OLTP workload and Oracle RAC running on servers connected to the VX-100 by Fibre Channel. It said the VX-100F serviced up to 5.17 million IOPS with an average latency of 414.5μs, an outstanding result for any all-flash array, but that the VX-100M operated on a completely different level, servicing 6 million IOPS with an average response time of just 45μs.
Here there is an IOPS difference between the NAND and Optane-based VX-100 arrays, but the Optane version's 6 million IOPS is somewhat short of Vexata's stated 7 million. The system's real-world performance needs to be more clearly defined and explained.
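Putting the ESG results next to Vexata's headline claims makes the gap explicit. The workloads differ (a synthetic OLTP and Oracle RAC test versus Vexata's 8KB 70/30 mix), so this is not a like-for-like comparison.

```python
# (million IOPS, average latency in microseconds)
CLAIMED = {"VX-100F": (7.0, 200.0), "VX-100M": (7.0, 40.0)}
ESG_MEASURED = {"VX-100F": (5.17, 414.5), "VX-100M": (6.0, 45.0)}


def versus_claim(model):
    """Return (measured/claimed IOPS, measured/claimed latency) for a model."""
    claimed_iops, claimed_lat = CLAIMED[model]
    measured_iops, measured_lat = ESG_MEASURED[model]
    return measured_iops / claimed_iops, measured_lat / claimed_lat


for model in CLAIMED:
    iops_ratio, latency_ratio = versus_claim(model)
    print(f"{model}: {iops_ratio:.0%} of claimed IOPS, {latency_ratio:.2f}x claimed latency")
```

By this reckoning the VX-100F reached about 74 per cent of the claimed IOPS at roughly twice the claimed latency, and the VX-100M about 86 per cent at close to its claimed latency.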
It would also be interesting to see how the VX-100F and VX-100M perform when connected to host servers by NVMe over Fabrics.
Some form of clustering to provide scale-out capability may be on Vexata's development roadmap.
What we have here is a hyper-hot box in the Apeiron/DSSD/E8/Excelero array product space. ®
* Updated on 12 October to add: This was originally reported as $103m. An industry source told us: "It was confusing because they re-announced their total funding in their recent press release, so people assumed it was a new round of financing on top of what they raised before."
** Vexata storage is also available as an appliance, as an array with IBM's Spectrum Scale (PDF), or as software running on commodity servers and switches.