Pure Storage's would-be Data Domain killer out in March – but it's still shy about the internals
Go on, flash storage bandits – spill your guts
Pure Storage will introduce a shiny new backup box - dubbed ObjectEngine - from as early as next month to pinch sales from veterans including EMC-owned sectoral kingpin Data Domain.
The ObjectEngine uses acquired StorReduce variable-length deduplication tech which runs on ObjectEngine//A on-premises hardware and in ObjectEngine//Cloud instances running in AWS. Backup apps from Commvault, Oracle, Veeam and Veritas can use it as a backup target.
This means Pure's pitch is a flash-to-flash-to-cloud (F2F2C) backup scheme – in contrast to EMC-owned Data Domain's D2D (disk-to-disk) scheme and other D2D2T (disk-to-disk-to-tape) models.
Pure's F2F2C scheme would provide restores from local flash along with cloud economics, and cloud apps could re-use the in-cloud data. FlashBlade with ObjectEngine could also act as an on-premises data (distributing) hub.
Pure's OE//A270 on-premises system has four nodes – the minimum cluster size – in two 3U boxes. It provides 25TB/hour backup and 15TB/hour restore, has 15PB of front-end capacity and can send data to the public cloud.
It said: "The 15PB limit comes from the amount of metadata that can be stored in the 4 x OE//A nodes. The actual deduplicated data is stored in a combination of the FlashBlade and Amazon S3. The most recently stored data being rapidly accessible from the FlashBlade and older data accessed from S3."
Pure added: "We don't tier data to cloud. When ObjectEngine is configured to store to cloud, it stores a copy of all the data in the cloud and keeps a local cache of the data that is on-premises."
The company was reticent about the box internals, not revealing the CPUs, the DRAM, any internal storage details, the system's architecture or its connectivity. It is treating the system as a black box, which The Register found most curious.
It did say the system has a massively parallel architecture.
4-node, twin box OE//A270 with single FlashBlade backend box underneath.
We can deduce each 3U OE//A270 box contains two nodes, and, because they are 3U in size, there must be more components inside than a commodity 1U, 2-socket rack server would possess.
StorReduce documentation states:
The StorReduce server runs on its own physical or virtual machine. StorReduce recommends using local SSD storage. Each StorReduce server can handle up to 40 Petabytes of raw data, depending on the deduplication ratio achieved and the amount of SSD storage available for index information.
For lower data volumes, magnetic disk can be used instead of SSD. StorReduce supports the creation of multiple storage buckets, with global deduplication performed across all buckets.
We assess that the OE//A270 boxes contain local flash storage for the dedupe indices.
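To illustrate why such an index matters, here is a toy Python sketch of variable-length deduplication with one fingerprint index shared across buckets. It is not StorReduce's or Pure's algorithm, which has not been disclosed. The rolling checksum, the chunk-size limits and the names `chunk` and `DedupeStore` are all assumptions made for illustration.

```python
import hashlib

def chunk(data: bytes, mask: int = 0x3F, min_size: int = 16, max_size: int = 256):
    """Split data at content-defined boundaries using a toy rolling checksum."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the checksum hits the pattern, or when the chunk gets too big.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

class DedupeStore:
    """Hypothetical store: one global index, many buckets."""

    def __init__(self):
        self.index = {}    # fingerprint -> chunk bytes, shared by all buckets
        self.buckets = {}  # bucket name -> {object key: list of fingerprints}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        recipe = []
        for piece in chunk(data):
            fingerprint = hashlib.sha256(piece).hexdigest()
            self.index.setdefault(fingerprint, piece)  # store each chunk once
            recipe.append(fingerprint)
        self.buckets.setdefault(bucket, {})[key] = recipe

    def get(self, bucket: str, key: str) -> bytes:
        return b"".join(self.index[fp] for fp in self.buckets[bucket][key])

store = DedupeStore()
payload = bytes(range(256)) * 8
store.put("backups-a", "db.dump", payload)
chunks_after_first = len(store.index)
store.put("backups-b", "db-copy.dump", payload)  # same data, different bucket
print(chunks_after_first, len(store.index))
```

In this sketch the second write adds no new chunks, because every fingerprint is already in the shared index. Every write and read goes through that index, which is why fast local storage for it is plausible.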
An OE//A270 system with 2 x FlashBlade boxes has a 1PB usable capacity, 0.5PB/FlashBlade, and takes up a third of a rack. A rack with 4 x OE//A270s and 4 x FlashBlades offers 2PB of usable capacity, 100TB/hour backup and 60TB/hour restore.
These it compares to a Data Domain DD9800 (PDF) taking up 4U of space, with 1PB of usable capacity, 68TB/hour backup with CloudBoost and no published restore performance figures. Dell, by the way, says a DD9800 occupies less than one full rack when using standard 60-drive DS60 disk packs.
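Pure's rack-level figures are consistent with simple multiplication of the per-system numbers. The short Python check below assumes throughput scales linearly with the number of OE//A270 systems and capacity with the number of FlashBlades. Neither assumption was confirmed by Pure, and the function name is ours.

```python
# Per-system figures quoted by Pure for one four-node OE//A270.
BACKUP_TB_PER_HOUR = 25
RESTORE_TB_PER_HOUR = 15
FLASHBLADE_USABLE_PB = 0.5

def rack_totals(systems: int, flashblades: int) -> dict:
    """Assumes linear scaling, which Pure has not confirmed."""
    return {
        "backup_tb_per_hour": systems * BACKUP_TB_PER_HOUR,
        "restore_tb_per_hour": systems * RESTORE_TB_PER_HOUR,
        "usable_pb": flashblades * FLASHBLADE_USABLE_PB,
    }

rack = rack_totals(systems=4, flashblades=4)
print(rack)
```

With four systems and four FlashBlades this gives 100TB/hour backup, 60TB/hour restore and 2PB usable, matching the quoted rack figures.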
Pure said an OE//A restore of two databases from Amazon would be 1TB/hour. OE//A restoring two databases from FlashBlade is 7TB/hour and 10TB/hour when restoring four databases.
The Mountain View firm has also confirmed the addition of DirectFlash Fabric NVMe over Fabrics access to its FlashArray.
OE//Cloud is cloud-native software that scales to more than 100TB/hour backup and protects more than 100TB of data. It has 11 nines of durability, a single global namespace, the S3 API and is internally replicated.
Pure did not provide restore speed numbers for OE//Cloud, saying restore speed was "highly dependent on the cloud provider's virtual hardware and networking infrastructure". It added: "We'll provide numbers closer to the OE//Cloud general availability."
The California storage firm said its ObjectEngine tech could help with data reuse for test and dev, analytics and so on, taking a punt at the Actifio, Cohesity and Delphix copy data management products.
After providing NVMe drives and NVMe access inside its flagship FlashArray kit, Pure is now adding the final stage, with NVMe over Fabrics access to FlashArray.
It suggested its DirectFlash Fabric (RDMA over Converged Ethernet) could provide a 250μs latency – a 50 per cent latency reduction compared to iSCSI, and 20 per cent compared to Fibre Channel.
A server app's overall database query time to FlashArray//X with NVMe drives and shelves could take five seconds with a 500μs latency, which would be cut to 2.5 secs with DirectFlash Fabric added.
In the future, it suggested, that could be cut to less than 100μs latency with a one-second overall database query time. We imagine storage-class memory would be added to the mix, possibly in the accessing server, for this further reduction.
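The latency claims hang together if query time is simply proportional to per-I/O latency. The Python sketch below works backwards under that assumption, which is ours rather than Pure's: the query is treated as a fixed number of serial I/Os with nothing else contributing to its duration.

```python
DIRECTFLASH_US = 250  # quoted DirectFlash Fabric latency in microseconds

def baseline_latency_us(reduced_us: int, reduction_pct: int) -> float:
    """Latency before a given percentage reduction."""
    return reduced_us * 100 / (100 - reduction_pct)

iscsi_us = baseline_latency_us(DIRECTFLASH_US, 50)  # implied iSCSI latency
fc_us = baseline_latency_us(DIRECTFLASH_US, 20)     # implied Fibre Channel latency

# Five seconds at 500 microseconds per I/O implies this many serial I/Os.
serial_ios = 5_000_000 // 500

def query_seconds(latency_us: int) -> float:
    """Query time if it is purely serial_ios round trips (our assumption)."""
    return serial_ios * latency_us / 1_000_000

print(iscsi_us, fc_us, serial_ios)
print(query_seconds(500), query_seconds(250), query_seconds(100))
```

On these assumptions the implied baselines are 500μs for iSCSI and 312.5μs for Fibre Channel. The example query amounts to 10,000 serial I/Os, which gives 5, 2.5 and 1 seconds at 500μs, 250μs and 100μs respectively.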
ObjectEngine//A will be generally available in March with CloudDirect coming in May. OE//Cloud will come in the second half of 2019. RDMA over converged Ethernet became generally available in January. Pure will add NVMe over Fibre Channel and TCP in the future. ®