Our storage reporter has breaking news about Data Fabrics. Chris?
Thanks, team. There's a lot going on here with FlashRay and semi-detached objects.
Recent NetApp Data Fabric literature presages a return of FlashRay and an apparently semi-detached integration of its StorageGRID product.
Data Fabric is NetApp's overarching concept of a virtual fabric in which data held in different clouds (stores) and managed in different data management (product) environments can be seamlessly connected across on-premises, private cloud, public cloud, and hybrid private/public cloud deployments to form a cohesive, integrated, virtual whole.
NetApp has produced a 30-page Data Fabric white paper [1] which describes typical problems associated with hybrid cloud deployments and shows how its Data Fabric idea fixes them.
A crucial part of NetApp's Data Fabric idea is data mobility. A diagram in a NetApp document illustrates this with a hybrid cloud that uses SnapMirror to interconnect public and private clouds, with the private component including FlashRay high-availability pairs.
NetApp Data Fabric diagram showing FlashRay
FlashRay and the Data Fabric
FlashRay is NetApp's in-house development of an all-flash array that accompanies the AFF8000 and EF540 in its all-flash array lineup. It is still in development after having been introduced in a semi-prototype single-controller form. The AFF8000 is an all-flash version of NetApp's FAS array hardware running the Clustered Data ONTAP (CDOT) operating system. The EF540 is an all-flash E-Series array aimed at HPC and high-performance applications not needing CDOT data management services.
How FlashRay, with its Mars operating system, will be positioned is unclear. In the industry, new-design all-flash arrays are typically positioned as being faster, making better use of flash, and having a lower TCO than all-flash implementations of legacy disk drive arrays.
At present you can't buy FlashRay high-availability pairs from NetApp. We understand they will come, when they come, in a 6U rack enclosure.
We asked Lee Caswell, NetApp VP for Product, Solutions and Services Marketing, when FlashRay high-availability pairs are coming from NetApp.
Caswell told us: "I've previously said that (of course!) HA is a requirement for enterprise storage, so you can expect the product will include this when it is complete, consistent with our AFF and EF products. FlashRay is not a released product so I am not commenting on timing." He talked about "the thorough qual steps that a product of this type deserves."
How will FlashRay high-availability pairs be positioned against the AFF8000 and EF560?
Caswell said: "FlashRay targets an incremental market for NetApp – namely the application owner who values low cost and simplicity and who is comfortable with flash as a new data silo and who may be considering hyper-converged infrastructure. This contrasts with NetApp's traditional customer – the enterprise infrastructure owner – who values the full range of data management features available in today's AFF product."
A separate Common Transport diagram shows NetApp's various purpose-built storage products, including FlashRay, sharing a data mobility space in the fabric:
Data Fabric Common Transport diagram
NetApp's document states: "The transport layer provides a common data protocol that connects all platforms in the data fabric to seamlessly transfer data in bulk. This transport mechanism enables applications to access data where it is needed most: to a cloud, across storage tiers, or across clusters. Applications are not aware of any data movement."
It adds: "The Data ONTAP family of products (FAS, Cloud ONTAP, Edge) share a common WAFL file system format. All other platforms shown in the fabric have their own native file system formats."
"Common data transport enables the platforms to interoperate and move data efficiently between them, using SnapMirror replication or SnapVault backup capabilities. Not only does this transport enable interoperability, it also preserves deduplication and compression storage efficiencies, meaning data does not rehydrate when moving from one endpoint to another."
"A data fabric transport provides cross-cluster data movement, where the data can be consumed in a form native to each endpoint. For example, FlashRay serves a given dataset to clients using its Fiber Channel protocol, but when the data is moved to FAS using SnapMirror, the FAS may serve the dataset to clients using iSCSI without transformation."
Today, common data transport is available only for Data ONTAP endpoints. It is being expanded to include E-Series and AltaVault.
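The white paper's point that data "does not rehydrate" in transit can be illustrated with a toy model. The Python sketch below is purely illustrative and is not how SnapMirror or SnapVault work internally. The function names, the fixed 4-byte block size and the use of SHA-256 digests are all assumptions made for the example. It shows that if deduplicated blocks are sent as-is, together with a "recipe" for rebuilding the data, fewer bytes cross the wire than the logical size of the dataset.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks, keeping one copy of each unique block."""
    blocks = {}   # digest -> block bytes (each unique block stored once)
    recipe = []   # ordered digests needed to rebuild the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        blocks.setdefault(digest, block)
        recipe.append(digest)
    return blocks, recipe


def transfer(blocks, recipe, destination):
    """Send only the blocks the destination lacks; return payload bytes sent."""
    sent = 0
    for digest in recipe:
        if digest not in destination:
            destination[digest] = blocks[digest]
            sent += len(blocks[digest])
    return sent


def rebuild(recipe, store):
    """Reassemble the logical data from a recipe and a block store."""
    return b"".join(store[d] for d in recipe)


# Hypothetical dataset: five identical blocks followed by one different block.
data = b"ABCD" * 5 + b"WXYZ"
blocks, recipe = dedupe(data)
destination = {}
sent = transfer(blocks, recipe, destination)
```

In this hypothetical run the logical dataset is 24 bytes, but only the two unique blocks need to be sent, and the destination can still rebuild the full dataset from the recipe.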
In a NetApp Data Fabric layers table the company says it will connect all platforms with a common data transport.
NetApp's Data Fabric layers
However, FlashRay is not mentioned as a potential common data transport platform in the white paper text, and neither is StorageGRID.
Caswell said, "StorageGRID Webscale doesn't need the common data transport, as Automated Data Tiering is an efficient mechanism to bring it (and other S3 targets) more tightly into the data fabric. FlashRay is not listed because it is not a released product. That's a change from our past practice where we included unreleased products."
Data Fabric and StorageGRID
StorageGRID is NetApp's object storage product. It features a multi-site architecture, geo-dispersed erasure coding, a single global namespace, and policy-based object management and movement. It can provide an object store for OpenStack Swift and is massively scalable, supporting billions of objects and tens of petabytes of storage across multiple locations.
The product supports S3 and CDMI access protocols. It has NFS gateway services, and OpenStack Swift support is coming soon.
NetApp's white paper states: "The Data Fabric supports multiple use cases for object repositories, including a target for AltaVault backups and for object store data tiering. For these use cases, either Data ONTAP or AltaVault manages the data stored in the repository."
NetApp's Data Fabric with apparently semi-detached object storage
NetApp Private Storage in public cloud co-location facilities lowers the access latency of public cloud compute to data in the array, compared to accessing the data in an on-premises NetApp array. This is limited to CDOT systems, so StorageGRID arrays can't directly benefit from it. Neither can FlashRay or E-Series arrays.
The NetApp diagram shows StorageGRID being used as a data tier in an object storage data tiering arrangement. Cloud ONTAP can similarly tier object data to a public cloud object storage facility.
AltaVault can be used to back up FAS array data to an object store on-premises or in the public cloud.
StorageGRID and aggregates
ONTAP has a Flash Pool concept, in which the array controller uses SSDs as a storage pool. A Flash Pool aggregate "consists of a set of SSDs and HDDs. The HDDs are the slower, less costly, high-capacity tier. The SSDs are the faster but more expensive tier. With such a hybrid configuration, Data ONTAP makes sure that the hot data automatically gravitates to the SSD tier, allowing for the highest performance data access."
NetApp's white paper has an Envision The Future section which says: "In the case of FAS, consider an aggregate consisting of a set of SSDs and an object store bucket. Like with Flash Pool aggregates, the hot data gravitates to the SSDs, while the object store operates as the less costly, deeper, slower-capacity tier."
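The tiering idea in that passage, with hot data gravitating to SSD while a cheaper tier holds the rest, can be sketched in a few lines of Python. This is a minimal toy model and not a description of Data ONTAP's actual algorithm. The class name, the read-count threshold and the "demote the least-read block" rule are all assumptions made for illustration. The capacity tier here stands in for either HDDs or an object store bucket.

```python
from dataclasses import dataclass, field


@dataclass
class TieredAggregate:
    """Toy aggregate with a small fast tier (SSD) and a larger, slower capacity tier."""
    ssd_capacity: int              # maximum number of blocks held on SSD
    hot_threshold: int = 3         # reads before a block counts as hot (assumed value)
    ssd: dict = field(default_factory=dict)
    capacity_tier: dict = field(default_factory=dict)
    reads: dict = field(default_factory=dict)

    def write(self, block_id, data):
        # In this sketch, new data always lands on the capacity tier first.
        self.capacity_tier[block_id] = data
        self.reads[block_id] = 0

    def read(self, block_id):
        self.reads[block_id] += 1
        if block_id in self.ssd:
            return self.ssd[block_id]
        data = self.capacity_tier[block_id]
        if self.reads[block_id] >= self.hot_threshold:
            self._promote(block_id)
        return data

    def _promote(self, block_id):
        # If the SSD tier is full, demote its least-read block to make room.
        if len(self.ssd) >= self.ssd_capacity:
            coldest = min(self.ssd, key=lambda b: self.reads[b])
            self.capacity_tier[coldest] = self.ssd.pop(coldest)
        self.ssd[block_id] = self.capacity_tier.pop(block_id)


# Hypothetical usage: one SSD slot, two blocks competing for it.
agg = TieredAggregate(ssd_capacity=1)
agg.write("a", b"alpha")
agg.write("b", b"beta")
for _ in range(3):
    agg.read("a")          # "a" becomes hot and is promoted to SSD
for _ in range(3):
    agg.read("b")          # "b" becomes hot and displaces "a"
```

Swapping the capacity tier from HDDs to an object store bucket would not change the logic of the sketch, only the latency and cost of the slower tier, which is the substitution the white paper envisages.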
The data movement is not done with SnapMirror or SnapVault, as StorageGRID supports S3 and CDMI protocols, and doesn't support a snapshot facility or a SnapMirror replication capability.
The white paper says: "SnapVault can be used with the common data transport to back up Snapshot copies from any endpoint in the Data Fabric," except it doesn't support StorageGRID as that is not an endpoint in the Data Fabric.
We can understand that snapshotting emerged to provide a data protection facility for file and block storage volumes. Object storage, with its inherent content-addressing, distributed objects and typical erasure coding, is self-protecting and doesn't need snapshotting.
But NetApp is not explicitly saying object storage isn't supported by its common data transport scheme, although that seems to be the case.
However, object storage might yet be supported, as the white paper lists four platforms (CDOT, StorageGRID, AltaVault, and E-Series) and says, "As the various platforms become data fabric-enabled, common data transport facilitates the movement of data between them and enables data to be served by the protocols native to each system."
Again, FlashRay is absent.
Caswell said a lot about how the Data Fabric and StorageGRID are integrated and interoperate: "The Data Fabric supports a common data format and transport used for vaulting and mirroring between systems. We connect to S3-based systems (StorageGrid as well as other S3 targets) with object store data tiering as shown [in the diagram above.] This approach is chosen to optimize the transport between SAN, NAS, and object systems from NetApp and other vendors."
"StorageGRID offers software-defined object storage, which can be mixed and matched into a single cluster. Depending on how it is used, StorageGRID can be an integrated component of the data fabric, a standalone object fabric, or both."
"StorageGRID [can be] an integrated part of the Data Fabric serving as a target for FAS object store data tiering and AltaVault backups.
"In addition to those use cases, StorageGRID is an object-based fabric in its own right. Cloud applications can store and retrieve objects directly over its S3 or CDMI protocols. StorageGRID is a true multi-site architecture and has advanced object management capabilities, such as geo-dispersed erasure coding, single global namespace, and policy-based object management and movement."
Data Fabric public cloud support
From the diagrams above, it's apparent that NetApp's Data Fabric supports Amazon, Azure, and IBM's SoftLayer public clouds. We're told, for example, "A single NPS deployment can serve multiple clouds, including AWS, Azure, and SoftLayer." There can be dedicated links from appropriate co-location facilities between a CDOT array in the co-lo and any one of these clouds.
The Google cloud is not supported by the Data Fabric today.
However, AltaVault, NetApp's purpose-built backup-to-the-cloud or object store appliance, does support Google Cloud as a target.
We're told "The NetApp Data Fabric supports a wide array of virtualized environments and clouds today, and will continue to expand to more clouds." We would envisage Google's cloud as a natural extension for the Data Fabric.
We asked Lee Caswell if NetApp is going to extend its Data Fabric to embrace Google's cloud.
He replied: "We already have. NetApp's Data Fabric is designed to work with all IaaS clouds. Google's is supported by AltaVault today. We announced a partnership with Google at Insight LV and look forward to strengthening support for their cloud services over time. Over time, we plan to connect all the IaaS and SaaS clouds our customers want to use." ®
1. "NetApp Data Fabric Fundamentals: Building a Data Fabric Today" by Joe CaraDonna and Arthur Lent, NetApp, October 2015 | WP-7218.