The object of the game: NetApp 'Amazon-izes' StorageGRID
Web-scale object storage with geo-distributed erasure coding
NetApp has announced a new version of its object storage software, StorageGRID Webscale, and extended its hybrid public:private facilities by "Amazon-izing" it with the addition of an interface with AWS's online file storage web service S3.
Geo-distributed erasure coding technology is coming.
NetApp views object storage as a good means of storing massive amounts of unstructured data that does not need FAS ONTAP-level data management services but does require secure data management at a reduced cost.
It was more than four years ago, in April 2010, that NetApp bought Canadian firm Bycast along with its StorageGRID technology – which provided object-based storage across heterogeneous arrays and geographic boundaries. There were then more than 250 StorageGRID customers, with NetApp saying the product was good for petabyte-scale, globally distributed repositories of images, video and records for enterprises and service providers.
The software runs inside a virtual machine running on a server, obviously, and handles the metadata processing and policy-driven work, writing and reading objects to/from attached storage resources.
It was thought StorageGRID would be integrated into the NetApp storage mothership, the ONTAP arrays, but this has not happened and doesn't look likely to happen. That separation was prompted, we feel, by the March 2011 $480m Engenio acquisition which gained NetApp its E-Series arrays running the SANtricity OS, targeted at video surveillance-type markets and high-performance computing. This means fast and straightforward access to data with applications often providing the data management services.
The E-Series became seen as an array suited to StorageGRID use, but with StorageGRID still available to work with third-party storage arrays.
Version 9.0 of StorageGRID came in August 2012 with an added cloud interface, CDMI, joining the existing NFS, CIFS, and RESTful HTTP API, along with the ability to be twinned with tens of petabytes - 35PB were mentioned - in a single namespace covering billions of files across hundreds of sites. It can send object data to tape.
Now, two years later, we have version 10.0, with new branding and an Amazon S3 interface added. NetApp says StorageGRID can hold 100 billion objects in a single flat address space, or container, that can be distributed across data centres around the world.
We can envisage object storage in two tiers: a private on-premises store, and an off-premises store in the public cloud for lower value data needing lower cost storage. This is a single self-healing data store which does not need a separate disaster recovery facility.
The target use-cases are on-premises and public cloud storage of:
- Data archives storing larger objects with long retention periods, low transaction loads and latency-tolerant access
- Media repositories with streaming data access to globally distributed large object stores and large throughput rates
- Web data-stores with billions of small objects and high transaction rates
NetApp says data placement is decided upon "according to cost, compliance, availability, and performance requirements," and this is policy-driven. There is an intelligent policy engine which "determines the durability and physical placement of data to comply with business requirements". The policies are customisable and "help set data protection and storage tiering to easily adjust to changing cost models".
Policies are defined by resource availability and latency, data retention requirements, geo-location requirements and network cost. NetApp says policies are automatically re-evaluated and objects will be brought into compliance.
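To make the idea of policy-driven placement concrete, here is a minimal sketch of how such an engine might pick sites and later re-evaluate an object. It is an illustration only: the `Site` and `Policy` classes, the `place` and `reevaluate` functions, the site names and the cost figures are all hypothetical and are not StorageGRID's policy syntax or API.

```python
from dataclasses import dataclass

# Hypothetical model for illustration; not StorageGRID's actual policy format.

@dataclass(frozen=True)
class Site:
    name: str
    region: str
    latency_ms: int      # typical access latency seen by clients
    cost_per_tb: float   # relative storage-plus-network cost

@dataclass(frozen=True)
class Policy:
    copies: int                  # number of sites that must hold the object
    allowed_regions: frozenset   # geo-location requirement
    max_latency_ms: int          # latency requirement

def place(policy, sites):
    """Return the names of the cheapest sites that satisfy the policy."""
    eligible = [s for s in sites
                if s.region in policy.allowed_regions
                and s.latency_ms <= policy.max_latency_ms]
    if len(eligible) < policy.copies:
        raise ValueError("not enough eligible sites for this policy")
    cheapest = sorted(eligible, key=lambda s: (s.cost_per_tb, s.name))
    return [s.name for s in cheapest[:policy.copies]]

def reevaluate(current, policy, sites):
    """Compare an object's current sites with the policy's target sites."""
    target = set(place(policy, sites))
    return {"add": sorted(target - set(current)),
            "remove": sorted(set(current) - target)}

SITES = [
    Site("london", "eu", 20, 30.0),
    Site("frankfurt", "eu", 35, 25.0),
    Site("virginia", "us", 90, 18.0),
    Site("cloud-eu", "eu", 120, 8.0),
]

eu_archive = Policy(copies=2, allowed_regions=frozenset({"eu"}), max_latency_ms=150)

print(place(eu_archive, SITES))  # ['cloud-eu', 'frankfurt']
print(reevaluate(["london", "virginia"], eu_archive, SITES))
```

In this toy model an object currently held in London and Virginia fails the EU-only archive policy, so the re-evaluation step reports which copies to create and which to retire. That is roughly what "brought into compliance" would mean in practice.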
Data integrity and availability features include:
- Hash or digital fingerprint created on data ingestion
- Levels of integrity protection including hashes, checksums and authentications
- Data object integrity verification is run on ingest, retrieval, replication, migration and at rest
- Suspect objects automatically regenerated
- Fault-tolerant architecture supports non-disruptive operations, upgrades and infrastructure refreshes
- Load balancing automatically distributes workloads during normal operations and failures
- AutoSupport feature can automatically alert NetApp support when problems are detected
- Node-level erasure coding improves single-node availability when used with E-Series Dynamic Disk Pools
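The fingerprint-and-regenerate behaviour in the list above can be sketched in a few lines. This is a toy model under stated assumptions: the `ObjectStore` class and its methods are hypothetical, the "nodes" are plain dictionaries, and SHA-256 is our choice of hash, since NetApp has not said which algorithm StorageGRID uses.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Digital fingerprint of an object (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()

class ObjectStore:
    """Toy store that keeps a full copy of each object on every node."""

    def __init__(self, node_count=3):
        self.nodes = [dict() for _ in range(node_count)]
        self.fingerprints = {}

    def ingest(self, key, data):
        # Record the fingerprint at ingest, then replicate the object.
        self.fingerprints[key] = fingerprint(data)
        for node in self.nodes:
            node[key] = data

    def retrieve(self, key):
        # Verify every copy against the ingest-time fingerprint.
        expected = self.fingerprints[key]
        good = None
        suspects = []
        for node in self.nodes:
            copy = node.get(key)
            if copy is not None and fingerprint(copy) == expected:
                if good is None:
                    good = copy
            else:
                suspects.append(node)
        if good is None:
            raise OSError(f"no intact copy of {key!r} remains")
        # Regenerate suspect or missing copies from a verified one.
        for node in suspects:
            node[key] = good
        return good

store = ObjectStore()
store.ingest("scan-001", b"image bytes")
store.nodes[0]["scan-001"] = b"imagX bytes"   # simulate silent corruption
print(store.retrieve("scan-001"))             # b'image bytes'
print(store.nodes[0]["scan-001"])             # b'image bytes'
```

The corrupted copy on the first node no longer matches the fingerprint taken at ingest, so it is treated as suspect and overwritten from a copy that does match.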
Further StorageGRID releases are planned, with one early next year looking at geo-distributed erasure coding and cloud tiering for on-premises and off-premises repositories. There will be an early-adopter programme for this, with general availability following later on in 2015.
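For readers unfamiliar with erasure coding, the simplest possible form is a single XOR parity fragment, sketched below. This is not NetApp's scheme, which has not been detailed; production systems generally use stronger codes that survive several simultaneous losses. The `encode` and `recover` functions are hypothetical names, and each list element stands in for a fragment held at a different site.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split data into k equal fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")       # pad to a multiple of k
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]

def recover(fragments, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the lost one.
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:-1])[:original_len]

payload = b"StorageGRID object payload"
sites = encode(payload, 3)     # three data fragments plus one parity fragment
sites[1] = None                # one site becomes unreachable
print(recover(sites, len(payload)) == payload)   # True
```

The appeal over plain replication is capacity: here four fragments, each a third of the object's size, survive the loss of any one site, where two full copies would use half as much space again for the same protection.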
NetApp is now being more energetic in marketing and developing StorageGRID. It has joined the Object Storage Alliance and is presenting StorageGRID as software-defined storage, since it is not tied to particular hardware.
Riverbed has a supporting quote in NetApp's release: "Our [SteelStore] technology, coupled with NetApp’s new StorageGRID Webscale object-storage software, provides our customers with the tools needed to take advantage of the cloud."
SteelStore was previously called Whitewater and the technology is in the cloud storage gateway arena.
CommVault is also in the provide-a-quote area, saying: "StorageGRID Webscale represents the next phase of NetApp’s object-enabled data management strategy. NetApp and CommVault remain committed to providing technology and solutions that are cost effective, highly scalable, and deliver our customers a competitive advantage."
StorageGRID Webscale is now generally available through NetApp’s distribution and reseller channels. ®