Archiving and the cloud
SNIA works up some best practices
Deep dive: Cloud is everywhere. Every day we read news about new cloud applications and new cloud providers. But will it really solve all our problems?
When we need more processing power or software services, we use Software as a Service (SaaS) providers. What if we need more storage space? We use Data Storage as a Service (DaaS) providers. It really seems that today’s IT issues can be solved by turning to the cloud.
It is simple, less expensive than traditional in-house models and it eliminates the challenge of increasing IT infrastructure costs. Specifically for cloud storage, some studies reveal that it could be up to 75 per cent less expensive than keeping the data in internal storage.
The Storage Networking Industry Association (SNIA) defines cloud storage as “the delivery of virtualised storage on demand”, or “delivery over a network of appropriately configured virtual storage and related data services, based on a request for a given service level.” Cloud storage is a fast-growing business, estimated to reach $10B by 2014.
In addition, organisations are facing growing pressure to store information for long periods of time, generally years. Internal company requirements and government compliance regulations, such as the Sarbanes-Oxley Act and the Health Insurance Portability and Accountability Act (HIPAA), require companies to keep “cold” data available all the time, with special considerations for data retention, auditing and validation.
The problem is that long-term retention and archiving can be expensive. It is estimated that in 2014 over 1 billion diagnostic imaging procedures will be performed in the US, generating about 100PB of data [1]. Current backup applications are not suitable for long-term retention and archiving: they are designed for fast recovery of data, whereas archived data must be retained for years.
Cloud storage for archiving and long-term preservation
Lower cost and the transfer of IT infrastructure responsibilities to external providers are among the reasons why organisations are moving their archived data to the cloud. Archiving for compliance, e-discovery, application efficiency and email archiving are additional key drivers for migrating archived organisation data to the cloud.
Consumers are also seeing an increased need for data archiving. With the move to digitisation of information (photos and videos are usually kept in electronic format) the end user’s need for storage is also growing quickly. The storage needed for all digital content and associated metadata in 2015 is estimated at a massive 8,000 exabytes.
Forecasts for 2014 also indicate that cloud archiving will grow by between 28 and 36 per cent per year, making it the fastest-growing storage service segment after basic cloud storage services [2].
Initially the solution for all these issues seems to be easy: when we need to archive data we move it to a cloud-based archive. However, if we take a closer look the situation is not that simple. The storage of sensitive data in a public cloud requires a series of considerations from different perspectives. Cost, security, availability and integrity of the data are important aspects organisations need to evaluate before selecting a service provider.
Different providers can meet these criteria in different ways, so companies may need to migrate their archived data from one provider to another. Data migration also carries a number of risks for the stored data: the technologies used can cause corruption due to error rates inherent to the migration. Even data not being migrated can be corrupted via bit-rot, malicious attack or human error.
Long-term preservation and archiving in public clouds also require a long-term and effective relationship with the provider, and this can lead to a number of challenges, such as the supplier going out of business, a change of infrastructure and interfaces, cost increases, or the legal restrictions that apply to the geographic storage of data.
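One common way to detect this kind of corruption is a fixity check: record a cryptographic digest when the data is archived and compare it again after a migration or at regular intervals. The following is a minimal sketch of that idea, not something prescribed by SNIA; the function names and the sample record are made up for illustration, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """True if the data still matches the digest recorded at archive time."""
    return sha256_digest(data) == expected_digest


# Digest recorded when the object was first archived (hypothetical record).
original = b"archived record 0001"
stored_digest = sha256_digest(original)

# Simulate bit-rot by flipping a single bit in a copy of the data.
corrupted = bytearray(original)
corrupted[0] ^= 0x01

print(verify(original, stored_digest))          # True
print(verify(bytes(corrupted), stored_digest))  # False
```

The digest has to be stored separately from the data it protects, otherwise the same fault could damage both.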
Format changes can also represent an obstacle: data stored and archived in a specific format today might not be readable 20 years from now as the technology used to read and interpret the stored bits could disappear.
The need for industry standards
The need for inter-cloud data migration makes the adoption of an industry standard for cloud storage extremely important. Each provider has its own set of interfaces to store, access, update and delete data and metadata, which means that the end user will have to rewrite the interfaces when migrating between clouds. If customers want to distribute their data across different providers, they need a set of interfaces for each one. A cloud federation approach without a standard interface seems very difficult to follow.
The SNIA’s Cloud Data Management Interface (CDMI) standardises access to data in the cloud via a functional interface used by applications to create, retrieve, update and delete elements from the cloud. CDMI can be implemented on top of the provider’s own interface, enabling backwards compatibility with existing interfaces and offering standardised access to the stored data at the same time.
The CDMI specification uses mostly RESTful principles in the interface design, with some exceptions documented in the specification. Additionally, this interface provides a way to set metadata on containers and their contained data. For archiving and preservation purposes, this metadata is fundamental for fast indexing, searching and information retrieval.
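As a rough illustration of what such a RESTful call might look like, the sketch below builds (but does not send) an HTTP PUT that creates a data object with user metadata attached. The host name, path, metadata keys and the specification version string are assumptions made for the example; the content type and body layout follow the general shape of a CDMI object request, and the specification itself should be consulted for the exact fields a given version requires.

```python
import json
from urllib.request import Request

CDMI_VERSION = "1.0.2"  # assumed version string; use the one your provider supports


def build_cdmi_put(base_url: str, path: str, value: str, metadata: dict) -> Request:
    """Build a PUT request that would create a data object carrying metadata."""
    body = json.dumps(
        {"mimetype": "text/plain", "metadata": metadata, "value": value}
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/cdmi-object",
        "Accept": "application/cdmi-object",
        "X-CDMI-Specification-Version": CDMI_VERSION,
    }
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, data=body, headers=headers, method="PUT")


# Hypothetical endpoint and metadata, for illustration only.
request = build_cdmi_put(
    "https://cloud.example.com/cdmi",
    "archive/report.txt",
    "quarterly report",
    {"department": "radiology", "case_id": "0001"},
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["metadata"])
```

Because the metadata travels with the object in the same request, an archive application can index and search on it later without opening the stored data.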
To facilitate the development and adoption of cloud storage SNIA’s Cloud Archive & Preservation Special Interest Group (Cloud Archive SIG) has established a list of requirements service providers need to observe to deliver an archive service:
- Multiple primary copies of the data distributed geographically.
--- CDMI_data_redundancy: the desired minimum number of redundant copies the system needs to maintain.
--- CDMI_infrastructure_redundancy: the number of independent storage infrastructures supporting the data. Used in combination with the CDMI_data_redundancy, it is used to state that the primary copies for redundancy shall be stored in separate infrastructures.
- Secondary copies (backup)
--- CDMI_RPO: Used to indicate the desired backup frequency from the primary copies of the data to the secondary copies.
--- CDMI_RTO: Used to indicate the desired maximum acceptable duration to restore the primary copies from secondary copies.
- Data Validation
--- CDMI_value_hash: if present, this metadata indicates the hash algorithms and lengths supported.
- Immutability (Data Retention)
--- CDMI_retention_period: ISO-8601 time interval to specify object retention. When an object is under retention, the object cannot be deleted and its data must remain immutable. Once the retention date expires, the object can be deleted.
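The requirements above can be pictured as a set of metadata items attached to an archived object. The sketch below is an illustration only: the metadata names are spelled as in the list above, the values and their encodings are invented for the example, and the retention period is assumed to be an ISO-8601 interval of the form start/end. The helper shows how a client might check whether an object is still under retention.

```python
from datetime import datetime, timezone

# Illustrative values only; real encodings are defined by the CDMI specification.
archive_metadata = {
    "CDMI_data_redundancy": "3",
    "CDMI_infrastructure_redundancy": "2",
    "CDMI_retention_period": "2012-01-01T00:00:00Z/2019-01-01T00:00:00Z",
}


def parse_interval(interval: str):
    """Split an ISO-8601 'start/end' interval into two aware datetimes."""
    start_text, end_text = interval.split("/")

    def parse(text: str) -> datetime:
        # fromisoformat() in older Python versions does not accept a trailing 'Z'.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))

    return parse(start_text), parse(end_text)


def can_delete(metadata: dict, now: datetime) -> bool:
    """An object may be deleted only once its retention period has ended."""
    interval = metadata.get("CDMI_retention_period")
    if interval is None:
        return True  # no retention requested
    _, end = parse_interval(interval)
    return now >= end


print(can_delete(archive_metadata, datetime(2015, 6, 1, tzinfo=timezone.utc)))  # False
print(can_delete(archive_metadata, datetime(2020, 1, 1, tzinfo=timezone.utc)))  # True
```

In a real service the provider enforces retention on the server side; a client-side check like this is only useful for reporting or for planning deletions.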
How can a cloud user verify that the provider they are considering adheres to these guidelines? And vice-versa, how can a provider communicate to users what is actually being provided? CDMI does this through Capabilities: a type of resource that acts like a service catalogue. When users want to know which level of service the provider offers, they can query the Capabilities resource to get a list of the capabilities or functionality delivered.
Through CDMI capabilities, cloud storage providers can specify their level of compliance. In the same way, users can specify the desired level of service.
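A client could use this catalogue to screen providers before moving any data. The sketch below assumes the capabilities have already been fetched and decoded into a dictionary in which a supported feature is flagged with the string "true"; that representation, the provider data and the function name are assumptions for illustration, not taken from the specification.

```python
def unmet_requirements(required: set, capabilities: dict) -> list:
    """Return the required features the provider does not advertise, sorted by name."""
    return sorted(name for name in required if capabilities.get(name) != "true")


# Features an archive customer insists on (names as used in this article).
required = {"CDMI_data_redundancy", "CDMI_RPO", "CDMI_retention_period"}

# Hypothetical capabilities advertised by one provider.
provider_capabilities = {
    "CDMI_data_redundancy": "true",
    "CDMI_retention_period": "true",
    "CDMI_value_hash": "true",
}

missing = unmet_requirements(required, provider_capabilities)
print(missing)  # ['CDMI_RPO']
```

An empty result would mean the provider advertises everything on the customer's list; anything else is a gap to raise before signing up.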
Cloud archive and backup capabilities
For archiving purposes, users can establish other requirements, such as encryption and quality-of-service in terms of latency and throughput. Due to regulatory laws and confidentiality of the stored data, customers can also have requirements about geographical placement of the data. For example, cloud storage customers in Europe cannot store certain data outside the European Union.
The SNIA’s Cloud Archive SIG is also working to create a description of different profiles for cloud archive and long-term preservation services. This aims to simplify the classification of the services delivered by cloud providers in different profiles like digital cloud archive, digital preservation cloud and backup cloud.
The benefits of industry standards
The adoption of standards by the cloud storage industry will allow vendors and developers to easily integrate with any cloud structure. This integration between heterogeneous systems enables users to migrate their data seamlessly between clouds and cloud storage providers.
A standardised interface for managing the data stored in the cloud will be a differentiator between vendors in the near future. Forcing customers to write their own interfaces for each cloud provider will increase the cost and difficulty of adopting cloud archiving, and will drive customers away from providers with proprietary interfaces.
[1] IEEE. “A Medical Image Archive Solution in the Cloud”, 2011
[2] Forrester Research. “Your Enterprise Data Archiving Strategy”, 2011
This article was written by Sebastian Zangaro, Co-chair of SNIA’s Cloud Archive & Preservation Special Interest Group. He works for Hewlett-Packard.
About the SNIA
The Storage Networking Industry Association (SNIA) is a not-for-profit global organisation, made up of some 400 member companies spanning virtually the entire storage industry. SNIA's mission is to lead the storage industry worldwide in developing and promoting standards, technologies, and educational services to empower organisations in the management of information. To this end, the SNIA is uniquely committed to delivering standards, education, and services that will propel open storage networking solutions into the broader market.
About SNIA Europe
SNIA Europe educates the market on the evolution and application of storage infrastructure solutions for the data centre through education, knowledge exchange and industry thought leadership. As a Regional Affiliate of SNIA Worldwide, we represent storage product and solutions manufacturers and the channel community across EMEA.
The mission of the SNIA Cloud Storage Initiative (CSI) Cloud Archive Special Interest Group (Cloud Archive/Preservation SIG) is to advance the use of public, private and hybrid clouds for archival services and long-term retention. We are accomplishing these objectives by promoting the adoption of CDMI and associated standards and by participating in initiatives to educate the community about the benefits of cloud-based services for an archival system. The group’s focus includes defining best practices and demonstrating to users, vendors and organisations that the cloud can be a reliable and trusted partner for archival and long-term retention. For more information about the work of this special interest group, please visit: www.snia.org/cloud/archive.