Object storage: The blob creeping from niche to mainstream
Can you be scalable AND floatable?
Who's in then?
So who else is playing the object storage game? Along with Intel, Hitachi Data Systems (HDS), NetApp, IBM and EMC are all interesting players, each looking at ways to integrate object stores into mainstream applications.
As you might also expect of a company of its size, HP has an OpenStack-based object storage proposition, which the firm promotes as a way to store and retrieve objects in a “highly redundant cluster” of publicly accessible physical machines hosted in its own HP datacentres. HP also makes a play for accessibility, targeting storage-focused programmers and DevOps professionals who want to remain inside their favourite language environment and not deal with the “guts of a REST API” to achieve their aims. To that end, the firm provides a dedicated ‘bindings’ offering for developers to code against HP Cloud Object Storage.
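To see what those bindings are papering over, here is a minimal sketch of raw REST-style object access - a toy in-process object store exposing PUT and GET on a flat /bucket/key namespace, driven with nothing but the Python standard library. The endpoint and object names are illustrative, not any vendor's real API.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # flat namespace: full object path -> bytes


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        body = STORE.get(self.path)
        if body is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), ObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# PUT then GET an object over plain HTTP - the "guts" that language
# bindings and file gateways hide from the application programmer.
req = urllib.request.Request(f"{base}/backups/report.pdf",
                             data=b"hello object world", method="PUT")
urllib.request.urlopen(req)
fetched = urllib.request.urlopen(f"{base}/backups/report.pdf").read()
server.shutdown()
```

Every operation is an HTTP verb against a flat key - which is precisely why vendors wrap it in per-language client libraries.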
But should object storage be regarded as some kind of niche storage technology, of interest only to massive data environments such as healthcare, media and entertainment, and the cloud storage providers themselves? “Object storage is used by over 700 HDS customers worldwide and deployed for many different reasons, for example long-term archives, internet content stores, private cloud repositories, as well as acting as a replacement for traditional backup with its built-in protection services,” says Lynn Collier of HDS.
But how should object storage be accessed - via REST, CDMI, XAM controls or perhaps through a file interface? In the case of the Hitachi Content Platform, Collier explains that the solution is flexible, with access via standard file system protocols such as NFS and CIFS, as well as REST, WebDAV and SMTP for email, to provide open (but secure) access to and retrieval of objects. “The adoption of XAM has not yet proved fully successful and the take-up from leading ISVs was limited. We are also currently considering CDMI support as an alternative access method,” she says.
Software is eating infrastructure
Is this a new dawn in data storage? Is the failure of filesystems a symptom of specialised storage hardware's inability to scale technically and economically to meet emerging data storage volumes? Yes, says Stuart McCaul of Basho, the database specialists. This is because software is eating infrastructure - that is, software now provides the reliability guarantees of traditional specialised storage hardware on commodity hardware.
“For a time, Distributed File Systems were a reasonable stop-gap for companies struggling to vertically scale their filers. However, businesses also need to keep operations running efficiently while scaling out horizontally. Object storage helps keep operations efficient by simplifying security and eliminating filesystem admin tasks. We believe access to object storage should be customer friendly, which means supporting multiple access methods. Basho's large object storage platform is Riak CS and we've worked to make access to Riak CS compatible with the RESTful S3 API as well as OpenStack Swift,” says McCaul.
And here’s the interesting part in terms of adoption. Basho’s own Riak CS also scales down and is used by companies like DVAG to offer a private file sync-n-share service to internal IT users. So object storage should perhaps be considered “just one part” of a CIO's storage service catalogue - perhaps a third-tier online archive, a fourth-tier online backup, or a new, distinct web tier for next generation services.
Another significant benefit of object storage is the ability to perform other functions based on object/hash calculations. Exablox’s Derrington says that since files are managed as objects, it's “relatively easy” to perform functions like inline deduplication and encryption. “The benefits of deduplication from a backup/recovery perspective are well understood: now they can apply to primary storage as well,” he says.
Decomposed deduplicated data
“There are many benefits to managing object storage that are particularly attractive to organisations that don’t have the skill set or are looking to increase the ratio of terabytes to admin for their organisation. With object storage there is no notion of RAID, volumes, or LUNs. Every file written is decomposed into data blocks and a hash is calculated on each data block so it's treated as an ‘object’. Instead of RAID, erasure coding or replication is used to provide resiliency in the case of device or drive failures,” he adds.
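The write path Derrington describes can be sketched in a few lines - decompose the file into blocks, hash each block so the hash names it as an object, store duplicate blocks only once, and replicate across devices rather than relying on RAID. This is an illustrative toy, not any vendor's implementation; the block size and replica count are deliberately tiny.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for demonstration; real systems use KB-MB blocks
REPLICAS = 3    # simulated devices, each holding a copy of every block

devices = [{} for _ in range(REPLICAS)]  # per device: hash -> block bytes


def write_file(data: bytes) -> list:
    """Decompose data into blocks, hash each, store deduplicated."""
    manifest = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()  # the hash IS the object name
        for dev in devices:                 # replicate for resiliency
            dev.setdefault(key, block)      # dedupe: skip blocks already held
        manifest.append(key)
    return manifest


def read_file(manifest: list) -> bytes:
    """Reassemble the file from any surviving replica of each block."""
    return b"".join(next(d[k] for d in devices if k in d) for k in manifest)


manifest = write_file(b"abcdabcdabcd")  # three identical 4-byte blocks
# Only one unique block is physically stored per device (inline dedup).
devices[0].clear()                      # simulate a drive failure
restored = read_file(manifest)          # still readable from replicas
```

Because every block is named by its own hash, deduplication falls out of the design almost for free - which is Derrington's point about applying it to primary storage, not just backup.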
In terms of access then, how do you get past the historical predicament where organisations had to write custom APIs (often proprietary to a storage vendor) for their applications to access object-based storage? Exablox had custom APIs and the public cloud storage providers (e.g. Amazon S3) offered object storage via RESTful APIs - in both cases object storage was out of reach for the vast majority of applications and users. Fortunately, says Derrington, some object storage vendors are providing more common ways to access their storage, such as CIFS/SMB, NFS or iSCSI.
As well-aligned and positive as many of these object storage developments sound, we have to remember that not all object storage systems are created equal, and that the implementation and deployment approaches taken can vary significantly.
Quantum’s Laurent Fanichet reminds us that like every technology play, it’s more than the technology; it’s a total systems approach that makes it a solution. “Turning over all the responsibility for data ‘keys’ to an external application that has not had years of investment carries risk, as some of the users of legacy object storage CAS (content addressable storage) systems have discovered. In some cases, CAS workflow applications lost their keys and suddenly the customer had a CAS system full of data that they could neither read nor delete because the object store itself had no central intelligence of what it was storing. This vulnerability is an area where we’ve made significant investment,” he says.
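The ‘lost keys’ scenario is easy to reproduce in miniature. In a classic CAS design, put() returns a key derived from the content and the store keeps no catalogue of its own - so if the workflow application loses its key database, the data is still there but can neither be read nor targeted for deletion. A hedged toy sketch, not any real CAS product:

```python
import hashlib

cas = {}  # the store: content hash -> data, and nothing else


def put(data: bytes) -> str:
    """Store data; the returned content hash is the ONLY handle to it."""
    key = hashlib.sha256(data).hexdigest()
    cas[key] = data
    return key  # the external application must keep this safe


def get(key: str) -> bytes:
    return cas[key]


key = put(b"archived record 42")
assert get(key) == b"archived record 42"

# Simulate the workflow application losing its key database:
key = None
# The object still occupies space in `cas`, but with no key and no
# central catalogue there is no way to look it up or delete it -
# exactly the orphaned-data vulnerability Fanichet describes.
orphaned = len(cas)
```

The fix vendors have invested in is giving the object store its own central intelligence about what it holds, rather than delegating that entirely to an external application.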
Fanichet warns that the desire for information quality is driving data ingest ‘granularity’ - increasing the resolution of the data, which generates massive data growth. He says that these ‘high grain’ data sets are being created by more and more users - and they are being kept indefinitely.